Answer To: Today for the first time in history, the largest group of Americans living in poverty are children....
Dinesh answered on Oct 14 2021
doc/bartlet.txt
Today for the first time in history, the largest group of Americans living in poverty are children. One in five children live in the most abject, dangerous, hopeless, back-breaking, gut-wrenching poverty any of us could imagine. One in five, and they’re children. If fidelity to freedom and democracy is the code of our civic religion, then surely the code of our humanity is faithful service to that unwritten commandment that says we shall give our children better than we ourselves received. Let me put it this way: I voted against the bill because I didn’t want to make it harder for people to buy milk. I stopped some money from flowing into your pocket. If that angers you, if you resent me, I completely respect that. But if you expect anything different from the president of the United States, you should vote for someone else.
doc/cosinelab-mc2g0hpn.pdf
DS2500: Lab for week 6
Prof. Rachlin
Fall 2021
Oct 13-15, 2021
Submit on Gradescope everything you’ve completed by 11:59pm eastern on Friday
of this week.
Labs are graded on a 0–4 scale. Full credit is given for demonstration of
participation and effort on the lab assignment.
Lab Problem:
In class this week we learned about cosine similarity as a way of measuring
similarity in text documents. Today, we’re focused on the similarity of fictional
presidents instead of real ones, but working through the lab will still be helpful
as you get deeper into HW3.
The text files for this lab are speeches given by President Bartlet (from The West
Wing) and President Shepard (from The American President), and we’re going
to see how similar they are, using (1) visualization, and (2) cosine similarity.
Both characters are written by Aaron Sorkin, but the speeches are in totally
different contexts, so it’s kind of hard to predict what we’ll find.
You may use the code we did together in class to compute the cosine similarity
of two texts based on the dot product of two vectors.
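For reference, the textbook definition of cosine similarity is sketched below. The in-class function is not reproduced in this handout, so the notation (a and b for the two word-count vectors, n for their shared length) is an assumption, not necessarily the lecture's own.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

Because word counts are never negative, the result falls between 0 (no shared words) and 1 (identical word proportions).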
Here’s the approach today:
• Read in each text file as a string.
• Visualize both Presidents’ speeches using the wordcloud library (also
discussed in Tuesday’s lecture).
• Find the most common words used by each President. You might find it
convenient to use the Counter class from the collections module to do this.
Try creating vectors using the unique words among the most frequent k words
of each speech, where k is between 10 and 20 (your choice).
• Use the cosine similarity function provided in class to measure the
similarity of the two speeches. Is it what you expected? Would you have
guessed that these characters were written by the same person?
• As an optional enhancement, try cleaning up the text to remove small
words and punctuation. This might produce a more accurate result.
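The steps above can be sketched in plain Python as follows. This is a minimal sketch, not the function provided in class: the helper names `tokenize`, `cosine_similarity`, and `speech_similarity` are hypothetical, and the tokenizer (lowercase, keeping only runs of ASCII letters and straight apostrophes) is one assumed way to handle punctuation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


def speech_similarity(text_a, text_b, k=10):
    """Compare two texts using counts over the union of their top-k words."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Vocabulary: unique words among the k most frequent words of each text
    top_a = [word for word, _ in counts_a.most_common(k)]
    top_b = [word for word, _ in counts_b.most_common(k)]
    vocab = sorted(set(top_a) | set(top_b))
    # A Counter returns 0 for missing words, so both vectors share one layout
    vec_a = [counts_a[word] for word in vocab]
    vec_b = [counts_b[word] for word in vocab]
    return cosine_similarity(vec_a, vec_b)
```

With the two speeches already read into strings, a call such as `speech_similarity(bartlet, shepard, k=15)` would give a single score. Without stop-word removal, common words like "the" and "and" are likely to dominate the top-k lists, which is one reason the optional cleanup step can change the result. Curly apostrophes, as in "they’re", are not matched by this tokenizer and will split such words in two.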
doc/shepard.txt
Good morning. [Members of the White House Press Corps begin to rise] It's alright. Please keep your seats. Good morning.
For the last couple of months, Senator Rumson has suggested that being President of this country was, to a certain extent, about character. And although I've not been willing to engage in his attacks on me, I have been here three years and three days, and I can tell you without hesitation: Being President of this country is entirely about character.
For the record, yes, I am a card-carrying member of the ACLU, but the more important question is "Why aren't you, Bob?" Now this is an organization whose sole purpose is to defend the Bill of Rights, so it naturally begs the question, why would a senator, his party's most powerful spokesman and a candidate for President, choose to reject upholding the constitution? Now if you can answer that question, folks, then you're smarter than I am, because I didn't understand it until a few hours ago.
America isn't easy. America is advanced citizenship. You've gotta want it bad, 'cause it's gonna put up a fight. It's gonna say, "You want free speech? Let's see you acknowledge a man whose words make your blood boil, who's standing center stage and advocating at the top of his lungs that which you would spend a lifetime opposing at the top of yours." You want to claim this land as the land of the free? Then the symbol of your country cannot just be a flag. The symbol also has to be one of its citizens exercising his right to burn that flag in protest. Now show me that, defend that, celebrate that in your classrooms.
Then you can stand up and sing about the land of the free.
I've known Bob Rumson for years. And I've been operating under the assumption that the reason Bob devotes so much time and energy to shouting at the rain was that he simply didn't get it. Well, I was wrong. Bob's problem isn't that he doesn't get it. Bob's problem is that he can't sell it!
We have serious problems to solve, and we need serious people to solve them. And whatever your particular problem is, I promise you Bob Rumson is not the least bit interested in solving it. He is interested in two things, and two things only: making you afraid of it, and telling you who's to blame for it. That, ladies and gentlemen, is how you win elections. You gather a group of middle age, middle class, middle income voters who remember with longing an easier time, and you talk to them about family, and American values and character, and you wave an old photo of the President's girlfriend and you scream about patriotism. You tell them she's to blame for their lot in life. And you go on television and you call her a whore.
Sydney Ellen Wade has done nothing to you, Bob. She has done nothing but put herself through school, represent the interests of public school teachers, and lobby for the safety of our natural resources. You want a character debate, Bob? You better stick with me, 'cause Sydney Ellen Wade is way out of your league.
I've loved two women in my life. I lost one to cancer. And I lost the other 'cause I was so busy keeping my job, I forgot to do my job. Well, that ends right now.
Tomorrow morning the White House is sending a bill to Congress for its consideration. It's White House Resolution 455, an energy bill requiring a twenty percent reduction of the emission of fossil fuels over the next ten years. It is by far the most aggressive stride ever taken in the fight to reverse the effects of global warming. The other piece of legislation is the crime bill. As of today, it no longer exists. I'm throwing it out. I'm throwing it out and writing a law that makes sense. You cannot address crime prevention without getting rid of assault weapons and hand guns. I consider them a threat to national security, and I will go door to door if I have to, but I'm gonna convince Americans that I'm right, and I'm gonna get the guns.
We've got serious problems, and we need serious people. And if you want to talk about character, Bob, you'd better come at me with more than a burning flag and a membership card. If you want to talk about character and American values, fine. Just tell me where and when, and I'll show up. This is a time for serious people, Bob, and your fifteen minutes are up.
text_similarity.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "cosine-similarity.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "0ng6HbOwL9JM"
},
"source": [
"# Lab Problem:\n",
"\n",
"In class this week we learned about cosine similarity as a way of measuring\n",
"similarity in text documents. Today, we’re focused on the similarity of fictional presidents instead of real ones, but working through the lab will still be helpful as you get deeper into HW3.\n",
"\n",
"\n",
"The text files for this lab are speeches given by President Bartlet (from The West Wing) and President Shepard (from The American President), and we’re going\n",
"to see how similar they are,\n",
"using (1) visualization, and (2) cosine similarity.\n",
"\n",
"Both characters are written by Aaron Sorkin, but the speeches are in totally\n",
"different contexts, so it’s kind of hard to predict what we’ll find.\n",
"\n",
"You may use the code we did together in class to compute the cosine similarity\n",
"of two texts based on the dot product of two vectors.\n",
"\n",
"Here’s the approach today:\n",
"\n",
"• Read in each text file as a string.\n",
"\n",
"• Visualize both Presidents’ speeches using the wordcloud library (also discussed in Tuesday’s lecture).\n",
"\n",
"• Find the most common words used by each President. You might find it\n",
"convenient to use the Counter class from the collections module to do this. Try creating vectors\n",
"using the unique words among the most frequent k words of each speech,\n",
"where k is between 10 and 20 (your choice).\n",
"\n",
"\n",
"• Use the cosine similarity function provided in class to measure the\n",
"similarity of the two speeches. Is it what you expected? Would you have\n",
"guessed that these characters were written by the same person?\n",
"\n",
"• As an optional enhancement, try cleaning up the text to remove small\n",
"words and punctuation. This might produce a more accurate result."
]
},
{
"cell_type": "code",
"metadata": {
"id": "VzRJIF5qNIHU"
},
"source": [
"# Install python packages\n",
"!pip install matplotlib pandas wordcloud"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "NH6PwCgLyai4"
},
"source": [
"# Word cloud"
]
},
{
"cell_type": "code",
"metadata": {
"id": "LJtgn3sDM2FF"
},
"source": [
"# import python packages\n",
"from wordcloud import WordCloud, STOPWORDS\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import math\n",
"import re\n",
"from collections import Counter"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "swuIytiYLsK7"
},
"source": [
"# Read each text file into a single string\n",
"\n",
"with open('bartlet.txt', 'r') as file:\n",
"    # Replace newlines with spaces so words on adjacent lines don't fuse\n",
"    bartlet = file.read().replace('\\n', ' ')\n",
"\n",
"with open('shepard.txt', 'r') as file:\n",
"    shepard = file.read().replace('\\n', ' ')"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "zVCByahlN-ZZ"
},
"source": [
"# Generate word cloud for bartlet\n",
"\n",
"bartlet_word_cloud = WordCloud(width= 3000, height = 2000,\n",
" random_state=1,\n",
" colormap='Pastel1',\n",
" collocations=False, \n",
" stopwords = STOPWORDS).generate(bartlet)"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 592
},
"id": "o-3_jTUFOmmu",
"outputId": "2975a8d5-23e5-4fdb-b7f9-524108762593"
},
"source": [
"plt.figure(figsize=(20, 10))\n",
"plt.imshow(bartlet_word_cloud)\n",
"plt.axis('off')\n",
"# Save only after drawing; saving before imshow writes a blank figure\n",
"plt.savefig(\"bartlet.png\")"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {},
"execution_count": 5
},
{
"output_type": "display_data",
"data": {
"image/png":...