Complete the Hypothesis Case Study Part 1 tutorial. It is not a complete case study; it is just the steps you might take to do Graph Analysis. I have provided sample code for you to use as you go...

Kshitij answered on Sep 29, 2021 (same day).

45265.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"CINDY HERRERA DSC550 WEEK 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Applied Text Analysis With Python Exercises"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import string\n",
"import re\n",
"import matplotlib.pyplot as plt\n",
"from collections import Counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1: Load data into a dataframe"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Path to the CSV file of news articles\n",
"addr1 = \"articles1.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Check the dimensions of the table and look at the data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The dimension of the table is: (50000, 10)\n"
]
},
{
"data": {
"text/plain": [
" Unnamed: 0 id title \\\n",
"0 0 17283 House Republicans Fret About Winning Their Hea... \n",
"1 1 17284 Rift Between Officers and Residents as Killing... \n",
"2 2 17285 Tyrus Wong, ‘Bambi’ Artist Thwarted by Racial ... \n",
"3 3 17286 Among Deaths in 2016, a Heavy Toll in Pop Musi... \n",
"4 4 17287 Kim Jong-un Says North Korea Is Preparing to T... \n",
"\n",
" publication author date year month \\\n",
"0 New York Times Carl Hulse 2016-12-31 2016.0 12.0 \n",
"1 New York Times Benjamin Mueller and Al Baker 2017-06-19 2017.0 6.0 \n",
"2 New York Times Margalit Fox 2017-01-06 2017.0 1.0 \n",
"3 New York Times William McDonald 2017-04-10 2017.0 4.0 \n",
"4 New York Times Choe Sang-Hun 2017-01-02 2017.0 1.0 \n",
"\n",
" url content \n",
"0 NaN WASHINGTON — Congressional Republicans have... \n",
"1 NaN After the bullet shells get counted, the blood... \n",
"2 NaN When Walt Disney’s “Bambi” opened in 1942, cri... \n",
"3 NaN Death may be the great equalizer, but it isn’t... \n",
"4 NaN SEOUL, South Korea — North Korea’s leader, ... "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the CSV file into a DataFrame\n",
"articles = pd.read_csv(addr1)\n",
"\n",
"# Print the dimensions of the table as (rows, columns)\n",
"print(\"The dimension of the table is:\", articles.shape)\n",
"\n",
"# Display the first 5 rows of the DataFrame to get an idea of\n",
"# what kind of data it contains\n",
"articles.head(5)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: What types of variables are in the table?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Describe Data\n",
" Unnamed: 0 id year month url\n",
"count 50000.000000 50000.000000 50000.000000 50000.000000 0.0\n",
"mean 25694.378380 44432.454800 2016.273700 5.508940 NaN\n",
"std 15350.143677 15773.615179 0.634694 3.333062 NaN\n",
"min 0.000000 17283.000000 2011.000000 1.000000 NaN\n",
"25% 12500.750000 31236.750000 2016.000000 3.000000 NaN\n",
"50% 25004.500000 43757.500000 2016.000000 5.000000 NaN\n",
"75% 38630.250000 57479.250000 2017.000000 8.000000 NaN\n",
"max 53291.000000 73469.000000 2017.000000 12.000000 NaN\n",
"Summarized Data\n",
" title publication \\\n",
"count 50000 50000 \n",
"unique 49920 5 \n",
"top The 10 most important things in the world righ... Breitbart \n",
"freq 7 23781 \n",
"\n",
" author date content \n",
"count 43694 50000 50000 \n",
"unique 3603 983 49888 \n",
"top Breitbart News 2016-08-22 advertisement \n",
"freq 1559 221 42 \n"
]
},
{
"data": {
"text/plain": [
"Unnamed: 0 int64\n",
"id int64\n",
"title object\n",
"publication object\n",
"author object\n",
"date object\n",
"year float64\n",
"month float64\n",
"url float64\n",
"content object\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summary statistics for the numeric columns\n",
"print(\"Describe Data\")\n",
"print(articles.describe())\n",
"# Count, unique values, most frequent value and its frequency for the text (object) columns\n",
"print(\"Summarized Data\")\n",
"print(articles.describe(include=['O']))\n",
"\n",
"# Return the data type of each column\n",
"articles.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"50000\n"
]
}
],
"source": [
"# Display the number of rows (articles) in the data\n",
"print(len(articles))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...
SOLUTION.PDF
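The first three notebook steps (load `articles1.csv`, check the table's dimensions, inspect the variable types) can be collected into one helper. This is a minimal sketch, assuming pandas is installed. The function name `inspect_articles` and the three-row `SAMPLE_CSV` are hypothetical stand-ins that let the code run without the real file. The sketch also reports a per-column missing-value count and the number of articles per publication. The notebook does not show either of these, but its `describe()` output points to both: `author` has 43694 non-null values out of 50000, `url` is entirely empty, and there are 5 publications.

```python
import io

import pandas as pd

# Hypothetical stand-in for articles1.csv so the sketch runs without the real
# file. Column names follow the notebook (minus "Unnamed: 0"); the rows are
# illustrative only.
SAMPLE_CSV = """id,title,publication,author,date,year,month,url,content
17283,House Republicans Fret,New York Times,Carl Hulse,2016-12-31,2016.0,12.0,,sample text
17284,Rift Between Officers,New York Times,Benjamin Mueller,2017-06-19,2017.0,6.0,,sample text
17285,Hypothetical example row,Breitbart,,2016-08-22,2016.0,8.0,,advertisement
"""


def inspect_articles(source) -> dict:
    """Steps 1-3 in one place: load the CSV, then collect shape, row count,
    column dtypes, missing values and articles per publication."""
    articles = pd.read_csv(source)  # Step 1: load into a DataFrame
    return {
        "shape": articles.shape,  # Step 2: (rows, columns)
        "n_rows": len(articles),  # same value as shape[0]
        "dtypes": articles.dtypes.astype(str).to_dict(),  # Step 3: column types
        "missing": articles.isna().sum().to_dict(),  # NaN count per column
        "per_publication": articles["publication"].value_counts().to_dict(),
    }


if __name__ == "__main__":
    # Pass "articles1.csv" instead of the StringIO object to use the real file.
    info = inspect_articles(io.StringIO(SAMPLE_CSV))
    print("The dimension of the table is:", info["shape"])
    print("Column types:", info["dtypes"])
    print("Missing values:", info["missing"])
    print("Articles per publication:", info["per_publication"])
```

If you run it on the real `articles1.csv`, you should get the notebook's 10 columns. The extra column there is `Unnamed: 0`, which looks like a saved row index. Reading the file with `pd.read_csv(addr1, index_col=0)` would use that column as the index instead of keeping it as data.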
