Instructions: You will be looking at data from a survey in the US state of...

Question

Instructions: You will be looking at data from a survey in the US state of Colorado on opinions of the oil and gas industry, and evaluating whether Facebook ads changed opinions of the oil & gas industry. For context, in this study, some individuals in Colorado were randomly selected to receive video advertisements on Facebook, which highlighted the risks of the oil & gas industry. This is the ‘treatment’ group. Another set of individuals on Facebook were the ‘control’ group and did not receive ads. All individuals in both the treatment and control groups were asked to complete a survey. Not all individuals started the survey, and not all individuals who started the survey completed it.

The survey asked respondents a number of demographic questions, then asked “Do you believe your community is better or worse off because of the oil and gas industry?”. Respondents selected one of the following choices:
● 1 - Definitely better off
● 2 - Somewhat better off
● 3 - Neither better nor worse off
● 4 - Somewhat worse off
● 5 - Definitely worse off
We can compare answers between the treatment and control groups to evaluate the effectiveness of the advertisements.

Data: You will use two datasets:
1. Survey Data: includes a row for every individual who started the survey. Includes fields for survey responses and attributes of individuals.
· Description of fields:
Field	Description
person_id	ID for survey respondent
county	County of respondent
FIPS	5-digit FIPS code for county of respondent
treatment	1 - indicates respondent was in treatment group, 0 - indicates respondent was in control group
total_duration_in_sec	Time the respondent took to respond to the survey, in seconds
Q1_answer_code	The respondent's numerical response to survey question 1
Q1_answer_text	The respondent's text response to survey question 1
2. County Shapefiles: Standard zip file of county boundary shapefiles from the US Census

Objectives: With this data, your goal is to:
● Clean up and QA survey data
● Understand scope of cleaned data: what is the geographic coverage of our survey respondents?
● Compare the survey responses of the treatment group (those who saw video advertisements) and control group (those who did not see video advertisements).

ASSIGNMENT
Part 1: Data Intro and QA
In Part 1, we will load the survey data and clean it.
1.1: Set Up
Run the code below to import modules. Then read the survey data into a dataframe called df_survey. The survey data is available on GitHub at the link below:
'https://raw.githubusercontent.com/smsidekick/project-sidekick/main/blihkjhdrsers.csv'
# Install Geopandas
!pip install geopandas --q
# Import pandas and numpy
import pandas as pd
import numpy as np
# Import geopandas
import geopandas as gpd
# Import plotnine
from plotnine import *
import plotnine
1.2: Explore Data
Orient to the survey data.
1.3: Duplicate IDs
Is the person_id field unique? Are there any duplicate values in that field? If there are duplicates, remove the duplicates. Save this back to df_survey.
1.4: Complete Survey Responses
Using code, check if any individuals did not answer survey question 1. If so, filter df_survey to include responses only from those who completed survey question 1: filter out any rows where Q1_answer_code is null. Save this filtered data to a new dataframe called df_complete.
1.5: Survey Speeders
Did any respondents in df_complete speed through the survey? Filter out any responses that were impossibly fast outliers based on your judgement. Save this filtered data back to df_complete. Make the rationale for your decision clear. A histogram may be helpful.
1.6: Survey Responses
Show the distribution of the survey responses in Q1_answer_text (i.e. how many people responded with each answer?). In a sentence, brainstorm why you think some may say the oil & gas industry makes their community better off vs. worse off.

Part 2: Survey Coverage in Colorado
In Part 2, we will explore the survey results by Colorado county and then create a map to understand the geographic coverage of our responses. We'll explore all the results (for both treatment and control). We're looking to inform two questions:
1. Do we think we have a good, representative sample of the entire state?
2. Do we think we have enough data to evaluate the experiment by county?
2.1: Read in County Shapefiles
Use command line code to read in the county shapefiles for the entire US from the link below. Read the data into a geodataframe, df_counties.
https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip
2.2: Filter Geodataframe
Filter df_counties to include only Colorado counties by filtering for when STATEFP is 08 (the State FIPS code for Colorado). Save this to a new geodataframe, df_counties_co.
2.3: Summarize Survey by County
Turning back to the survey results: create a dataframe summarizing the total number of survey responses by county and FIPS. Save this summary to a new dataframe, df_county_survey. (In the next step, we'll join this onto df_counties_co.) Then, dig into the county results and answer:
· How many unique counties do we have in total in df_county_survey?
· What is the minimum number of responses in a county?
Describe the new dataset, and the distribution of the number of survey responses by county.
2.4: Bucket Number of Responses
In df_county_survey, create a new column N_resp_bucket that buckets the number of survey responses in steps of 25:

Breeze · Accepted Answer

Part 1: Data Intro and QA
1.1: Set Up
First, we will import the necessary modules and load the survey data into a dataframe called df_survey.
# Install Geopandas 
!pip install geopandas --q 
# Import pandas and numpy 
import pandas as pd 
import numpy as np 
# Import geopandas 
import geopandas as gpd 
# Import plotnine 
from plotnine import * 
import plotnine 
# Load survey data
df_survey = pd.read_csv('https://raw.githubusercontent.com/smsidekick/project-sidekick/main/blihkjhdrsers.csv')
1.2: Explore Data
Now that we have loaded the survey data, let's take a closer look at it to understand its structure and content.
# View the first 5 rows of the data
print(df_survey.head())
# Get summary statistics for the data
print(df_survey.describe())
# Get information on the data types and missing values
df_survey.info()
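Beyond head(), describe() and info(), per-column counts are a quick way to see whether the coded fields hold only the expected values. The following is a minimal sketch on a small made-up dataframe; the rows are invented for illustration, and only the column names come from the assignment.

```python
import pandas as pd

# Toy stand-in for df_survey: the values are invented, only the column names follow the assignment.
toy = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5],
    "treatment": [1, 0, 1, 0, 1],
    "Q1_answer_code": [1, 3, None, 5, 2],
})

# Number of respondents per arm (1 = treatment, 0 = control)
treatment_counts = toy["treatment"].value_counts()
print(treatment_counts)

# Number of respondents per answer code; dropna=False keeps nulls visible
answer_counts = toy["Q1_answer_code"].value_counts(dropna=False)
print(answer_counts)

# Check that every non-null answer code falls in the documented 1-5 range
valid = toy["Q1_answer_code"].dropna().between(1, 5)
print("All answer codes within 1-5:", bool(valid.all()))
```

Running the same three checks on df_survey would show the treatment/control split and any unexpected answer codes before cleaning starts.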
1.3: Duplicate IDs
To check whether the person_id field is unique, we can count the number of unique IDs and compare it to the total number of rows in the dataframe. If the two numbers differ, there are duplicate IDs, and we remove them.
# Count the number of unique person IDs
num_unique_ids = df_survey['person_id'].nunique()
print('Number of unique IDs:', num_unique_ids)
# Count the total number of rows in the dataframe
num_rows = df_survey.shape[0]
print('Number of rows:', num_rows)
# Check if there are any duplicate person IDs
if num_unique_ids == num_rows:
    print('There are no duplicate IDs.')
else:
    df_survey.drop_duplicates(subset='person_id', inplace=True)
    print('Duplicate IDs removed.')
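Before dropping rows, it can help to look at the duplicated records themselves. The following is a minimal sketch on invented toy data, using duplicated(keep=False) to list every row that shares a person_id; the dataframe and its values are made up for illustration.

```python
import pandas as pd

# Toy data, invented for illustration: person_id 2 appears twice.
toy = pd.DataFrame({
    "person_id": [1, 2, 2, 3],
    "Q1_answer_code": [4, 2, 2, 5],
})

# keep=False marks every member of a duplicate group, not just the later repeats
dupes = toy[toy["person_id"].duplicated(keep=False)]
print(dupes)

# Keep the first row for each person_id and assign the result to a new name
deduped = toy.drop_duplicates(subset="person_id", keep="first")
print("Rows before:", len(toy), "after:", len(deduped))
```

Keeping the first occurrence is one possible choice; whether the first or last record is the right one to keep depends on how the duplicates arose.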
1.4: Complete Survey Responses
We can check whether any individuals did not answer survey question 1 by looking for null values in the Q1_answer_code field. If there are any, we filter the dataframe to keep only the responses from those who completed survey question 1.
# Check for null values in Q1_answer_code
num_null_values = df_survey['Q1_answer_code'].isnull().sum()
if num_null_values > 0:
    print('There are', num_null_values, 'null values in Q1_answer_code.')
    df_complete = df_survey[df_survey['Q1_answer_code'].notnull()].copy()
    print('Filtered data to include only responses from those who completed survey question 1.')
else:
    df_complete = df_survey.copy()
    print('All responses completed survey question 1.')
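The same filter can also be written with dropna, restricted to the one column of interest so that nulls in other fields do not remove rows. The following is a sketch on invented toy data; the county names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: one null answer and one null county.
toy = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "county": ["Weld", None, "Denver", "Mesa"],
    "Q1_answer_code": [1.0, 3.0, np.nan, 5.0],
})

# Drop rows only where Q1_answer_code is null; the row with a null county is kept
toy_complete = toy.dropna(subset=["Q1_answer_code"]).copy()

n_dropped = len(toy) - len(toy_complete)
print("Dropped", n_dropped, "incomplete response(s)")
```

Reporting the number of dropped rows makes it easy to confirm that the filter removed only the incomplete responses.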
1.5: Survey Speeders
To determine whether any respondents in df_complete sped through the survey, we can look at the distribution of the total_duration_in_sec field. A histogram makes this distribution visible and shows whether there are implausibly fast outliers that need to be removed.
# Create a histogram of total_duration_in_sec (the bin count is an assumed choice)
(ggplot(df_complete, aes(x='total_duration_in_sec'))
 + geom_histogram(bins=50))
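Once the distribution shows where implausibly fast responses sit, the filter itself is a simple comparison against a cutoff. The following is a minimal sketch on invented toy data; the 10-second threshold (MIN_SECONDS) is a hypothetical value chosen for illustration, not one taken from the assignment or the survey data.

```python
import pandas as pd

# Toy stand-in for df_complete: the durations are invented for illustration.
toy_complete = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5],
    "total_duration_in_sec": [2, 45, 60, 4, 120],
})

# Hypothetical cutoff: responses faster than this are treated as speeders.
MIN_SECONDS = 10

# Low quantiles give a numeric view of the fast tail alongside the histogram
print(toy_complete["total_duration_in_sec"].quantile([0.01, 0.05, 0.25, 0.5]))

# Flag speeders, report how many there are, then keep everyone else
is_speeder = toy_complete["total_duration_in_sec"] < MIN_SECONDS
print("Speeders removed:", int(is_speeder.sum()))
toy_complete = toy_complete[~is_speeder].copy()
```

With the real data, the cutoff should be justified by the shape of the histogram and by how long it plausibly takes to read and answer the questions.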
