Instructions: You will be looking at data from a survey in the US state of Colorado on opinions of the oil and gas industry, and evaluating whether Facebook ads changed opinions of the oil & gas...

Answered 3 days after Mar 24, 2023


Breeze answered on Mar 26 2023
49 Votes
Part 1: Data Intro and QA
1.1: Set Up
First, we will import the necessary modules and load the survey data into a dataframe called df_survey.
# Install Geopandas
!pip install geopandas --q
# Import pandas and numpy
import pandas as pd
import numpy as np
# Import geopandas
import geopandas as gpd
# Import plotnine
from plotnine import *
import plotnine
# Load survey data
df_survey = pd.read_csv('https://raw.githubusercontent.com/smsidekick/project-sidekick/main/blihkjhdrsers.csv')
1.2: Explore Data
Now that we have loaded the survey data, let's take a closer look at it to understand its structure and content.
# View the first 5 rows of the data
print(df_survey.head())
# Get summary statistics for the data
print(df_survey.describe())
# Get information on the data types and missing values
# (info() prints its report directly and returns None, so no print() is needed)
df_survey.info()
1.3: Duplicate IDs
To check whether the person_id field is unique, we can compare the number of unique IDs with the total number of rows in the dataframe. If the two counts differ, there are duplicate IDs, and we drop the repeated rows.
# Count the number of unique person IDs
num_unique_ids = df_survey['person_id'].nunique()
print('Number of unique IDs:', num_unique_ids)
# Count the total number of rows in the dataframe
num_rows = df_survey.shape[0]
print('Number of rows:', num_rows)
# Check if there are any duplicate person IDs
if num_unique_ids == num_rows:
    print('There are no duplicate IDs.')
else:
    df_survey.drop_duplicates(subset='person_id', inplace=True)
    print('Duplicate IDs removed.')
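Before dropping duplicates, it can help to look at the rows that share an ID. The following is a minimal sketch on a small made-up dataframe (the name toy and its values are assumptions for illustration, not the survey data); it uses the same person_id column name as above.

```python
import pandas as pd

# Toy stand-in for df_survey; the values are made up for illustration.
toy = pd.DataFrame({
    'person_id': [101, 102, 102, 103],
    'Q1_answer_code': [1, 2, 2, 3],
})

# keep=False marks every row whose person_id appears more than once
dup_mask = toy['person_id'].duplicated(keep=False)
dup_rows = toy[dup_mask]
n_dup_ids = dup_rows['person_id'].nunique()
print('IDs that appear more than once:', n_dup_ids)
print(dup_rows)

# Keep the first occurrence of each person_id
toy_dedup = toy.drop_duplicates(subset='person_id', keep='first')
print('Rows after de-duplication:', len(toy_dedup))
```

Inspecting dup_rows first shows whether the repeated rows are exact copies or conflicting answers, which affects whether keeping the first occurrence is a reasonable choice.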
1.4: Complete Survey Responses
We can check whether any individuals skipped survey question 1 by looking for null values in the Q1_answer_code field. If there are any, we filter the dataframe to keep only responses from those who completed survey question 1.
# Check for null values in Q1_answer_code
num_null_values = df_survey['Q1_answer_code'].isnull().sum()
if num_null_values > 0:
    print('There are', num_null_values, 'null values in Q1_answer_code.')
    df_complete = df_survey[df_survey['Q1_answer_code'].notnull()]
    print('Filtered data to include only responses from those who completed survey question 1.')
else:
    df_complete = df_survey.copy()
    print('All responses completed survey question 1.')
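The same filter can also be written with dropna, which handles both cases (nulls present or not) in one step. This is a minimal sketch on made-up data; the dataframe toy and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_survey; two respondents skipped question 1.
toy = pd.DataFrame({
    'person_id': [1, 2, 3, 4],
    'Q1_answer_code': [1.0, np.nan, 3.0, np.nan],
})

# Count missing answers, then keep only rows where Q1 was answered
n_missing = int(toy['Q1_answer_code'].isna().sum())
toy_complete = toy.dropna(subset=['Q1_answer_code']).copy()

print('Missing Q1 answers:', n_missing)
print('Rows kept:', len(toy_complete))
```

Calling .copy() on the filtered result gives an independent dataframe, so later column assignments do not trigger pandas' chained-assignment warning.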
1.5: Survey Speeders
To determine whether any respondents in df_complete sped through the survey, we can look at the distribution of the total_duration_in_sec field. A histogram of this distribution shows whether there are outliers that need to be removed.
# Create a histogram of total_duration_in_sec
(ggplot(df_complete, aes(x='total_duration_in_sec'))
+ geom_histogram(bins=50)
+...
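Once the histogram suggests where the unusually fast responses sit, they can be filtered out with a duration cutoff. The sketch below runs on made-up data, and the cutoff of one third of the median duration is an arbitrary assumption for illustration; it is not taken from the assignment or from the original answer.

```python
import pandas as pd

# Toy stand-in for df_complete; durations are made up for illustration.
toy = pd.DataFrame({
    'person_id': [1, 2, 3, 4, 5],
    'total_duration_in_sec': [12, 180, 200, 240, 300],
})

# ASSUMPTION: treat anyone faster than one third of the median as a speeder.
median_duration = toy['total_duration_in_sec'].median()
cutoff = median_duration / 3

is_speeder = toy['total_duration_in_sec'] < cutoff
toy_no_speeders = toy[~is_speeder].copy()

print('Cutoff (sec):', round(cutoff, 1))
print('Speeders removed:', int(is_speeder.sum()))
```

The cutoff should be chosen after looking at the actual distribution; a fixed number of seconds or a low percentile would work the same way with this pattern.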