Homework 3: Election Prediction

Files to turn in: election.py, answers.txt.

In this assignment, you will practice using data structures such as lists, dictionaries, and sets.

Contents:
• Introduction and background
  o Election prediction: pundits vs. statisticians
  o Election polls: Which ones to trust?
  o The US Electoral College
  o Why polls are approximate
• Assignment Overview
  o Problem 0: Obtain the files, add your name
  o Problem 1: State edges
  o Testing your implementation
  o Problem 2: Find the most recent poll row
  o Problem 3: Pollster predictions
  o Problem 4: Pollster errors
  o Problem 5: Pivot a nested dictionary
  o Problem 6: Average the edges in a single state
  o Problem 7: Predict the 2012 election
  o Reflection and submitting your work
• Appendix: Data type reference

Introduction and background

Election prediction: pundits vs. statisticians

In the past, the outcome of political campaigns was predicted by political analysts and pundits, using a combination of their experience, intuition, and personal biases and preferences. In recent decades there has been a shift to a more scientific approach, in which election results are predicted statistically from polls: a small random sample of voters is asked how they will vote, and from that the result of the entire election is extrapolated.

The 2012 presidential election was a watershed in the fight between pundits and statisticians. The rivalry became front-page news, with many pundits loudly proclaiming that the statisticians would be humiliated on November 6. In fact, the opposite happened: statistician Nate Silver (of the website FiveThirtyEight) correctly predicted the outcome in every state, whereas pundits' predictions varied significantly. Literally dozens of prominent political analysts had predicted a Romney win. Other pundits said the election was "too close to call", though Silver and other statisticians had been predicting an Obama win for months. These results changed the way many Americans view political commentators, revealing them as entertainers rather than reliable sources of information.

How did Nate Silver do it? In this assignment, you will find out, and you will replicate his results by using polling data to predict the outcome of the 2012 US presidential election.

Election polls: Which ones to trust?

An election poll is a survey that asks a small sample of voters how they plan to vote. If the sample of voters is representative of the voting population at large, then the poll predicts the result of the entire election. In practice, a poll's prediction must be taken with a grain of salt, because the sample is only approximately representative of the voting population. (See below for an explanation of why.) For example, in late October 2012, the Gallup poll consistently gave Romney a 6-percentage-point lead in the popular vote, but in fact Obama won the popular vote by 2.6 percentage points. On the other hand, RAND Corporation was biased toward the Democrats and tended to overstate Obama's lead by 1.5 percentage points.

How can you decide which polls to rely upon? Depending on which poll you trust, you might make a very different prediction. One approach is to average together the different polls. This is better than trusting any one of them, but it is still rather crude. What if most of them are biased?
That was the case for the 23 organizations that conducted at least 5 polls in the last 21 days of the 2012 Presidential campaign: 19 of the 23 organizations gave a result that favored Republicans more than the actual results did. Nonetheless, Nate Silver's prediction was very close to correct, and showed no bias toward either party.

Silver's approach is very sophisticated, but its key idea is to combine different polls using a weighted average (http://en.wikipedia.org/wiki/Weighted_mean). In a normal average, each data point contributes equally to the result. In a weighted average, some data points contribute more than others. Silver examined how well each polling organization had predicted previous elections, and then weighted their polls according to their accuracy: more biased pollsters had less effect on the weighted average.

The general structure of FiveThirtyEight's (http://www.fivethirtyeight.com/) algorithm is:

1. Calculate the average error of each pollster's predictions for previous elections. This is known as the pollster's rank. A smaller rank indicates a more accurate pollster.
2. Transform each rank into a weight (for use in a weighted average). A larger weight indicates a more accurate pollster. FiveThirtyEight considers a number of factors when computing a weight, including rank, sample size, and when a poll was conducted. For this assignment, we simply set weight to equal the inverse square of rank (weight = rank**(-2)).
3. In each state, perform a weighted average of predictions made by pollsters. This predicts the winner in that state.
4. Calculate the outcome of the Electoral College, using the per-state predictions. The candidate with the most electoral votes wins the election. (A small sketch illustrating steps 2 and 3 appears below.)

The algorithm is described in more detail at the FiveThirtyEight blog (http://fivethirtyeight.blogs.nytimes.com/methodology/). You do not have to read or understand this information to complete this assignment, but you may find it interesting nonetheless.
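To make steps 2 and 3 concrete, here is a minimal sketch of turning ranks into weights and combining pollsters' predictions for one state. It is not part of the provided starter code; the pollster names, ranks, and edges are made up for illustration, whereas the real assignment computes ranks from past polling errors.

# Hypothetical ranks (average error, in percentage points) for three pollsters.
ranks = {"PollsterA": 1.0, "PollsterB": 2.0, "PollsterC": 4.0}

# Step 2: weight = rank ** (-2); a smaller error yields a larger weight.
weights = {pollster: rank ** (-2) for pollster, rank in ranks.items()}

# Hypothetical predicted edges (Dem % minus Rep %) for a single state.
edges = {"PollsterA": 2.0, "PollsterB": -1.0, "PollsterC": 5.0}

# Step 3: weighted average of the edges.
total_weight = sum(weights.values())
weighted_edge = sum(weights[p] * edges[p] for p in edges) / total_weight
print(round(weighted_edge, 2))  # a positive edge predicts a Dem win in that state

Note how PollsterC's optimistic edge counts for much less than PollsterA's, because its weight is 1/16 as large.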
The US Electoral College

We have given you an implementation of the electoral_college_outcome function, so this section is for your information; you do not need it while writing code for your assignment. Here is information about US Presidential elections and the US Electoral College, paraphrased from Wikipedia (http://en.wikipedia.org/wiki/Electoral_College_(United_States)):

The President of the United States is not elected directly by the voters. Instead, the President is elected indirectly by "electors" who are selected by popular vote on a state-by-state basis. Each state has as many electors as members of Congress. There are 538 electors, based on Congress having 435 representatives and 100 senators, plus three electors from the District of Columbia. Electors are selected on a "winner-take-all" basis. That is, all electoral votes go to the presidential candidate who wins the most votes in the state. (Actually, Maine and Nebraska use a slightly different method, but for simplicity in this assignment, we will assume they use the "winner-take-all" approach.)

Our analysis only considers the Democratic and Republican political parties. This is a reasonable simplification, since a third-party candidate has received an electoral vote only once in the past 60 years (in 1968, George Wallace won 8% of the electoral vote).

Why polls are approximate

This section of the handout explains why poll results are only approximate, and how poll aggregation helps. Recall that a poll sample is only approximately representative of the voting population. There are two reasons for this: sampling error and pollster bias.

1. Sampling error: If you randomly choose a sample from a population, then random chance may cause the sample to differ from the population. The US population is 50.7% female and 49.3% male, but a random sample of 1000 individuals might include 514 females and 486 males, or 496 females and 504 males. An extrapolation from the sample to the entire population would be slightly incorrect. The larger the sample, the more likely it is to be representative of the population. Sampling error is unavoidable, but it can be reduced by increasing the sample size. This is one reason that poll aggregation can be successful: it effectively uses a larger sample than any one individual poll. (A small simulation of sampling error appears at the end of this section.)

2. Pollster bias or "house effects": These are systematic inaccuracies caused by faulty methodology — essentially, the pollster has not chosen a random sample of US voters. Suppose that a pollster sampled only Mormons or only African-Americans; it would be meaningless to predict the overall vote from these biased samples. Actual pollster bias comes in subtler forms, and can be a positive or a negative factor. Here are some examples:
  o Not all Americans vote, so each polling firm should adjust its sampling to select not among all Americans, but among likely voters. Poor people and young people are less likely to vote, so a polling firm might adjust its statistics to account for that, but the firm might over- or undercompensate.
  o Survey response rates are typically lowest in urban areas, so unweighted samples routinely under-represent black and Hispanic Americans, who frequently live in urban areas.
  o Some telephone polls call only landline numbers, but 1/3 of Americans rely on cellphones — and they are younger, more urban, poorer, and more likely to be black and Hispanic, all of which correlate with Democratic voting.
  o Question wording and order have a significant effect on responses.

Pollster bias is avoidable by improving methodology — or, if you can determine a pollster's bias, you can adjust their scores accordingly and use the adjusted scores rather than what the pollster reports. That is what Nate Silver and other "poll aggregators" did — even without knowing the specific sources of bias.
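To illustrate sampling error, here is a minimal simulation, not part of the assignment code, that repeatedly draws samples of 1000 people from a population that is 50.7% female and prints the estimated female share. The sample size and number of trials are arbitrary choices for illustration.

import random

POPULATION_FEMALE = 0.507  # population share quoted in the handout
SAMPLE_SIZE = 1000

# Draw several random samples and observe how the sample estimate fluctuates
# around the true population share.
for trial in range(5):
    females = sum(1 for _ in range(SAMPLE_SIZE) if random.random() < POPULATION_FEMALE)
    print("sample estimate of female share:", females / SAMPLE_SIZE)

Each run produces estimates scattered around 0.507; increasing SAMPLE_SIZE makes the scatter smaller, which is the intuition behind aggregating many polls.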
Assignment Overview

In this assignment, you will write a Python program that predicts the outcome of the 2012 US Presidential election, based on polling data and results from the 2012 and 2008 elections. The Professor designed the overall program, including deciding the names and specifications of all the functions and implementing some of the functions. Your job is to implement the rest of the functions. You will verify your implementation using the testing code that we provide. Along the way, you will learn about Python collections.

Don't panic! This assignment might look long, but we have already done most of the work for you. You only have to implement 10 functions — and the Professor has already written the documentation and tests for those functions, so you know exactly what to do and you know whether your solution is correct. The implementation of those 10 function bodies consists of only 63 lines of code in total, and 8 of the 10 functions have a body consisting of 6 or fewer lines of code. Your solution might be smaller or larger, and that is fine; we mention the size only to give you a feel for the approximate amount of code you have to write.

While solving this assignment, you should expect to spend more time thinking than programming. Hint: Before you implement any function, try describing the algorithm and hand-simulating it on some sample data.

Problem …


Name: ...
ID: ...
Homework 3: Election prediction
Test code for state_edges (Problem 1) and most_recent_poll_row (Problem 2):

# read_csv("./data/2008-results.csv")

rows2 = [{'State': 'WA', 'Dem': '1.0', 'Rep': '0.1'}]
        # {'State': 'CA', 'Dem': '0.2', 'Rep': '1.3'}

# assert state_edges(rows2) == {'WA': 0.9, 'CA': -1.1}
state_edges(rows2)
# print(rows2)

poll_rows1 = [{"ID": 1, "State": "WA", "Pollster": "A", "Date": "Jan 07 2010"},
              {"ID": 2, "State": "WA", "Pollster": "B", "Date": "Mar 21 2010"},
              {"ID": 3, "State": "WA", "Pollster": "A", "Date": "Jan 08 2010"},
              {"ID": 4, "State": "OR", "Pollster": "A", "Date": "Feb 10 2010"},
              {"ID": 5, "State": "WA", "Pollster": "B", "Date": "Feb 10 2010"},
              {"ID": 6, "State": "WA", "Pollster": "B", "Date": "Mar 22 2010"}]

print(most_recent_poll_row(poll_rows1, "A", "SS"))
print(most_recent_poll_row(poll_rows1, "A", "WA"))
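For reference, here is one possible way to implement the two functions these tests exercise, consistent with the commented assert above (an edge is the Democratic percentage minus the Republican percentage) and with the behavior the print calls suggest (the most recent poll by a given pollster in a given state). This is a sketch under those assumptions, not necessarily the submitted solution; in particular, the "Jan 07 2010" date format and returning None when no poll matches are assumptions.

from datetime import datetime

def state_edges(election_result_rows):
    """Given rows with 'State', 'Dem', and 'Rep' fields, return a dictionary
    mapping each state to its edge (Dem percentage minus Rep percentage)."""
    return {row['State']: float(row['Dem']) - float(row['Rep'])
            for row in election_result_rows}

def most_recent_poll_row(poll_rows, pollster, state):
    """Return the most recent poll row by the given pollster in the given state,
    or None if there is no such poll."""
    matching = [row for row in poll_rows
                if row['Pollster'] == pollster and row['State'] == state]
    if not matching:
        return None
    # Dates look like "Jan 07 2010"; parse them to find the latest poll.
    return max(matching, key=lambda row: datetime.strptime(row['Date'], '%b %d %Y'))

With these definitions, the two print calls above would output None (pollster A conducted no poll in state "SS") and the row with ID 3 (pollster A's latest WA poll, dated Jan 08 2010). Note that the commented assert compares floats for exact equality; comparing each edge against the expected value with a small tolerance would be more robust.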