The exercises use data from the Chicago Project on Security and Terrorism (CPOST), taking the individual attack as the unit of analysis and coding variables that describe characteristics of those attacks. The dataset is posted as cpost.csv, and abbreviated information on key variables is provided at the end of this problem set. Additional information about the project can be accessed at the citation below, though it should not be necessary to answer any of the questions.

Chicago Project on Security and Terrorism (CPOST). 2015. Suicide Attack Database (May 28, 2015 Release). [Data File]. Retrieved from http://cpostdata.uchicago.edu

Please refer to general problem set instructions for important information on expectations, format, and submission.

1 Exercises

1. Different methods of attack may be better suited to particular targets, depending on whether secrecy, capacity for large payload, surprise, proximity, or some other attribute is needed. Explore the association between the type of weapon used in the attack (weapon) and its target (target). For this entire question, include observations taking on the "other" category for weapon type, but exclude observations that are missing data.
   a. Create a cross-tabulation table with target defining the rows and weapon defining the columns, and totals in both directions.
   b. Consider the observations where a security target was attacked using a belt bomb.
      i. Identify the number of cases in which this occurred.
      ii. Determine the total number and percentage of all attacks where security agents were targeted, and what percentage of those came from belt bombs.
      iii. Determine the total number and percentage of all attacks where a belt bomb was used, and what percentage of those targeted security agents.
      iv. Calculate approximately how many cases one would expect to see featuring a security target and belt bomb weapon if there were no relationship between the target type and weapon used.
   c. Estimate a χ² test of this relationship. Report:
      i. the χ² statistic
      ii. the p value
      iii. whether or not one can reject the null hypothesis at the 0.01 critical threshold
   d. Briefly interpret your findings statistically and substantively for a general audience.¹

¹ Note: the substantive interpretation should be about general patterns, not specifically the percentages explored in part (b). Those are simply illustrative, to give a sense of the comparisons that can be made.

2. Although the vast majority of suicide terrorism attacks have been carried out by men, several have nonetheless been carried out by women. Explore the relationship between the gender of the attacker (gender, where 0=male, 1=female) and the number of people killed (number killed) in the attack.
   a. Estimate the average number of people killed per attack for each gender. Separately for each, report:
      i. the mean
      ii. a 95% confidence interval for the mean
   b. Estimate a difference of means test using gender as the IV and the number killed as the DV. Report:
      i. the difference between the means
      ii. the t-statistic
      iii. the (two-tailed) p value
      iv. whether or not one can reject the null hypothesis at the 0.10, 0.05, and 0.01 thresholds
   c. Generate side-by-side boxplots of the number of people killed in attacks, divided by the gender of the attacker.
   d. Briefly interpret your findings statistically and substantively for a general audience.

3. Many of the incidents of terrorist attacks in the dataset come from a few specific campaigns (campaign) conducted as part of ongoing conflict in a particular area. Explore the trends over time (year)² of how many people were wounded in each attack (number wounded) in those campaigns.
   a. Identify the two campaigns featuring over 1,000 attacks, and how many attacks each featured.
   b. Estimate (separate) bivariate linear regression models, restricting the data to just these campaigns, using the number wounded as the dependent variable and time as the explanatory factor.
      Write down the line of best fit for each, and report your results in a regression table like the one below.³

                     campaign 1 name    campaign 2 name
      year                  .                  .
                           (.)                (.)
      intercept             .                  .
                           (.)                (.)
      N                     .                  .
      *: p<0.10, **: p<0.05, ***: p<0.01

   c. Visualize the relationship by drawing (separately for each campaign, distinguished by color) a scatter plot of observations and lines of best fit.
   d. Briefly interpret your results statistically and substantively for a general audience.

² year is used as an IV here for the sake of practicing statistical techniques, but it is generally best to avoid using simply time as an explanatory factor: it is not terribly theoretically interesting, and it does not bring policy implications, since time marches inexorably on regardless. If you expect a trend over time, consider the factors that might be causing that trend and model those.

³ Be sure to include the β and α coefficients, their standard errors, a star system to indicate statistical significance of relationship estimates, and the number of observations in the model. See tables used in class for examples.

4. Consider whether the frequency of attacks being lethal (any killed) has changed over time (year).
   a. Estimate a bivariate logistic regression model using year as the IV and whether there were any fatalities as the DV. Report:
      i. the estimated equation for the curve of best fit
      ii. the β coefficient along with its standard error, z score, p value, and whether it is statistically significant at the 0.05 threshold
   b. Draw a scatterplot of the data, overlaid with a curved line representing the logistic regression estimate.
   c. Estimate a multivariate logistic regression model using both year and gender as IVs. Report:
      i. the β coefficient for year along with its standard error, z score, p value, and whether it is statistically significant at the 0.05 threshold
      ii. the β coefficient for gender along with its standard error, z score, p value, and whether it is statistically significant at the 0.05 threshold
   d. Using the multivariate model, calculate the predicted probability of a fatality for:
      i. an attack in 1990 carried out by a male assailant
      ii. an attack in 1990 carried out by a female assailant
      iii. an attack in 2015 carried out by a male assailant
      iv. an attack in 2015 carried out by a female assailant
   e. Briefly interpret your results statistically and substantively for a general audience.
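
As a hedged illustration of the arithmetic behind Exercise 4(d): a logistic model with year and gender as IVs turns the linear predictor α + β(year)·year + β(gender)·gender into a probability through the inverse-logit function. The Python sketch below shows only that conversion. The function name and the three coefficient values are made-up placeholders chosen for illustration; they are not estimates from cpost.csv and should be replaced with the coefficients from your own fitted model.

```python
import math


def predicted_probability(alpha, beta_year, beta_gender, year, gender):
    """Inverse-logit of the linear predictor alpha + beta_year*year + beta_gender*gender."""
    linear_predictor = alpha + beta_year * year + beta_gender * gender
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Placeholder coefficients (assumptions for illustration only, NOT results
# from cpost.csv); substitute the estimates from your own model.
ALPHA, BETA_YEAR, BETA_GENDER = -40.0, 0.02, 0.5

# The four profiles asked for in 4(d); gender is coded 0=male, 1=female.
profiles = [(1990, 0), (1990, 1), (2015, 0), (2015, 1)]
probabilities = {
    (year, gender): predicted_probability(ALPHA, BETA_YEAR, BETA_GENDER, year, gender)
    for year, gender in profiles
}

for (year, gender), p in probabilities.items():
    label = "female" if gender == 1 else "male"
    print(f"{year}, {label}: {p:.3f}")
```

Because the inverse-logit is nonlinear, the same coefficient on gender can shift the predicted probability by different amounts in 1990 and in 2015, which is why the exercise asks for all four combinations.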
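
For Exercise 1(b)(iv) and 1(c), the count expected in a cell when target and weapon are unrelated is (row total × column total) / grand total, and the Pearson χ² statistic sums (observed − expected)² / expected over all cells. The Python sketch below works through that arithmetic on a toy 2×2 table. The counts and the category labels are invented for illustration and are not taken from cpost.csv; the real table has more categories, and statistical software may apply a continuity correction to 2×2 tables that this hand calculation does not.

```python
# Toy cross-tab of counts (assumed values, NOT from cpost.csv).
# Rows are target types, columns are weapon types.
observed = {
    "security": {"belt bomb": 30, "other weapon": 20},
    "civilian": {"belt bomb": 10, "other weapon": 40},
}

# Marginal totals in both directions, plus the grand total.
row_totals = {row: sum(cols.values()) for row, cols in observed.items()}
col_totals = {}
for cols in observed.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
    for row in observed
}

# Pearson chi-square statistic (no continuity correction) and its degrees of freedom.
chi_square = sum(
    (observed[row][col] - expected[row][col]) ** 2 / expected[row][col]
    for row in observed
    for col in col_totals
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print("expected security / belt bomb:", expected["security"]["belt bomb"])
print("chi-square:", round(chi_square, 3), "df:", degrees_of_freedom)
```

Comparing an observed cell to its expected count shows the direction of any association, while the χ² statistic and its p value summarize the evidence across the whole table.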