PSY2116 Assignment

1 Handing in the assignment

Please submit your assignment on BrightSpace before the deadline.

2 Instructions

This assignment will allow you to practice conducting statistical analyses as a research psychologist. As you recall from class, prior to conducting a study you need to ask a question that interests you. I provide an example below, but you are expected to come up with your own interesting questions to ask and to test using the supplied data. Each of you has been provided a unique data set, so your statistics and answers must be based on the data set that is assigned to you (i.e., the filename will have your student number). A unique answer key will be made for each data set and made available to the TA to help them mark the assignments. If your answers do not match the answer key, you will not receive full marks.

As discussed in class, hypothesis testing involves 4 steps.

1. Ask a question about the population and state the hypothesis
2. Use the hypothesis to predict sample characteristics
3. Obtain a sample, collect data, and perform statistical analyses
4. Compare the result to the prediction and make a decision

3 Statistical tests you need to conduct

You are required to come up with interesting questions/hypotheses for each of the tests below and perform the following statistical tests to address the questions. Please make sure that you create a unique question for each test. Get creative and make this activity fun for yourself. You can come up with silly and even unrealistic scenarios.

4 Questions

1. Perform a 1-sample t-test, 2-tailed, α = 0.05, and summarize your results. Make sure to use the V1 column in your dataset to answer this question. For this test, pick a meaningful population mean to which you will be comparing your scores. (5 points)
2. Perform a repeated samples t-test, 2-tailed, α = 0.05, and summarize your results; calculate Cohen’s d and report it in the conclusion. Make sure to use the V1 and V2 columns in your dataset to answer this question. (10 points)
3. Perform a 2-sample independent samples t-test, 2-tailed, α = 0.05, and summarize your results; calculate the 95% Confidence Intervals around the mean difference and report them in the conclusion. Make sure to use the V1 and V2 columns in your dataset to answer this question. (10 points)
4. Perform a 1-way ANOVA, α = 0.05, on the 3 variables and summarize your results; show results for pairwise comparisons (if they are needed). Make sure to use the V1, V2 and V3 columns in your dataset to answer this question. (10 points)

For each question, show plot(s) of your data and write your conclusion/summary in proper APA format. For examples of proper APA format, please refer to the lectures. If you are asked to report Cohen’s d or CIs, you may need to calculate these manually (or in R or any other statistical package).

5 Example – 2-sample independent t-test, 2-tailed

5.1 Question/hypothesis

We would like to determine whether the resting heart rate (RHR) of students in PSY2116 is normal compared to the general population of University of Ottawa students. We are asking this question because we want to know whether Dr. Konar is causing PSY2116 students to have heart conditions that may result in a different RHR compared to the population of all University of Ottawa students.

To conduct the study and to test our hypothesis, we take a sample of 20 students from each population: n1 = 20 from PSY2116, and n2 = 20 obtained by stopping students on campus at various locations and at different times to get a good representative random sample. We measure the RHR of each participant from both samples and record it (this is equivalent to the data stored under ‘V1’ for sample 1 and ‘V2’ for sample 2).
Having measured the RHRs for all participants, we want to test our hypothesis that PSY2116 students have a different mean RHR compared to the general population because we suspect that Dr. Konar is causing students stress, which may present as a different RHR. When writing up my results, it helps to organize my thoughts using the sequence of steps we learned when conducting a hypothesis test.

5.2 Hypothesis test

Hence, my first step is to state my null and alternative hypotheses. The null hypothesis in this study is that Dr. Konar is not causing heart issues in his students and that the average RHR of PSY2116 students is not different from the general population of students at the University of Ottawa. The alternative hypothesis is that Dr. Konar is causing heart issues in PSY2116 students and thus their RHRs are different from the mean of the general population. Given that we are conducting a 2-tailed test, note the way I structured the formulaic version of the hypotheses:

• H0: µ1 − µ2 = 0, or µ1 = µ2
• H1: µ1 ≠ µ2

You have the option of stating your hypotheses in words, as in the previous paragraph, or formulaically, as is seen right above this sentence.

For the 2nd step, I want to figure out the criteria for rejecting or failing to reject my null hypothesis. For this, I will determine my t_critical using a table. Alternatively, I can rely on R’s built-in function that I will show below.

For the 3rd step, we collect data (this was done for us already) and run the appropriate statistical analyses. Here, we calculate whether there is a significant difference between our sample data and the population, whether this difference is meaningful (e.g., Cohen’s d), and use an interval estimate of the population mean (e.g., Confidence Intervals) instead of a point estimate (i.e., sample mean).

For the 4th and final step, we make a decision after comparing t_observed to t_critical, and write our results in APA format.
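As a rough sketch of steps 2 to 4 for a 2-sample independent t-test, the R code below finds t_critical with qt(), computes t_observed, Cohen's d and the 95% Confidence Interval by hand, and then cross-checks the result against R's built-in t.test(). It uses the demonstration values from 'PSY2116_1234567.csv' typed in directly (not your data). It also assumes equal population variances, i.e., the pooled-variance test requested with var.equal = TRUE; note that t.test() runs the Welch version by default. The pooled-SD form of Cohen's d and the object names (pooled_var, se_diff, and so on) are illustrative assumptions, so please check them against the formulas from the lectures.

```r
# Demonstration values from the example file (NOT your own data).
V1 <- c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
        39, 43, 50, 43, 55, 53, 46, 45, 51, 51)
V2 <- c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
        28, 18, 31, 34, 28, 31, 38, 28, 29, 26)

alpha <- 0.05
n1 <- length(V1)
n2 <- length(V2)
df <- n1 + n2 - 2                      # degrees of freedom (pooled test)

# Step 2: critical value for a 2-tailed test
t_critical <- qt(1 - alpha / 2, df)

# Step 3: pooled variance, standard error, observed t
# (assumes equal population variances)
pooled_var <- ((n1 - 1) * var(V1) + (n2 - 1) * var(V2)) / df
se_diff    <- sqrt(pooled_var * (1 / n1 + 1 / n2))
mean_diff  <- mean(V1) - mean(V2)
t_observed <- mean_diff / se_diff

# Effect size (pooled-SD version of Cohen's d) and 95% CI
# around the mean difference
cohens_d <- mean_diff / sqrt(pooled_var)
ci <- mean_diff + c(-1, 1) * t_critical * se_diff

# Cross-check with the built-in function
tt <- t.test(V1, V2, var.equal = TRUE,
             alternative = "two.sided", conf.level = 1 - alpha)

# Step 4: decision
reject_null <- abs(t_observed) > t_critical

print(c(t_critical = t_critical, t_observed = t_observed,
        cohens_d = cohens_d))
print(ci)
print(tt)
```

Comparing the hand-computed values with the t.test() output is a quick way to catch arithmetic slips before writing up the APA summary.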
Further, we report our effect size to supplement our t-test in order to determine whether the difference (if significant) was meaningful. Finally, using Confidence Intervals, we can give a range of plausible values for the population mean rather than a single point estimate.

6 Your data set

Once you download the file ‘PSY2116-ClassData.zip’ from BrightSpace, you have to unzip it.

• On Windows, note the directory where you are saving the file. In that directory, you can right-click on the file and choose Extract All; then pick a folder where you want the data to be placed. Pressing ‘Enter’ will just unzip in the same folder.
• On OSX, note the directory where you are saving the file. If you double-click on the ‘PSY2116-ClassData.zip’ file in the Finder, it should extract in the same directory.
• I haven’t done this on Linux in a long time, so if you cannot figure this out, please come see me and we’ll sort it out. I’m guessing if you are using Linux, this step is a non-issue.

Find your student number among the files. This PSY2116_*.csv file is your data set (your student # is in place of the *, obviously).

Once you open your data file, you will see that it includes 3 columns of randomly generated numbers. The column names are ‘V1’, ‘V2’, and ‘V3’. There are 20 rows of data in each column. If you are asked to conduct an analysis on 1 sample, then use column 1 titled ‘V1’. If you are asked to conduct an analysis on 2 samples, then use columns 1 and 2 titled ‘V1’ and ‘V2’. Finally, when you are asked to do a 1-way ANOVA, you will use all 3 columns for this analysis.

The ‘PSY2116-ClassData.zip’ file also includes the file ‘PSY2116_1234567.csv’. I will be using the data from this file to demonstrate analyses in R. You can mimic how I do the analyses on this file and adapt them to your data. Do NOT use the data from ‘PSY2116_1234567.csv’ as your own data. This file is for demonstrative purposes only.

7 Getting data into R

To import your data set into R, you have to follow a few steps that I will outline below.
Assume that I saved the data file (‘PSY2116_1234567.csv’) on my Desktop. To import it into RStudio, open RStudio first. Then choose File, New File, New R Script. All your work will be entered and stored here. Make sure to save this R script as something you’ll recognize, e.g., ‘PSY2116_Assignment.R’.

Now enter the following commands into RStudio. Remember to execute each line of code in RStudio. Windows: use Ctrl-Enter on the line of code that you would like to execute; OSX: use Command-Enter. Linux: hopefully you can figure this out; if not, talk to me and we’ll figure it out together.

    # Create a variable 'file.location' where you will specify the file's
    # location:
    # on Windows, uncomment the following line (but make sure
    # to comment out the next line, which only works on OSX)
    #
    # file.location = 'c:/Users/yaro/Desktop/PSY2116_1234567.csv'
    file.location = '~/Desktop/PSY2116_1234567DEMO.csv' # <- on OSX
    # Please note that the user 'yaro' is specific to my computer only.
    # Use your own user id that you created in Windows in place of 'yaro'.

    # Now import the data from your *.csv file into R.
    # Create an object 'mydata' where the data will be stored:
    # this function is telling RStudio that the file has a header and
    # data are separated by commas; all data will be stored in 'mydata'
    mydata = read.table(file.location, header = TRUE, sep = ',')

    # View the data:
    mydata
    ##    V1 V2 V3
    ## 1  41 25 20
    ## 2  43 28 31
    ## 3  57 19 38
    ## 4  43 31 38
    ## 5  51 27 35
    ## 6  33 27 34
    ## 7  62 28 30
    ## 8  52 31 23
    ## 9  55 28 35
    ## 10 47 29 40
    ## 11 39 28 27
    ## 12 43 18 31
    ## 13 50 31 21
    ## 14 43 34 24
    ## 15 55 28 31
    ## 16 53 31 29
    ## 17 46 38 30
    ## 18 45 28 30
    ## 19 51 29 21
    ## 20 51 26 28

    # Check your data, it should be a data.frame:
    class(mydata)
    ## [1] "data.frame"

Now spend some time familiarizing yourself with the data. You can do quick descriptive stats and some plots to visualize the data.
    # Calculate the mean of V1, the first column of data:
    mean(mydata$V1)
    ## [1] 48
    # Same for V2:
    mean(mydata$V2)
    ## [1] 28.2
    # Or you can use a one-liner
    with(mydata, mean(V1)); with(mydata, mean(V2))
    ## [1] 48
    ## [1] 28.2
    # Calculate the variance, standard deviation, etc.
    with(mydata, var(V1)); with(mydata, var(V2))
    ## [1] 48.21053
    ## [1] 19.43158
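One possible way to continue exploring the data is sketched below: standard deviations for every column, a quick numeric summary, and two simple plots. To keep the block runnable on its own, the data.frame is rebuilt here from the demonstration values; with your own file you would use the 'mydata' object created by read.table() instead. The plot titles and axis labels are placeholders I made up, and sd_by_column is just an illustrative name.

```r
# Demonstration values rebuilt inline (NOT your own data); with your
# own file, use the 'mydata' object returned by read.table() instead.
mydata <- data.frame(
  V1 = c(41, 43, 57, 43, 51, 33, 62, 52, 55, 47,
         39, 43, 50, 43, 55, 53, 46, 45, 51, 51),
  V2 = c(25, 28, 19, 31, 27, 27, 28, 31, 28, 29,
         28, 18, 31, 34, 28, 31, 38, 28, 29, 26),
  V3 = c(20, 31, 38, 38, 35, 34, 30, 23, 35, 40,
         27, 31, 21, 24, 31, 29, 30, 30, 21, 28)
)

# Standard deviation of every column in one call
sd_by_column <- sapply(mydata, sd)
print(sd_by_column)

# Minimum, quartiles, median, mean and maximum for each column
print(summary(mydata))

# Side-by-side boxplots of the three columns
boxplot(mydata,
        main = "Demonstration data",   # placeholder title
        ylab = "Score")                # placeholder axis label

# Histogram of a single column
hist(mydata$V1,
     main = "Distribution of V1",      # placeholder title
     xlab = "V1")
```

The boxplot gives a first impression of how much the columns overlap, which is useful to look at before running any of the t-tests or the ANOVA.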