Posted on: Wednesday, 23 May 2018 2:13:58 PM AEST
Dear R coders:
A sample report from this course last year is provided to help your understanding and writing. You can find it next to the Task 2 instruction on the blackboard. Please read the message there carefully.
In addition, based on questions from some students, I'd like to summarise the main inquiries about the assignment.
1. The assignment requires one-variable or two-variable analysis. The key question here is: what is a variable? A variable is a quantity that can take different values, and we need to define it before we can perform the analysis. The assignment places no restrictions on how you define a variable. For example, if we define a variable as "the unemployment rate", then it is a variable containing a sequence of numbers across many years.
However, since this is a one-variable analysis, we only see a set of numbers, without any year information.
2. Once you have a clear definition of a variable, you need to pick out the data cells for graphing and analysis. You can select the cells inside R. Or, to make it simpler, you can use Excel to select the data cells for the variable, save them to a new csv file, and read that file into R later. Either method is fine for the assignment.
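As a rough sketch of the "select inside R" route, the code below reads a csv file and picks out one row by its series code. It is only an illustration under some assumptions: the tiny csv written to a temporary file is a made-up stand-in for your real file (it reuses a few of the values shown in point 3), and the names csv_path, wdi, row_sel and unemp_df are my own choices, not anything the assignment prescribes.

```r
# Hypothetical miniature of the data set, written to a temporary csv file
# so that this example runs on its own. Replace csv_path with your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Series Name","Series Code","1995 [YR1995]","1996 [YR1996]","1997 [YR1997]"',
  '"Unemployment, total (% of total labor force) (modeled ILO estimate)","SL.UEM.TOTL.ZS",8.5,8.5,8.399999619'
), csv_path)

# check.names = FALSE keeps the original headers such as "1995 [YR1995]"
wdi <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Keep the row for the chosen variable, identified by its series code
row_sel <- wdi[wdi[["Series Code"]] == "SL.UEM.TOTL.ZS", ]

# Keep only the year columns (headers that contain "[YR")
year_col <- grepl("[YR", names(wdi), fixed = TRUE)
unemp_df <- row_sel[, year_col, drop = FALSE]

# The result is still a data frame with one row and one column per year
str(unemp_df)
```

If you prefer the Excel route, only the read.csv() line is needed, pointed at the csv file you saved.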
3. Some students did not get the plots they expected. This is mainly due to the data frame structure used in R. For example, suppose I select the unemployment row. Even though I select just one row, and it looks like a sequence of numbers, it is in fact a data frame. In a data frame, each column is treated as a separate vector (variable), so this unemployment row consists of multiple variables, one per year. If I plot this row directly, I will not get the chart I'm looking for.
... Series Name Series Code 1995 [YR1995] 1996 [YR1996] 1997 [YR1997] 1998 [YR1998] 1999 [YR1999] 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] ......
Unemployment, total (% of total labor force) (modeled ILO estimate) SL.UEM.TOTL.ZS 8.5 8.5 8.399999619 7.699999809 6.900000095 6.300000191 ...
The solution is to:
- Transpose the data frame, or
- Create a pure vector of numbers from the row, or
- Use Excel to select the cells and create a new csv in the correct data frame format, or
- etc...
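The first two options above can be sketched in R as follows. This is a minimal illustration, not the required method: it rebuilds the one-row data frame by hand from the six values shown above (1995 to 2000), and the names unemp_df, unemp, unemp_t and year are my own.

```r
# One-row data frame holding the values shown above, one column per year
values <- c(8.5, 8.5, 8.399999619, 7.699999809, 6.900000095, 6.300000191)
unemp_df <- as.data.frame(t(values))
names(unemp_df) <- paste0(1995:2000, " [YR", 1995:2000, "]")

# Option A: create a pure numeric vector from the row
unemp <- unlist(unemp_df[1, ], use.names = FALSE)

# Option B: transpose, so the years become rows and the variable one column
unemp_t <- as.data.frame(t(unemp_df))
names(unemp_t) <- "unemployment"

# One-variable view: just a set of numbers, no year information
summary(unemp)
hist(unemp, main = "Unemployment rate", xlab = "% of total labor force")

# If the years are wanted, recover them from the first four characters
# of each column header and plot the series against them
year <- as.integer(substr(names(unemp_df), 1, 4))
plot(year, unemp, type = "b", ylab = "Unemployment (%)")
```

Either unemp or unemp_t$unemployment is now a single numeric vector, which is the form that hist(), boxplot() and plot() expect for one variable.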
4. For clustering and linear regression, you also need to clearly define the variables you are studying. Defining the problem is always the first step for any data analysis.
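For the regression case, one possible way to define two variables is "year" and "unemployment rate". The sketch below assumes exactly that definition and uses only the six values shown in point 3. The data frame reg_df and the model name fit are hypothetical, and your own choice of variables may well be different.

```r
# Two clearly defined variables: the year and the unemployment rate (%)
reg_df <- data.frame(
  year  = 1995:2000,
  unemp = c(8.5, 8.5, 8.399999619, 7.699999809, 6.900000095, 6.300000191)
)

# Simple linear regression of unemployment on year
fit <- lm(unemp ~ year, data = reg_df)
summary(fit)

# Scatter plot of the two variables with the fitted line added
plot(reg_df$year, reg_df$unemp, xlab = "Year", ylab = "Unemployment (%)")
abline(fit)
```

The same first step applies to clustering: decide which columns are your variables and put them in a data frame like reg_df before calling any clustering function.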
Enjoy the writing!
Ming