Data Analytics is a subject that is best appreciated when applied to a dataset you are familiar with. The aim of this project is to achieve that. Do not view this project as a hurdle in the course, but rather as a bridge that connects the topics you learnt to your work or subject domain. There are five main modules in this course:
- Module 1: Normal Distribution (percentiles, the distribution of sample means, and the probability of an outcome assuming a normal distribution)
- Module 2: Confidence Interval Estimation (including sample size determination)
- Module 3: Inferences from Data (hypothesis testing, i.e., checking whether a claim made about the data holds; this module dealt with only one sample)
- Module 4: More Inferences from Data (multiple samples)
- Module 5: Regression Analysis (both simple and multiple, as well as basic ANOVA)
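To give a feel for the kind of computation Modules 1 to 3 involve, here is a minimal Python sketch using only the standard library's `statistics.NormalDist`. It assumes the population standard deviation is known, so it uses z rather than t; the helper names (`value_at_percentile`, `required_sample_size`, and so on) are hypothetical, and every number in the usage lines is made up for illustration. Substitute values from your own dataset.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Module 1: normal distribution (assumes the variable is normally distributed).
def value_at_percentile(mu, sigma, p):
    """Value below which a fraction p of observations fall."""
    return NormalDist(mu, sigma).inv_cdf(p)

def prob_below(mu, sigma, x):
    """Chance that a single observation is below x."""
    return NormalDist(mu, sigma).cdf(x)

def prob_sample_mean_below(mu, sigma, n, x):
    """Chance that the mean of n observations is below x (standard error sigma/sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(x)

# Module 2: z-based confidence interval and sample size (assumes sigma is known).
def z_critical(conf):
    """Two-sided critical value, e.g. about 1.96 for conf=0.95."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """Interval xbar +/- z * sigma / sqrt(n)."""
    margin = z_critical(conf) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def required_sample_size(sigma, margin, conf=0.95):
    """Smallest n with margin of error at most `margin`: n = (z*sigma/E)^2, rounded up."""
    return ceil((z_critical(conf) * sigma / margin) ** 2)

# Module 3: one-sample, two-sided z-test of H0: mu = mu0.
def one_sample_z_test(xbar, mu0, sigma, n):
    """Return the z statistic and its two-sided p-value."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(value_at_percentile(50, 10, 0.90))        # about 62.8
    print(prob_sample_mean_below(50, 10, 25, 54))   # about 0.977
    print(z_confidence_interval(50, 10, 100))       # about (48.04, 51.96)
    print(required_sample_size(10, 2))              # 97
    print(one_sample_z_test(52, 50, 10, 100))       # (2.0, about 0.0455)
```

When the population standard deviation is unknown and estimated from the sample, the t distribution replaces z; for example, `scipy.stats.t.ppf` gives the corresponding critical value and `scipy.stats.ttest_1samp` runs a one-sample t-test directly on raw data.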
Objective
The purpose of the project is for you to apply what you learnt from at least four modules to your dataset and make some inferences or estimations. Remember, each Hawkes Learning quiz had 10-15 questions. Here I am asking you to do only four tests or analyses. The key is that you bring the data and you come up with the questions, and each question or set of analyses represents something you learnt from Modules 1-5. There should be four different ones. That is the best way to understand the concepts you learnt in this course. If you wish, you can use two data sources (datasets) to achieve this; not all of the analyses have to use the same dataset.
Data source
There are three options; you can choose any one of them (there are no restrictions on that):
- Bring your own data from work (you can remove any private or confidential information; for example, if you are bringing sales or cost data for an item, product, or service, the name can be masked)
- Use data from your previous work or a company you have access to (again, you can remove any private or confidential information)
- Use data from the public domain. In today's world, there is no dearth of structured data. Here are some places where you can get data:
Grading Rubric
Total Points (Midterm and Final Report): 55 points
Midterm report:
Point Value: 15 points
Due date: Thursday, April 8th
Requirements:
- Length: 1.5 to 2 pages (2 pages maximum).
- You should describe your source of data (including the data fields you have) and what you want to accomplish based on the topics you learnt.
- You can state the research hypotheses you plan to check, the confidence intervals you plan to estimate, or any relationship between variables that you think is important and plan to test.
- Remember: at a minimum, I need your plan based on the first three modules (see the examples). No analysis is needed yet, just what you plan to do.
I will provide feedback to each of you within 4 days (if you submit early, you get your feedback early). If I feel any change is needed, I will indicate that.
How the 15 points are given:
- Your data: 5 points (Note: remember, the sample size should be at least 30 data points to do any parametric tests.)
- Your plan of action: 10 points
Final report:
Point Value: 40 points
Due date: Friday, April 30th
Requirements:
- Length: 1.5 to 2 pages (2 pages maximum).
- Present the findings using the skillset acquired (topics covered) in class.
- Also include the dataset with the analysis (this could be an Excel file or output from any statistical package). Provide details of the analysis in an appendix.
How the 40 points are given: 10 points for each module you choose to apply. (For example, if you use regression to test an association or predict an outcome, you get 10 points for that analysis.)
Samples
Below are four examples. For each one, I show how we could use lessons learnt from one or two modules.