Project Description

You were given a list of topics. Select a topic and collect the dataset from the relevant web page in Excel format. The goal of the project is to deliver a detailed analysis of the dataset, with a clear research question of your own and an answer to it. You are asked to use Excel, Gretl or Eviews, as the project should serve as an exercise in working with statistical software. Write the project in Microsoft Word (no hand-written projects, please). The deadline for submission is January 14. There is no restriction on the length of the project, but that does not mean the more the better! Please follow the required structure of the project. All questions are more than welcome.

Structure of the project:
1. Introduction (15% of the final grade of the project): what is the main question of your project, what data are you going to use (in general, a very broad description), and how do you plan to find the answer to your research question. The majority of the grade will depend on the quality of your research question.
2. Part 1: Descriptive Statistics (30% of the final grade of the project): describe your data in as much detail as possible.
3. Part 2: Probability Theory (20% of the final grade): use the existing data to answer a probability question of your interest (if you have more than one, please show them all!).
4. Part 3: Hypothesis testing (30% of the final grade): here you should present the main hypothesis of the project (= your research question).
We will cover in class how to construct the hypothesis and how to test it (whether to reject or accept it, and how to draw conclusions from that).
5. Conclusion (5% of the final grade): restate the question of your interest, the method you used to answer it, the data you used, and the answer to your research question.

More detailed description:
Your first task is to understand the dataset: what variables you have and which of them are of interest to you. That means you know what each variable tells you, in what units it is measured, what type of scale it has (nominal, ordinal, etc.) and how well it has been measured.

PART 1: Descriptive Statistics
1. Understand the dataset
   a. how many missing observations do you have in the variables of your interest?
   b. did you need to correct some variables (mistakes/missing observations/...)?
   c. what did you do with the missing observations?
   d. how do missing variables influence the argumentation/interpretation of your results? Of what type are your variables? (what scale? quantitative/qualitative?)
   e. what sort of dataset do you have? (cross-sectional, time-series, panel?)
2. Select variables of your interest
   a. what made you choose the variables of interest?
   b. what variables have you left out and why?
3. Clean the data
   a. check for weird values (could be mistakes)
   b. code the variable if it is a categorical variable with non-numerical values, so that you can use it in the analysis
   c. provide a detailed description of how you polished your dataset
4. Describe the data
   a. histogram of all variables of interest + description
   b. descriptive statistics of all variables of interest + description + comment on all measures
      i. describe all measures of central tendency, shape and dispersion
      ii. box-plot
      iii. comparison of different variables (use other measures, especially measures of dispersion, to contrast the variables)
   c. graphical visualization (e.g., % of females on the board (pie chart), evolution of some variable of interest over time (bar chart), etc.)
      i. divide your sample into subgroups and describe the differences between subgroups (e.g., does it matter whether the head/director is a woman or a man? You show two graphs of the variables of interest, one for men and one for women, etc.)
   d. tables (especially contingency tables) if possible

PART 2: Probability
After you understand your data, you should come up with an interesting probabilistic question. It is going to be your own judgement about a future event. For example, if you have students’ data in two subsequent years, you may ask: what is the probability that students improve their performance? By gender? By race? By country of origin? Etc.

PART 3: Hypothesis Testing
Your main goal is to construct a research idea that you can test. An example of a research idea is: (a) there are gender differences among students coming from western countries but (b) there are no gender differences among students from eastern countries. Your task will be to test it and either accept or reject the hypothesis using statistical tests introduced during the classes.
You should also mention the Central Limit Theorem (how it applies to your sample).
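
For illustration only (the project itself should be done in Excel, Gretl or Eviews), the logic of Parts 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the required method: the function names share_improved and two_sample_z_test are hypothetical, the exam scores are simulated rather than real, and the test shown is a large-sample two-sided comparison of two means, which may differ from the tests introduced in class. The Central Limit Theorem enters in the last step: with large samples, the standardized difference of sample means is approximately standard normal, so the p-value can be read from the normal distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def share_improved(before, after):
    """Part 2: relative frequency of students whose second score is higher."""
    improved = sum(1 for b, a in zip(before, after) if a > b)
    return improved / len(before)


def two_sample_z_test(x, y):
    """Part 3: two-sided test of H0 'both groups have the same mean'.

    Uses the unequal-variance standard error. By the Central Limit Theorem
    the statistic is approximately standard normal when both samples are
    large, which is what justifies the normal p-value below.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical, simulated exam scores (NOT real data).
rng = random.Random(1)
women = [rng.gauss(70, 10) for _ in range(200)]
men = [rng.gauss(67, 10) for _ in range(200)]

z, p = two_sample_z_test(women, men)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject H0 at the 5% level" if p < 0.05 else "do not reject H0 at the 5% level")
```

With your own data, the simulated lists would be replaced by the relevant columns of your dataset. With small samples the normal approximation is not reliable, and a test based on the t-distribution is the usual alternative.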