I would need a Word document and Minitab.
Database Project (Database Computer Assignment)

Purpose: The purpose of this assignment is for you to find, explore, and explain practical applications of the techniques presented in the seven modules of this course. For six of these module applications, you will use data extracted either from this course’s databases (found in Blackboard) or from other databases. A description of the databases associated with this course is provided at the end of this document.

General Guidance: Six of the seven modules contain techniques that lend themselves to analysis by Minitab. Specific directions for each module are given below. In the final manuscript of this project, you will likely need at least one page for each module technique. Your work should include a brief statement about the technique, a brief discussion of the data being used (variables, measurement, source of data, etc.), output from Minitab, and a discussion of the output including results, meaning, and potential use. At the end of the project, you could say a few words about what you have learned overall from doing it. Remember: if I get two answers that are the same on anything, I will count them both wrong.

Specific Guidance:

Chapter 19. Nothing in Decision Analysis can be analyzed or presented using Minitab. Nevertheless, I would like you to create a decision table of your own. From your own thoughts or from ideas found on the internet, think of a business situation where there are multiple decision alternatives and several possible states of nature. I would prefer that you have at least three of each, but no more than five. Assign your own payoffs (you will probably make them up). From this, produce a decision table and construct a decision tree. Without using probabilities (decision-making under uncertainty), solve for maximax, maximin, and a Hurwicz criterion, and create an opportunity loss (regret) table, thereby solving for minimax regret.
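To check hand calculations for the four criteria above, a short script can help. The following is only a sketch: the plant-size alternatives, the demand states, the payoffs (in $1000s), and the Hurwicz weight of 0.6 are all made up for illustration and are not taken from the course databases.

```python
# Hypothetical payoff table (made-up numbers, in $1000s).
# Rows are decision alternatives; columns are states of nature.
states = ["weak demand", "moderate demand", "strong demand"]
payoffs = {
    "small plant": [50, 70, 90],
    "medium plant": [-20, 100, 150],
    "large plant": [-120, 60, 300],
}
ALPHA = 0.6  # assumed Hurwicz weight placed on the best payoff


def maximax(table):
    """Pick the alternative whose best payoff is largest (optimistic)."""
    return max(table, key=lambda a: max(table[a]))


def maximin(table):
    """Pick the alternative whose worst payoff is largest (pessimistic)."""
    return max(table, key=lambda a: min(table[a]))


def hurwicz(table, alpha):
    """Weight each alternative's best payoff by alpha and worst by 1 - alpha."""
    def score(a):
        return alpha * max(table[a]) + (1 - alpha) * min(table[a])
    return max(table, key=score)


def regret_table(table):
    """Opportunity loss: best payoff in each state minus the payoff received."""
    n_states = len(next(iter(table.values())))
    best = [max(row[j] for row in table.values()) for j in range(n_states)]
    return {a: [best[j] - row[j] for j in range(n_states)]
            for a, row in table.items()}


def minimax_regret(table):
    """Pick the alternative whose largest regret is smallest."""
    regrets = regret_table(table)
    return min(regrets, key=lambda a: max(regrets[a]))


print("Maximax:", maximax(payoffs))
print("Maximin:", maximin(payoffs))
print("Hurwicz (alpha = 0.6):", hurwicz(payoffs, ALPHA))
print("Regret table:", regret_table(payoffs))
print("Minimax regret:", minimax_regret(payoffs))
```

With these made-up payoffs the criteria do not all agree, which is the kind of result worth discussing in the write-up. Ties are broken here by table order, an arbitrary choice.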
Now, assign probabilities to the states of nature and solve for the expected monetary value (EMV), the expected value with perfect information, and the expected value of perfect information. You need not deal with utility.

Chapter 10. There are two types of problems in this chapter: 1) hypothesis tests about the difference in two means using a t test and 2) hypothesis tests about the difference in two proportions using a z test. You need to do an analysis of each.

With the t test of means, find a variable that has two subgroups and use Minitab and the t test to test the difference between the two subgroups on some measure. For example, in the AHA database, compare the number of beds per hospital for two different states. Or compare payrolls for two different types of service. Or, from the large student database, test whether there is a difference in the number of miles to class for males vs. females. From the Fifty Largest Banks of the world, compare Assets for China vs. Others.

In addition to the databases mentioned above, other databases that would lend themselves to hypothesis tests of two population means (IV = independent variable, DV = dependent variable) include:
Consumer Food database – 2 possible IV’s – Region, Location. 3 possible DV’s – Food Spending, Income, Debt.
Top 100 Retailers – 1 IV – create one from states (e.g. East/West of the Mississippi River). 2 possible DV’s – Retail Sales, Stores.
World’s Largest Companies – 1 IV – U.S. or not. 4 DV’s – Sales, Profits, Assets, Mkt. Value.

With the z test of proportions, you may have to search the internet for some issue on which two groups of people, locations, etc. differ, and use Minitab to test the difference in proportions. Or, as an example, in the AHA questionnaire, is there a difference in the proportion of hospitals that have Births in one state vs. another state? You may have to hand count.

In each of these, discuss the database used, the variables, and what is being compared. Give the hypotheses. Show the output from Minitab.
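As a cross-check on the Minitab output, the two test statistics can also be computed by hand. The sketch below assumes the pooled (equal-variances) form of the t statistic and the pooled-proportion form of the z statistic. Minitab's numbers may differ slightly if its equal-variances option is not selected. The sample values are hypothetical placeholders, not figures from the AHA or any other course database.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two means."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-tailed p-value for the difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # combined proportion under H0: p1 = p2
    z = (p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: beds per hospital in two states (made-up values).
state_a = [120, 150, 90, 200, 175, 130]
state_b = [100, 95, 160, 110, 85, 140]
t_stat, df = pooled_t(state_a, state_b)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")

# Hypothetical counts: hospitals reporting Births, out of those surveyed.
z_stat, p_val = two_proportion_z(30, 100, 20, 100)
print(f"z = {z_stat:.3f}, two-tailed p-value = {p_val:.4f}")
```

The t-test p-value is left to Minitab here because the Python standard library has no t distribution.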
Explain the results of the output including the sample means or proportions, the sample sizes, the observed z or t value, the p-value, and the decision, along with any other pertinent information.

Chapter 11. Conduct a one-way analysis of variance (ANOVA) using an independent variable with at least three classification or treatment levels. As part of this analysis, go to Options and select Tukey’s multiple comparisons using a 5% level of significance. Find a dependent variable in one of the databases (a variable with something we can analyze: personnel, assets, miles, tons of pollution, etc.). Now find an independent variable that breaks the dependent variable into three or more subgroups (for example, a characteristic like industry, geographic locale, or occupation). Now run a one-way ANOVA to determine if there is a significant difference between the classifications of the independent variable (e.g. geographic locale = W, E, S, N) on the dependent variable (number of workers, ages, assets, etc.).

Discuss the database used, the dependent and independent variables, the classification or treatment levels, and what is being compared. Give the hypotheses. Show the output from Minitab. Explain the results of the output including the sample means, the sample sizes, the observed F value, the p-value, and the decision, along with any other pertinent information. If it is appropriate to conduct multiple comparisons, run the comparisons and explain the results.

Good candidates for Chapter 11 are:
Financial database – 1 IV – Type of Company. 7 DV’s (revenue, assets, etc.).
Manufacturing database – 2 IV’s – Value of Shipments and Industry Group. 6 DV’s including Employees, Value Added, Cost, etc.
Inc. 5000 – 2 IV’s – Industry, Years on the List. 2 DV’s – % Growth and Revenue.
Student Survey database – 1 IV – Favorite Restaurant. 7 DV’s – Age, Miles, Credit Hours, etc.

Chapter 13.
Run two different multiple regression analyses using data and variables from the databases. In each, include at least two independent variables. In each analysis, describe the dependent variable and the independent variables, along with the source of the data (which database they came from, etc.). Display the Minitab output for each and explain each output. See page 495 of the text. Include in your discussion the regression equation, the overall F test and its p-value, the t ratios of the predictors, the value of s_e (the standard error of the estimate), the value of R², and any other relevant information.

Some recommendations: AMA Top 50, 12-Year Gasoline, Agricultural, Consumer Food, Energy, Financial, International Labor, Manufacturer, U.S. and International Stock Market, EPA Emissions, Multi-Family Metrics, Student Survey, World’s Largest Companies.

Chapter 14. Run one simple regression model. Examine and explain the output. Now run it again as a multiple regression model in which, in addition to the first variable, you add a second variable that is a nonlinear recoding of the first. Do this twice, exploring at least two different ways to recode the first variable (possibilities: square the variable, take the log of the variable, invert the variable, etc.). Now run a multiple regression model with at least two predictors where at least one of the predictors is a dummy variable. Explain the results. Next, conduct a stepwise regression analysis with at least four predictors. Explain what happened at each step. From this stepwise process, what do you recommend?

The databases listed above under Chapter 13 can be used for multiple regression analysis. Some have at least four predictors and some do not. A few have dummy variables. Pick and choose. You may also use data from other sources that you may have or find.

Chapter 15. Time-series Forecasting. Several of the associated databases have time-series data.
These include: 12-Year Gasoline, Agricultural, Energy, International Labor, U.S./International Stocks, EPA Emissions (data rich), Furniture and Home Furnishings, and Personal Savings. Pick and choose a variety of data as needed for this part of the assignment.

Select some time-series data and show a time-series plot from Minitab. Analyze some time-series data using moving averages, weighted moving averages, and exponential smoothing. Compare the results using the mean absolute deviation (MAD). Run a trend analysis on some time-series data. Try a quadratic trend approach. Is it better or worse, and why? Take some long-term data, like one of the vegetables in the agricultural database, and determine the seasonal effects. Graph the data. Now de-seasonalize the data. Graph the de-seasonalized data. What happened?

Chapter 18. Find or create twenty samples of data measures on some item. Let each sample have six measurements. Now create both an X-bar chart and an R chart from the data. What happened? Print out the graphs. Were there points in either chart that are “out-of-control”? Explain. Now create a p chart of twenty-five samples, with each sample having 40 items and a few items in each that are out of compliance. Use data that you created. Display the p chart and discuss whether any samples are “out-of-control”.

The Databases:

12-Year Gasoline Database
The 12-year time-series gasoline database contains monthly data for four variables: U.S. Gasoline Prices, OPEC Spot Price, U.S. Finished Motor Gasoline Production, and U.S. Natural Gas Wellhead Price. There are 137 data entries for each variable. U.S. Gasoline Prices are given in cents, the OPEC Spot Price is given in dollars per barrel, U.S. Finished Motor Gasoline Production is given in 1000 barrels per day, and U.S. Natural Gas Wellhead Price is given in dollars per 1000 cubic feet.

Consumer Food Database
The consumer food database contains five variables: Annual Food Spending per Household, Annual Household Income, Non-Mortgage Household Debt, Geographic Region of the U.S. of the Household, and Household Location. There are 200 entries for each variable in this database, representing 200 different households from various regions and locations in the United States.