BUSA 3600 Data Challenge Project – 30% of your final grade Fall 2019, last updated 9/26/2019 Overview There are several publicly available data sets available to encourage “citizen data scientists” to...

BUSA 3600 Data Challenge Project – 30% of your final grade

Fall 2019,
last updated 9/26/2019


Overview

There are several publicly available data sets available to encourage “citizen data scientists” to use data to answer questions, investigate problems, or explore relationships. For instance, several questions that could be investigated with the consumer complaint database include:


Which banks provide the best service for mortgages in a specific state? (Service could be defined as having a low number of complaints relative to their total number of customers and providing a timely response to complaints that occur)


Looking at the open-ended responses provided by consumers, which are the top complaints related to mortgages? Do these complaints suggest additional subcategories of complaints that could be added?


How have complaints changed over time?


How does demographic information change the nature of the complaints (compare wealthy vs. poor areas, areas with high level of education vs. low, urban vs. suburban, etc.)?


Project Components and Due Dates

For this project, you will initially
work alone on Part 1, and then will
complete Part 2 with 1 other person. You will demonstrate your knowledge of BI tools, and analyze a data set with a specific question or questions in mind.


Getting Started In-Class Activity – Monday, September 30

Find a suitable and
Primary Fact Table
that is based on a personal area of interest by
Wednesday, Oct 2nd
11:59

. Aim for a table with 5000 rows, and at least 6-8 fields (columns) with both numbers and categorical (text) attributes. Less than 5000 records is “OK”, but fewer than 8 columns will be



Use Google searches to gain access to large data sets that are supportive of your interests, and include in you search terms which may render unique fields (usually categorical) that can be joined with other tables.



For instance, Google “Violent crime statistics by zip code”, “Education outcomes by University ID”, “NFL players by college institution”, “Section 8 housing statistics by FIPS code”.



FIPS codes, GEOID codes, hospital ID, county and State, Vehicle Identification Number (VIN), Airport Code are all unique identifiers. And the .gov websites have 200,000 data sets that you are free to use, as well as others that you may find at work or other sources during your research project.



Avoid using Year as your unique identifier. It will work, but your data set will end up summarized by year. Even with 10 years data, when you join with your dimension table you have an N of 10. There are 4000 unique FIPS codes in the US, 40,000 unique university ID’s…..



The important thing to keep in mind is that you are demonstrating skills learned in class using power pivot. If you search in vain for hours on a topic of interest, change your topic.


  Start the following:


Load data into PowerPivot


Identify questions that would be of interest to you that may be supported


Identify unique fields that could be candidates for table joins with a dimension table




Part 1: Individual Data Model and Metadata – 8% of term grade

Due Sunday, October 27th by 11:59 pm. Accepted late with 20% penalty through Wednesday, October 29th
by 11:59 pm


This is an Individual Assignment! Even if you are already in a group, you will work by yourself for this part of the assignment. You will combine the work from both partners in Part 2.




The Excel file you submit should


Include Primary Fact table


Add at least 4 calculated columns to the Primary Fact table that could be relevant for your reporting.



Add 1 dimension table
of your choice to model. There must be a valid relationship between the table you add and the Primary Fact Table.


You cannot use a data set that I provided for you in class.


Add at least 4 calculated columns to the Dimension table that could be relevant for your reporting.


Create 3 PivotTables.


On each PivotTable, add textboxes describing what the tables mean


On each PivotTable, use appropriate formatting for the data type. For example, if working with money, change to use Currency data type. When displaying averages, show a reasonable number of numbers after the decimal point.


Use conditional formatting on at least 1 of the Pivot Tables


Use sorting on at least 1 of the Pivot Tables


Use filtering on at least 1 of the Pivot Tables




The Word document you submit should


Your research questions that you wish to investigate using your data.


Describe the metadata for your Primary Fact Table, and provide URL for your dataset


Describe the metadata for dimension table you added and provide the URL for your dataset


For the metadata, for each column in your data set list the Column Name, Data type, range of valid values or a sample list of values, and a brief description. For data sets with 1600 columns, I would delete at least 1550 that you will not use. Below is an excerpt of a well documented data set about Universities.















































































Column Name




Data Type




Valid Values




Description



Unit ID



Whole Number



ID > 10000



Universities Identifier



Universities Name



Text



no more than 50 characters



The name of the university



Address



Text



no more than 50 characters



The location of the university



City



Text



no more than 25 characters



City of the location of the university



State



Text



States abbrev (only 2 characters)



States within the Unites States



Zip Code



Whole Number



1 001- 99929



The zip code of the location of the university



School Address



Text



No more than 100 characters



Address of the university



Southern States



Text



True or False



State identifier



Title



Text



No more than 50 characters



The name of the in charge of the school



Public and private



Text



No more than 25 characters



Public and private identifiers



Pell grant value



Whole Number



1-100



Percentage of undergraduates receiving a Pell Grant






May 18, 2022
SOLUTION.PDF

Get Answer To This Question

Submit New Assignment

Copy and Paste Your Assignment Here