DeVry University Data Mining and Analytics Course Project Introduction For your course project, you will complete a data analysis project using the Jupyter Notebook used in the course lessons as a...

Part 2


DeVry University Data Mining and Analytics Course Project

Introduction

For your course project, you will complete a data analysis project using the Jupyter Notebook used in the course lessons as a model. You will select your own data for your analysis project. It may come from one or more sources. See the Resources area for ideas and places to find data for analysis.

The project will be in the form of a Jupyter Notebook file. It will also include support files such as the .csv original data, database files for data storage, .html report files, your environment file, and any other files needed to duplicate your project. You may use your own computer for your project, or you can use the Azure Labs virtual machine provided for you.

Resources

This section includes resources for getting ideas for your project. It also includes resources that can help with using Python and the various libraries and tools related to data science.

Project Ideas and Finding Data

One of your first steps will be to decide on a dataset or datasets for your project. There are many places to look for data for your course project. A few are listed below, but feel free to search the web for other ideas. Think about data that you might be interested in, or that might be helpful for your organization.

When considering your data, think about the capabilities of the computer you will be using for your data analysis (whether your own computer or the Azure Labs virtual machine). Be sure to do a few initial tests of loading and doing some simple processing of data to make sure your environment is suitable for the size of your dataset. You can always reduce the size of your dataset if you need to.

USAFacts – Large source of US government data.
https://usafacts.org/

Kaggle Datasets – Over 59,000 public datasets for use.
https://www.kaggle.com/datasets

OpenML – Open Machine Learning. Includes over 21,000 data sets you can use.
https://www.openml.org/home

Microsoft Research Open Data – Free datasets from Microsoft in the areas of biology, computer science, earth science, education, healthcare, information science, mathematics, physics, social science, and other.
https://msropendata.com/categories

Find Free Public Data Sets for Your Data Science Project – Article with over 30 sites with public data for your project.
https://www.springboard.com/blog/free-public-data-sets-data-science-project/

21 Places to Find Free Datasets for Data Science Projects – Article on sources of free data for data science projects.
https://www.dataquest.io/blog/free-datasets-for-projects/

Documentation, Tutorials, Guides

You will find many great references and tutorials for virtually any part of your project. A few are listed below, but be sure to do a web search and explore YouTube.com and other resources if you need ideas, examples, walkthroughs, tutorials, documentation, etc.

Python and Basic Libraries

Python for Beginners – This course from Microsoft consists of 44 short videos on various Python concepts. Use this to refresh your Python skills or to review specific topics.
https://www.youtube.com/watch?v=jFCNu1-Xdsw&list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6

Python Documentation – Official Python guides. Includes beginner’s guides, complete documentation, and tutorials.
https://www.python.org/doc/

w3schools.com Python – This site from the popular w3schools.com has tutorials, examples, and references for Python, NumPy, Matplotlib, SciPy, Machine Learning, and more.
https://www.w3schools.com/python/

NumPy Documentation – Official NumPy documentation including quickstart tutorials, references, examples, and more.
https://numpy.org/doc/stable/

Pandas Documentation – Official Pandas documentation including getting started, user guides, developer documentation, and more.
https://pandas.pydata.org/docs/

Pandas Cheat Sheet – Quick reference for Pandas.
https://www.dataquest.io/blog/pandas-cheat-sheet/

Matplotlib – Official site for Matplotlib. Includes examples and documentation for your data visualization.
https://matplotlib.org/

SciPy – Official site for SciPy. Includes getting started, documentation, examples, and more.
https://www.scipy.org/

SQLite – Official site for SQLite.
https://www.sqlite.org/index.html

Data Preprocessing using Pandas – Simple tutorial for preprocessing data using Pandas.
https://www.analyticsvidhya.com/blog/2020/09/pandas-speed-up-preprocessing/

Visual Studio Code

Visual Studio Code Documentation – Official Visual Studio Code documentation. Includes setup guides, getting started, user guides, languages, and more.
https://code.visualstudio.com/docs

Anaconda and Jupyter

Anaconda Documentation – Official Anaconda documentation. Includes installation, user guides, references, and more.
https://docs.anaconda.com/

Managing Packages (Libraries) in Anaconda
https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/

Jupyter Documentation – Official Jupyter documentation.
https://jupyter.org/documentation

Additional Tools and Libraries

Pandas Profiling – Create reports based on your data.
https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/

Folium – Create maps based on your data.
https://python-visualization.github.io/folium/

SandDance for Visual Studio Code – Allows you to easily visualize your raw or processed .csv files.
https://marketplace.visualstudio.com/items?itemName=msrvida.vscode-sanddance

Video Tutorial on SandDance – Video from Microsoft on using SandDance for data visualization.
https://www.youtube.com/watch?v=ID5JOc73h4M

Python Seaborn Tutorial for Beginners – Simple tutorial for using Seaborn for data visualization.
https://www.datacamp.com/community/tutorials/seaborn-python-tutorial

Data Science

Complete Jupyter Notebook for Data Science – Complete video tutorial on doing a data science project with Jupyter Notebooks.
https://www.youtube.com/watch?v=8O_COC9xtJw

Data Science for Beginners – Series of short video tutorials on Python, Anaconda, and Data Science.
https://www.youtube.com/watch?v=JL_grPUnXzY&list=PLeo1K3hjS3us_ELKYSj_Fth2tIEkdKXvV

Data Science in Visual Studio Code – Simple tutorial for doing a data science project in Visual Studio Code.
https://code.visualstudio.com/docs/python/data-science-tutorial

Setting up a Data Science workspace with Visual Studio Code and Anaconda – Simple tutorial for setting up your workspace. (Requires login with e-mail or Google account)
https://towardsdatascience.com/setting-up-your-own-data-science-workspace-with-visual-studio-code-and-anaconda-python-22237590b4ed

Python – Data Science Tutorial
https://www.tutorialspoint.com/python_data_science/index.htm

Introduction to Time Series Analysis in Python
https://www.kdnuggets.com/2020/09/introduction-time-series-analysis-python.html

Exploratory Data Analysis with Pandas Profiling – Tutorial for working with Pandas Profiling to generate a report using your data. (Requires login with e-mail or Google account)
https://towardsdatascience.com/exploratory-data-analysis-with-pandas-profiling-de3aae2ddff3

Making 3 Easy Maps With Python – Simple tutorial for using Folium to create maps. (Requires login with e-mail or Google account)
https://towardsdatascience.com/making-3-easy-maps-with-python-fb7dfb1036

Visualizing Data at the Zip Code Level with Folium – Tutorial for using Folium to create zip code level maps. (Requires login with e-mail or Google account)
https://towardsdatascience.com/visualizing-data-at-the-zip-code-level-with-folium-d07ac983db20

Mapping Data with Folium – Another tutorial for using Folium to create maps.
https://medium.com/@sosterburg/mapping-data-with-folium-356f0d6f88a9

Part 1 – Environment Setup and Selection of Project Data

Summary

In this part of the project, you will set up your data analysis environment and select your project data.

Points: 60
Due: Module 1
Deliverables: PDF of Jupyter Notebook project. Zipped project folder.

Steps

1. Select one or more data files for your project. See the section Project Ideas and Finding Data above. Download the file(s) (be sure to remember where you downloaded the files to). You may need to convert your data to a format suitable for your project (such as .csv) from another format such as JSON or XML.
2. Decide where to host your project. You can host your project on your own computer or in the MS Azure Labs virtual environment.
   a. You have access to a Microsoft Azure Labs virtual machine. This VM has MS Office, Anaconda, and Visual Studio. It also has an Anaconda environment available that is used to run the Jupyter Notebook used in the lessons. You can use this environment and add additional libraries if necessary, or create your own.
   b. You can use your own computer. Anaconda and Visual Studio are available for Windows, Mac, and Linux operating systems. The Anaconda environment is available in the course Files area, and you can import it to get started or create your own environment.
3. Set up your environment.
   a. Install any software such as Anaconda, Visual Studio Code, etc. (if necessary – these are pre-installed in the Azure VM).
   b. Install any extensions if you are using Visual Studio Code (if necessary – these are pre-installed in the Azure VM). Recommended: Python. Anaconda Extension Pack (may be automatically installed with Anaconda). SQLite. SandDance for VSCode.
   c. Create an Anaconda environment (if necessary). You can do this by importing the CEIS480 environment available in the Files area of the course (this environment is pre-installed in the Azure VM).
      Note: the CEIS480 environment in the Files area will work on Windows. If you are using a different platform, you can create your own environment and add your libraries to it. You may also create your own environment from scratch.
4. Create your project folder.
   a. Create a folder for your project in a suitable location on your computer or in the Azure VM. Be sure to remember where your project folder is so that you can find it again.
   b. Have a plan to back up your project folder on a regular basis so that you will not lose work if something happens to your project files.
   c. On Windows, Jupyter Labs can only access files on your C: drive. Visual Studio can access Jupyter Notebook folders on any drive.
5. Create your Jupyter Notebook file in your project folder.
   a. Make sure you select the correct environment for your project to run in.
6. Add your project data file to your project folder.
7. Create a markdown cell in your project with your project heading. Include: Course Number, Course Name, Course Session (Month and Year), Student Name, Project Name.
8. Create a markdown cell in your project with a brief description of your project, including what type of data is being analyzed as well as the source of the data.
9. Add cells (markdown cell for step/explanation and code cell) for your import statements.
10. Add cells (markdown cell for step/explanation and code cell) to load your project data file.
11. Add cells (markdown cell for step/explanation and code cell) to preview the data using the head() method.
12. Run all cells.
13. Export your environment so that it can be duplicated on another computer.
   a. Go to the Anaconda prompt.
   b. Activate your environment. Example: conda activate CEIS480
   c. Export your environment. Example: conda env export > myenvironment.yaml (you can use any filename with a .yaml extension for your environment file).
   d. Copy the file to your project folder (if you cannot find the file, search for it in Windows Explorer).
14.
Submit Deliverables (pdf and zipped project folder).

Deliverables

You should submit both a .pdf file of your project as well as your zipped complete project folder.

1. Export your Jupyter Notebook file to .pdf format (see resources below if you do not know how to do this).
2. Zip your complete project folder (see resources below if you do not know how to do this). Include all files needed to run the project. This should include your environment (.yaml) file, your notebook (.ipynb) file, and any data files needed for the project to run.
3. Submit the following to the dropbox for this module:
   a. Project .pdf file
   b. Project folder .zip file

Resources
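The zipping step in the Deliverables list above can also be scripted. The following is a minimal sketch, not part of the course materials: it uses only the Python standard library, the function names `missing_deliverables` and `zip_project` are hypothetical, and the check for a `.csv` data file is an assumption (your project data may use another format). It first reports which of the expected file types are absent from the project folder, then writes a `.zip` archive next to that folder.

```python
import shutil
from pathlib import Path


def missing_deliverables(folder):
    """Return labels of expected file types not found in the project folder.

    The patterns below are assumptions based on the Deliverables list:
    a notebook, an exported environment file, and a .csv data file.
    """
    folder = Path(folder)
    required = {
        "notebook (.ipynb)": ["*.ipynb"],
        "environment (.yaml)": ["*.yaml", "*.yml"],
        "data (.csv)": ["*.csv"],  # assumption: adjust for other data formats
    }
    missing = []
    for label, patterns in required.items():
        # rglob searches the folder and all of its subfolders
        found = any(True for pattern in patterns for _ in folder.rglob(pattern))
        if not found:
            missing.append(label)
    return missing


def zip_project(folder):
    """Zip the whole project folder and return the path of the archive.

    The archive is written beside the folder (not inside it), so the
    zip file never ends up containing itself.
    """
    folder = Path(folder).resolve()
    base_name = folder.parent / folder.name  # ".zip" is appended automatically
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(archive)
```

A possible use, assuming a project folder named `myproject` in the current directory: call `missing_deliverables("myproject")` and fix anything it lists, then call `zip_project("myproject")` to produce `myproject.zip` for submission. Exporting the notebook to .pdf is a separate step and is not covered by this sketch.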
Answered 35 days after Mar 10, 2022

Sathishkumar answered on Mar 12 2022