Provide a chapter summary for Chapter 9 (refer to the syllabus for instructions).
MIS 761: Data Analytics Statistical Methods and Tools
Instructor: Andrew Hardin, PhD
Full Professor of Information Systems
Lee Business School
Office: BEH 339
Email:
[email protected]
Home page: Faculty Site

Course Description
Explores how data analytics relates to the scientific method and the CRISP-DM methodology, and then examines advanced statistical techniques for the contemporary analysis of organizational data. Taking an analytical approach, students will consider the applicability of particular statistical approaches/packages for gaining a business understanding in specific settings.

Learning Outcomes
By the end of this course, students should be able to understand specific processes associated with the analysis of data sets. Students will learn about the scientific method and CRISP-DM, two widely used frameworks for guiding the data analytics process. Students will also learn about the exploration and integration of data sets prior to their inclusion in comprehensive models. To test and evaluate these models, students will use the R statistical package.

Required Course Materials [endnoteRef:1]
[1: Those who have taken a course from me are aware that I typically give students an opportunity to choose from multiple options to help them best meet their learning needs. In this class in particular, there is a wide dispersion of skill sets. Therefore, please see the footnotes regarding the textbook/assignment choices for this course.]
The course requires the use of R and the RStudio integrated development environment. R is open source, and students can download both R and RStudio for free. Students will use R for both the Andy Field chapter examples and the course project.
Discovering Statistics Using R (2012), Andy Field; SAGE: ISBN-13: 978-1446200469

Library Resources: Patrick Griffis
Email:
[email protected]
Phone: 895-2231

Assignments (points, percentage):
Participation (2 for the introduction post, 8 for discussion board posts, 8 for surveys): 18 points, 18%
R Packages and Script Examples (5 points each): 10 points, 10%
Field In-Chapter Examples and Summary (4 points each): 32 points, 32%
Course Project: 40 points, 40%
Total: 100 points

Grading Criteria:
93% and above: A
At least 90% but less than 93%: A-
At least 87% but less than 90%: B+
At least 83% but less than 87%: B
At least 80% but less than 83%: B-
At least 77% but less than 80%: C+
At least 73% but less than 77%: C
At least 70% but less than 73%: C-
At least 67% but less than 70%: D+
At least 63% but less than 67%: D
At least 60% but less than 63%: D-
Less than 60%: F

Participation: Introduction, Course Questions Discussion Board Postings.
Students receive two participation points for their introduction posting. Specifics on the content of the introduction post are included in the introduction discussion area. Due by July 17.
Students can receive up to eight additional points for posting helpful hints and answering other students’ questions in the course questions discussion boards. Only frequent contributors will receive the full eight points. If you contribute only occasionally, you will receive 4 points. If you do not contribute at all, you will not receive any points. So please contribute; we are all in this together! Due by Aug 3.
Students earn the final eight points by completing four surveys during the course. [endnoteRef:2]
[2: Students have the choice to complete the four surveys or complete an alternative assignment. Completing the four surveys should take about 50 minutes (students who choose this option must complete all four surveys to receive credit). The alternative assignment is to research an emerging topic in analytics, such as machine learning applications or AI applications. The deliverable is a two-page analysis (not a summary) of the topic. Please cite your work in APA format.
]

Field In-Chapter Examples and Summary:
Students will choose and complete all of the in-chapter examples for eight of the 19 chapters in the Andy Field text. [endnoteRef:3] [endnoteRef:4] Students should choose chapters based on their individual statistical backgrounds. The topics covered rapidly increase in complexity, so the materials are suitable for any skill level. This is a graduate-level course, so you should challenge yourselves accordingly! Note also that not all of the Andy Field instructions work, because R has been updated since the textbook was published in 2012. Students can find documentation for these updates on the web (new packages, etc.). My videos demonstrating the Andy Field in-chapter examples also include information on these changes.
Important: In addition to completing the in-chapter exercises, students must submit for each chapter a one-page Word document summarizing when a particular statistical approach is appropriate, the theory underlying it, and a discussion of its assumptions. Due by July 31.
[3: Students who are experts at R and statistical analysis may alternatively complete three Andy Field R chapters and 10 chapters from the Modeler Application Guide (they are relatively short; IBM provides the guide free). I will provide individual Modeler licenses for home use to students choosing this option. Simply submit two Application Guide chapters for each of the last five submission areas for the Andy Field R assignments.
Students may also alternatively complete five Andy Field R chapters and three chapters from Data Mining with SPSS Modeler, Wendler and Grottrup, Springer, ISBN 978-3-319-28707-2. I will provide individual Modeler licenses for home use to students choosing this option. Simply submit the completed chapters in the Andy Field in-chapter assignment submission areas.
Important: As required for completing the Andy Field R chapters, in addition to completing the in-chapter exercises from the Modeler Application Guide or the Wendler and Grottrup book, students must submit a one-page Word document summarizing when the statistical approach is appropriate, the theory underlying it, and a discussion of its assumptions.]
[4: Students who have completed IS 372 must complete, at a minimum, Andy Field and/or IBM Modeler chapters we did not cover in that course.]

R Packages and Script Examples Discussion Board Postings:
Each student is required to post examples of useful functions in the R Packages and Script Examples Discussion area. The two required postings are worth up to 10 points (5 points each). I have provided an example of the minimum requirements for this assignment. Due by Aug 7.

Course Project:
Using a topic and dataset of their choice, groups will design a study based on the CRISP-DM process, analyze the data using R [endnoteRef:5], and then write up the results. See Module 2: Course Project in Canvas for the course project instructions and video examples. Please reach out to me ASAP with questions about the suitability of your chosen topic/dataset. Note that Patrick Griffis (listed above) can be a valuable resource for finding datasets.
Important: Due to the shortened summer schedule, teams should begin thinking about their project once the session begins. Do not procrastinate! Due by Aug 14.
[5: The course project examples use R. Teams can supplement their R analyses using Rattle for R (freely available as an R package), IBM SPSS Statistics and/or IBM SPSS Modeler, depending on the skill sets of the team members. I have the software to open and run course projects developed using any of these tools.]

Freeloading:
To combat instances of freeloading, all students are required to complete a group assessment spreadsheet. I will consider these assessments as I compute final grades for the course.
Please notify me if a group member is not participating so I can intervene as necessary, i.e., before it is too late. Failure to submit the team assessment spreadsheet will result in an eight-point deduction from the participation score.

Course Calendar
Columns: Topic; Reading; Activity/Deliverable

Module 1: The Scientific Method, CRISP-DM and Statistics Review Using R (Weeks 1-3)
Topic: The Scientific Method. Reading: PowerPoint Video. Activity/Deliverable: Watch Video.
Topic: CRISP-DM. Reading: PowerPoint Video and CRISP-DM Guide. Activity/Deliverable: Read CRISP-DM Guide, Watch Video.
Topic: Statistics Review. Reading: Andy Field Text and Videos. Activity/Deliverable: Complete the in-chapter examples for eight of the 19 chapters. Students can choose any eight chapters. However, if selecting chapters that are more advanced, students should briefly review the earlier chapters to ensure they are comfortable with basic statistical principles. Due July 31.

Module 2: Course Project (Weeks 4-5)
Topic: Course Project. Reading: Andy Field Text and Videos. Activity/Deliverable: Although the course dedicates the last two weeks to the course project, groups should begin working on their respective projects at the beginning of the course. Due August 14.

Course Policies:
Email, Discussion Board Responses: I will generally respond no later than 24 hours after receiving an email or discussion board posting notice. In most cases it will be much sooner. I am happy to review individual projects. Please note, however, that I will need the dataset and any related files so that I can run your analysis on my machine. For questions that may concern the rest of the class, please post them in the discussion boards. As noted elsewhere in the syllabus, I am awarding participation points to students who post helpful hints and/or respond to other students’ posts.
Late Assignments: Late assignments are permissible only in cases of unavoidable personal or family emergencies, and the student must notify me as soon as possible. In all other cases, there will be a significant reduction in points for late assignments.
Grade Appeals: If you believe there was a mistake made in the grading of your assignment, please notify me promptly and I will determine whether to review the assignment.

University Policies: See information by following Syllabus link in Canvas.

DISCOVERING STATISTICS USING R
ANDY FIELD | JEREMY MILES | ZOE FIELD
SAGE
Los Angeles | London | New Delhi | Singapore | Washington DC

© Andy Field, Jeremy Miles and Zoe Field 2012
First published 2012
Reprinted 2012
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the