
This exam will run for 2 hours, from 1 PM to 3 PM EST, and will have about 50 multiple-choice questions.


CIS 3120 FINAL REVIEW SHEET

Format: 50 multiple-choice questions (randomized), 2 hours to complete, on Blackboard.
See the GitHub repository for code snippets: https://github.com/avinashjairam/avinashjairam.github.io

25 Questions Will Be From the Following:
- Web Scraping + Data Frames: 60%
- Data Analytics (up to the midterm) + NumPy: 40%

Data Analytics
- Difference between data and information
- Why has interest in data analytics increased over the last 5-10 years?
- The demand for data analysts will increase tremendously over the next decade
- Define Big Data
- Explain why data is referred to as the “new oil”
- What are some required skills and knowledge necessary to get started in data analytics?
- Why is Python so popular, especially in data analytics?
- What are the complex data structures in Python?
- What is data analysis? (See lecture 5)
- The 5 significant steps for data analysis (See lecture 6)

Web Scraping
- Define web scraping
- Why is web scraping necessary? For example, in which situations would you use web scraping?
- Difference between modern-day web scraping and web scraping in the 60s-80s
- Challenges to modern data gathering
- Solutions to the modern data gathering problem
- List some popular APIs
- What to do before web scraping?
- Dangers of web scraping
- Ethics of web scraping
- Challenges to web scraping
- What are some popular web scraping applications?
- The 2 major entities on the WWW
- What is HTTP and how does it work? E.g. the client sends a request to the server, and the server responds accordingly
- HTML: know how to build a basic HTML page
- Know how IDs and classes are used
- What is the webbrowser module used for?
- Be familiar with the requests module: how it works and what it is used for
- Know how to make a request to a web page and print the HTML returned by the server
- What is a GET request?
- What is Beautiful Soup?
- How to scrape a website with Beautiful Soup: the methods and steps involved
- How to install Beautiful Soup?
- How to create a Beautiful Soup object?
- What does prettify() do?
- Be familiar with server response codes: 200, 404, and codes starting with 4 or 5
- All Beautiful Soup methods and attributes, e.g. .children, find(), find_all(), get_text()
- Be able to do basic scraping from a web page
- Exporting data from a web page to a CSV file (also covered in Data Frames)
- Understand how HTML can be organized into a “tree”, e.g. parent tag, child tag, sibling tag, etc.
- Know how to scrape an HTML table; be familiar with the example done in class

Data Frames
- What is a data frame?
- What is a series?
- How to create a series?
- What is pandas?
- How to install pandas?
- Why use pandas?
- How to add columns to a data frame using lists?
- How to create a pandas data frame from a series?
- Exporting data to a CSV file from a pandas data frame
- Be familiar with all the essential pandas methods (all pandas methods listed in lecture #5)
- Know how to do all the operations here: https://github.com/avinashjairam/avinashjairam.github.io/blob/master/Basic_Pandas.ipynb
- Differences between pandas and NumPy operations, e.g. performance: in which situations is NumPy or pandas the better choice?
- How to set the max number of rows/columns displayed in a Jupyter notebook
- Merge two data frames vertically
- info(), .iloc[], .loc[], describe()
- Given a data frame, select rows based on index labels and row positions
- List all columns in a data frame
- Sort a data frame based on a column
- Select particular columns
- What is an index?
- How to set the index of a data frame (temporarily vs. permanently)?
- List all indexes of a data frame
- How to look up a record by its index?
- Sort a data frame by index
- Select elements from a data frame by filtering, e.g. by a filter mask
- Using a filter mask with loc
- Filtering on multiple conditions
- Negating a filter
- Modifying data in a data frame: how to rename a column; editing data in a row

NumPy
- What is NumPy?
- What is NumPy used for?
- How to install NumPy?
- Difference between a Python list and a NumPy array
- What is an array?
- Characteristics of an array: arrays have rank and shape; define each
- Know how to create a NumPy array from a list of lists
- Know how to print elements from a NumPy array
- Know how to create a NumPy array filled with 0s or filled with 1s
- Create a 2-D array (think of it as a matrix) of x and y dimensions filled with random values
- Common NumPy methods: sum, sort, dot, multiply

25 Questions Will Be From the Following:

Relational Databases (5 Questions)
- What is a relational database?
- What is a table?
- Characteristics of a relational table
- What is a null value? When does a null value occur?
- List all the keys
- Define the keys
- Know when to use which key, e.g. when would you use a primary key?
- Given a table, identify which attributes should be the primary key, foreign key, composite keys, superkeys, etc.
- Identify the alternate key
- Why are foreign keys helpful?
- Define referential integrity
- Why is referential integrity important?
- What can happen if a DB doesn’t enforce referential integrity?
- Know what a schema diagram is
- Label a schema diagram
- Define SQL (Structured Query Language)
- Know what a SELECT command in SQL does
- Use a SELECT command to select rows and columns from a table
- Know what each type of join does
- Match Venn diagrams with the appropriate joins

Data Visualization (5 questions)
- Define data visualization and state its goals
- Why is data visualization important?
- Explain the quantitative and qualitative questions asked in data visualization
- Explain spatial and non-spatial data
- Identify how data is collected
- What factors should be considered when building a visualization model?
- Relationship between humans and visualizations in the decision-making process
- When is a visualization not needed?
- What are the possibilities/benefits of having a good visualization model?
- Why use an external representation?
- Why represent all of the data?
- Why focus on tasks and effectiveness?
- Why are there resource limitations?
- Define what, why, and how
- What are the 3 themes of visual analytics?
- Challenges to data visualization
- What are examples of data visualization tasks?
- What are some libraries used for data visualization?
- Using matplotlib: plot line, bar, scatter, and histogram plots
- How to add titles and axis labels (x-axis, y-axis) to a plot
- Using pandas: plot line, bar, scatter, and pie plots/charts

APIs (10 questions)
- Define what an API is
- What is an interface?
- How does an API work?
- How are web-based APIs accessed?
- What formats is API data returned in?
- Methods/types of API authorization; API security
- Label the API requests diagram
- How do you learn how to send the proper request to a particular API?
- Know the difference in how web servers and APIs respond to requests
- Why use an API instead of web scraping?
- Restrictions when using an API
- Add parameters to a URL to make an API call
- Parse JSON
- Label the diagram of interacting with an API
- Know the different status codes
- What is JSON and what is it used for?
- What is the purpose of json.dumps() and json.loads()?

Pandas and NumPy (5 questions)
- Why is data analytics important in marketing?
- Join two data frames vertically using pd.concat(), ignoring indexes
- Join two data frames horizontally
- Why is it useful to keep the indexes when joining horizontally?
- Why is merging data frames useful?
- What are the types of merges?
- Know how to perform all types of merges between two data frames
- How to merge two data frames?
- Dealing with missing data in a data frame
- Know how to use dropna() and fillna()
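For the Web Scraping items on the requests module, GET requests, and server response codes, here is a minimal sketch, assuming the third-party requests package is installed (pip install requests). The helper names fetch_html and describe_status are hypothetical, made up for illustration; describe_status uses the standard library's http.HTTPStatus for the reason phrases.

```python
from http import HTTPStatus

import requests


def describe_status(code: int) -> str:
    """Hypothetical helper: label a status code the way the review sheet groups them."""
    # HTTPStatus maps standard codes to their reason phrase, e.g. 404 -> "Not Found"
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


def fetch_html(url: str) -> str:
    """Hypothetical helper: send a GET request and return the HTML the server sent back."""
    response = requests.get(url, timeout=10)
    print(describe_status(response.status_code))
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Calling print(fetch_html("https://example.com")) would print a status line such as 200 OK (success) followed by the page's HTML; that call needs network access.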
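The Beautiful Soup items (creating a soup object, prettify(), find(), find_all(), get_text(), .children, and the parent/child/sibling tree) can be sketched on a small inline page. This assumes beautifulsoup4 is installed (pip install beautifulsoup4) and uses Python's built-in "html.parser"; the page content is invented for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# A small, made-up page that uses an id and a class
html = """
<html>
  <head><title>CIS 3120 Demo</title></head>
  <body>
    <h1 id="main-heading">Course Topics</h1>
    <ul id="topics">
      <li class="topic">Web Scraping</li>
      <li class="topic">Data Frames</li>
      <li class="topic special">NumPy</li>
    </ul>
  </body>
</html>
"""

# Create the BeautifulSoup object with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())  # re-indents the parse tree, one tag per line

# find(): first match; search by id, or by class (class_ avoids the Python keyword)
heading = soup.find(id="main-heading").get_text()
special = soup.find("li", class_="special").get_text()

# find_all(): every match, as a list
topics = [li.get_text() for li in soup.find_all("li", class_="topic")]

# Navigating the tree: .children also yields whitespace strings, so keep only tags
topic_list = soup.find("ul", id="topics")
child_tags = [child.name for child in topic_list.children if isinstance(child, Tag)]

first_item = topic_list.find("li")
parent_name = first_item.parent.name                       # the enclosing <ul>
next_item = first_item.find_next_sibling("li").get_text()  # the next <li> at the same level

print(heading, topics, special, child_tags, parent_name, next_item)
```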
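For scraping an HTML table and exporting it to a CSV file, here is one possible approach, not necessarily the in-class example. It assumes beautifulsoup4 and pandas are installed; the table, its id="stats", and the output filename stats.csv are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up table; on a real site this HTML would come from requests.get(url).text
html = """
<table id="stats">
  <tr><th>Name</th><th>Team</th><th>Points</th></tr>
  <tr><td>Ana</td><td>Red</td><td>31</td></tr>
  <tr><td>Ben</td><td>Blue</td><td>27</td></tr>
  <tr><td>Cal</td><td>Red</td><td>19</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="stats")
rows = table.find_all("tr")

# First row holds the <th> header cells; the remaining rows hold <td> data cells
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows[1:]]

df = pd.DataFrame(records, columns=headers)
df["Points"] = df["Points"].astype(int)  # scraped cell text is always a string

# Export without writing the 0..n-1 index as an extra column
df.to_csv("stats.csv", index=False)
print(df)
```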
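The Data Frames basics (Series, data frames built from lists and from Series, info(), describe(), listing columns, Jupyter display limits, CSV export) are sketched below, assuming pandas is installed (pip install pandas). The student data and the filename students.csv are invented for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
scores = pd.Series([88, 92, 75], index=["ana", "ben", "cal"], name="score")

# A DataFrame built from lists: each dict key becomes a column
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal"],
    "major": ["CIS", "Finance", "CIS"],
})
df["score"] = [88, 92, 75]  # add a column from a list (one value per row)

# A DataFrame built from Series: each Series becomes a column, aligned on the index
ages = pd.Series([20, 22, 21], index=["ana", "ben", "cal"], name="age")
from_series = pd.DataFrame({"score": scores, "age": ages})

print(df.columns.tolist())  # list all columns
df.info()                   # dtypes, non-null counts, memory usage
print(df.describe())        # count, mean, std, min, quartiles, max of numeric columns

# Display limits for notebook output (None would mean "show everything")
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 10)

# Export to CSV without the index column
df.to_csv("students.csv", index=False)
```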
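Indexes, label vs. position lookups, column selection, and sorting might look like this, again with hypothetical data. Note that .loc and .iloc are indexers used with square brackets rather than methods called with parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [103, 101, 102],
    "name": ["Cal", "Ana", "Ben"],
    "gpa": [3.1, 3.8, 3.5],
})

# Temporary: set_index returns a new DataFrame and leaves df untouched
by_id = df.set_index("id")
print(df.columns.tolist())  # df still has "id" as an ordinary column

# Permanent: reassign the result (or call df.set_index("id", inplace=True))
df = df.set_index("id")

print(df.index.tolist())  # list all index labels

ana = df.loc[101]             # look up a record by its index label
first_row = df.iloc[0]        # look up a record by its row position
subset = df[["name", "gpa"]]  # select particular columns (a list returns a DataFrame)

sorted_by_gpa = df.sort_values("gpa", ascending=False)  # sort by a column
sorted_by_index = df.sort_index()                        # sort by the index labels
```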
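Filter masks, combining and negating conditions, renaming a column, and editing a row can be sketched on a made-up table. pandas needs & (and), | (or), and ~ (not) here instead of Python's and/or/not, with each condition wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee"],
    "major": ["CIS", "Finance", "CIS", "Math"],
    "gpa": [3.8, 3.5, 3.1, 3.9],
})

# A filter mask is a boolean Series with one True/False per row
mask = df["major"] == "CIS"
cis = df[mask]

# The same mask inside .loc, also choosing which column to return
cis_names = df.loc[mask, "name"]

# Multiple conditions: & means and, | means or
strong_cis = df[(df["major"] == "CIS") & (df["gpa"] >= 3.5)]
cis_or_math = df[(df["major"] == "CIS") | (df["major"] == "Math")]

# Negating a filter with ~
not_cis = df[~mask]

# Rename a column (rename returns a new DataFrame, so reassign it)
df = df.rename(columns={"gpa": "GPA"})

# Edit data in a row: .loc[row selector, column] = new value
df.loc[df["name"] == "Ben", "major"] = "CIS"
```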
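For the NumPy items (rank and shape, creating arrays, zero/one/random arrays, list vs. array behavior, and sum, sort, dot, multiply), here is a minimal sketch assuming NumPy is installed (pip install numpy). It uses the newer np.random.default_rng generator; np.random.rand(x, y), common in older tutorials, also returns an x-by-y array of uniform random floats.

```python
import numpy as np

# From a list of lists: a 2-D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim)   # rank (number of dimensions): 2
print(matrix.shape)  # shape (rows, columns): (2, 3)
print(matrix[1, 2])  # element at row 1, column 2: 6
print(matrix[0])     # first row: [1 2 3]

zeros = np.zeros((2, 3))  # 2 x 3 array of 0.0
ones = np.ones((3, 2))    # 3 x 2 array of 1.0

# x-by-y array of random floats in [0, 1); the seed makes runs repeatable
rng = np.random.default_rng(seed=42)
x, y = 3, 4
random_matrix = rng.random((x, y))

# Lists vs. arrays: * repeats a list but multiplies an array element-wise
print([1, 2] * 2)            # [1, 2, 1, 2]
print(np.array([1, 2]) * 2)  # [2 4]

total = np.sum(matrix)                   # sum of all elements
ordered = np.sort(np.array([3, 1, 2]))   # sorted copy
product = np.dot(matrix, ones)           # matrix product: (2x3) . (3x2) -> 2x2
elementwise = np.multiply(matrix, matrix)  # element-wise product (squares here)
```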
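The Relational Databases items (primary, foreign, and alternate keys, referential integrity, SELECT, and joins) can be tried with Python's built-in sqlite3 module. SQLite is only a stand-in here, not necessarily the DBMS used in class, and the tables are hypothetical. SQLite added RIGHT and FULL OUTER JOIN only in version 3.39, so the sketch sticks to INNER and LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,        -- primary key
    dept_name TEXT NOT NULL UNIQUE        -- candidate key not chosen as primary: alternate key
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_id    INTEGER REFERENCES department(dept_id)  -- foreign key; NULL allowed
);
INSERT INTO department VALUES (1, 'CIS'), (2, 'Finance'), (3, 'Math');
INSERT INTO student VALUES (10, 'Ana', 1), (11, 'Ben', 2), (12, 'Cal', NULL);
""")

# SELECT picks the columns; WHERE picks the rows
cis_students = conn.execute("SELECT name FROM student WHERE dept_id = 1").fetchall()

# INNER JOIN: only students whose dept_id matches a department
inner = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    INNER JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.name
""").fetchall()

# LEFT JOIN: every student, with NULL (None in Python) where no department matches
left = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    LEFT JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.name
""").fetchall()

# Referential integrity: a student pointing at a nonexistent department is rejected
try:
    conn.execute("INSERT INTO student VALUES (13, 'Dee', 99)")
    violation_blocked = False
except sqlite3.IntegrityError:
    violation_blocked = True
```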
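For the matplotlib and pandas plotting items, here is a sketch assuming matplotlib and pandas are installed. The "Agg" backend writes images to files instead of opening windows; the sales numbers and the filename matplotlib_plots.png are made up.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI window needed
import matplotlib.pyplot as plt
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 160]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(months, sales)  # line plot
axes[0, 0].set_title("Monthly Sales (line)")
axes[0, 0].set_xlabel("Month")
axes[0, 0].set_ylabel("Units sold")

axes[0, 1].bar(months, sales)  # bar plot
axes[0, 1].set_title("Monthly Sales (bar)")

axes[1, 0].scatter([1, 2, 3, 4], sales)  # scatter plot
axes[1, 0].set_title("Sales vs. Month Number (scatter)")

axes[1, 1].hist([1, 2, 2, 3, 3, 3, 4], bins=4)  # histogram
axes[1, 1].set_title("Value Distribution (histogram)")

fig.tight_layout()
fig.savefig("matplotlib_plots.png")

# pandas plotting wraps matplotlib: DataFrame.plot returns a matplotlib Axes
df = pd.DataFrame({"month": months, "sales": sales})
ax = df.plot(x="month", y="sales", kind="bar", title="Monthly Sales (pandas)")
ax.set_ylabel("Units sold")

df["month_num"] = [1, 2, 3, 4]  # scatter plots need numeric x values
scatter_ax = df.plot(kind="scatter", x="month_num", y="sales", title="Sales (pandas scatter)")

# A pie chart takes its slice labels from the index
pie_ax = df.set_index("month").plot(kind="pie", y="sales", legend=False)
```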
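Adding parameters to an API URL and using json.loads()/json.dumps() can be shown with the standard library alone. The endpoint, parameters, and response body below are hypothetical, not a real API.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only
base_url = "https://api.example.com/v1/weather"
params = {"city": "New York", "units": "imperial"}

# Query parameters follow "?" as key=value pairs joined by "&" (spaces become "+")
url = f"{base_url}?{urlencode(params)}"
print(url)  # https://api.example.com/v1/weather?city=New+York&units=imperial

# With the requests package, requests.get(base_url, params=params) builds this query
# string for you, and response.json() parses the JSON body of the response.

# json.loads: JSON text -> Python objects (dict, list, str, int, ...)
sample_body = '{"city": "New York", "temp": 71, "alerts": []}'  # hypothetical response body
data = json.loads(sample_body)
print(data["temp"])  # 71

# json.dumps: Python objects -> JSON text; indent makes it human-readable
pretty = json.dumps(data, indent=2)
print(pretty)
```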
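Vertical and horizontal concatenation and the four merge types (inner, left, right, outer) can be compared on small made-up tables, assuming pandas is installed.

```python
import pandas as pd

spring = pd.DataFrame({"student": ["Ana", "Ben"], "course": ["CIS 3120", "CIS 3120"]})
fall = pd.DataFrame({"student": ["Cal"], "course": ["CIS 3120"]})

# Vertical join (stack rows); ignore_index=True renumbers 0..n-1 instead of repeating 0, 1, 0
stacked = pd.concat([spring, fall], ignore_index=True)

# Horizontal join (side by side): rows are matched up by index label,
# which is why keeping meaningful indexes matters here
grades = pd.DataFrame({"grade": ["A", "B+"]})
side_by_side = pd.concat([spring, grades], axis=1)

# Merges combine rows that share a value in a key column
students = pd.DataFrame({"student": ["Ana", "Ben", "Cal"], "major": ["CIS", "Finance", "Math"]})
scores = pd.DataFrame({"student": ["Ana", "Cal", "Dee"], "score": [91, 78, 85]})

inner = pd.merge(students, scores, on="student", how="inner")  # keys in both tables
left = pd.merge(students, scores, on="student", how="left")    # every row of students
right = pd.merge(students, scores, on="student", how="right")  # every row of scores
outer = pd.merge(students, scores, on="student", how="outer")  # keys in either table
```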
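Handling missing data with dropna() and fillna() is sketched below on a hypothetical table; pandas treats both NaN and None as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee"],
    "score": [91, np.nan, 78, np.nan],
    "major": ["CIS", "Finance", None, "Math"],
})

print(df.isna().sum())  # count missing values per column

dropped_rows = df.dropna()                   # drop any row with a missing value
dropped_score = df.dropna(subset=["score"])  # only check the score column

# fillna: replace missing values per column, e.g. with a statistic or a constant
filled = df.fillna({"score": df["score"].mean(), "major": "Undeclared"})
```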