Need help with a Python assignment dealing with data visualization

Question

Need help with a Python assignment dealing with data visualization.

Version 1

Grading and Feedback

The maximum possible score for this homework is 100 points.

We will auto-grade all questions (except Q1.2d) using the Gradescope platform (https://www.gradescope.com/). Based on our experience, students (you all!) benefit from using Gradescope to obtain feedback as they work on this assignment. Keep the following important points in mind:

1. Every student will receive an email within the next 48 hours of the HW release inviting you to use Gradescope for all HW1 questions. If you did not receive the email, it can take up to 48 hours from when we sync the roster. You can still get to Gradescope directly through Canvas.
2. You may upload your code periodically to Gradescope to obtain feedback for your code. This is accomplished by having Gradescope auto-grade your submission using the same test cases that we will use to grade your work. The test cases’ results may help inform you of potential errors and ways to improve your code.
3. Gradescope should not be the primary way to test your code’s correctness, since it provides only a few test cases, and error messages may not be as informative as local debuggers. You should test your code locally to more efficiently and effectively test your code, and only use Gradescope as a "final" check.
4. Gradescope cannot run code that contains syntax errors. If Gradescope is not running your code, before seeking help, verify that:
   a. Your code is free of syntax errors (by running it locally)
   b. All methods have been implemented
   c. You have submitted the correct file with the correct name
5. When many students use Gradescope simultaneously, it may slow down or fail to communicate with the tester. It can become even slower as the submission deadline approaches. You are responsible for submitting your work in time.

Download the HW1 Skeleton (https://poloclub.github.io/cse6242-2021spring-online/hw1/Y7b5hemF5P_hw1.zip) before you begin.

Homework Overview

Vast amounts of digital data are generated each day, but raw data are often not immediately “usable”. Instead, we are interested in the information content of the data: what patterns are captured? This assignment covers a few useful tools for acquiring, cleaning, storing, and visualizing datasets.

In Question 1 (Q1), you will collect data using an API for The Movie Database (TMDb). You will construct a graph representation of this data that will show which actors have acted together in various movies, and use Argo Lite to visualize this graph and highlight patterns that you find. This exercise demonstrates how visualizing and interacting with data can help with discovery.

In Q2, you will construct a TMDb database in SQLite, with tables capturing information such as how well each movie did, which actors acted in each movie, and what the movie was about. You will also partition and combine information in these tables in order to more easily answer questions such as "which actors acted in the highest number of movies?".

In Q3, you will visualize temporal trends in movie releases, using a JavaScript-based library called D3. This part will show how creating interactive rather than static plots can make data more visually appealing, engaging and easier to parse.

Data analysis and visualization is only as good as the quality of the input data. Real-world data often contain missing values, invalid fields, or entries that are not relevant or of interest. In Q4, you will use OpenRefine to clean data from Mercari, and construct GREL queries to filter the entries in this dataset.

Finally, in Q5, you will build a simple web application that displays a table of TMDb data on a single-page website. To do this, you will use Flask, a Python framework for building web applications that allows you to connect Python data processing on the back end with serving a site that displays these results.

Q1 [40 points] Collect data from TMDb and visualize co-actor network

Q1.1 [30 points] Collect data from TMDb and build a graph

For this Q1.1, you will be using and submitting a Python file. Complete all tasks according to the instructions found in submission.py to complete the Graph class, the TMDbAPIUtils class, and the two global functions. The Graph class will serve as a re-usable way to represent and write out your collected graph data. The TMDbAPIUtils class will be used to work with the TMDb API for data retrieval.

NOTE: You must only use a version of Python ≥ 3.7.0 and
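
As a rough illustration of the data-retrieval step that Q1.1 describes, the sketch below shows one way a movie's cast could be fetched with `http.client` and reduced to `(id, name)` pairs ready for a graph node list. The function names `build_credits_path`, `parse_cast`, and `fetch_movie_cast`, and the `limit` parameter, are hypothetical and not taken from the skeleton. The `/3/movie/{movie_id}/credits` path and the `api_key` query parameter follow TMDb's v3 API, and you would need to supply your own key.

```python
import http.client
import json
from typing import List, Optional, Tuple


def build_credits_path(movie_id: str, api_key: str) -> str:
    """Build the request path for a movie's credits (TMDb v3 API)."""
    return f"/3/movie/{movie_id}/credits?api_key={api_key}"


def parse_cast(payload: str, limit: Optional[int] = None) -> List[Tuple[str, str]]:
    """Turn a credits JSON payload into a list of (id, name) string pairs.

    `limit` (hypothetical) keeps only the first `limit` cast entries.
    """
    cast = json.loads(payload).get("cast", [])
    if limit is not None:
        cast = cast[:limit]
    return [(str(member["id"]), member["name"]) for member in cast]


def fetch_movie_cast(movie_id: str, api_key: str,
                     limit: Optional[int] = None) -> List[Tuple[str, str]]:
    """Request a movie's credits and return its cast as (id, name) pairs.

    Returns an empty list when the API does not answer with HTTP 200.
    """
    conn = http.client.HTTPSConnection("api.themoviedb.org")
    try:
        conn.request("GET", build_credits_path(movie_id, api_key))
        response = conn.getresponse()
        if response.status != 200:
            return []
        return parse_cast(response.read().decode("utf-8"), limit)
    finally:
        conn.close()
```

Keeping the parsing separate from the network call means `parse_cast` can be checked locally against a saved JSON response before anything is uploaded to Gradescope.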

Sandeep Kumar · Accepted Answer

import http.client
import json
import csv
#############################################################################################################################
#
# All instructions, code comments, etc. contained within this notebook are part of the assignment instructions.
# Portions of this file will be auto-graded in Gradescope using different sets of parameters / data to ensure that values are not
# hard-coded.
#
# Instructions:  Implement all methods in this file that have a return
# value of 'NotImplemented'. See the documentation within each method for specific details, including
# the expected return value
#
# Helper Functions:
# You are permitted to write additional helper functions/methods or use additional instance variables within
# the `Graph` class or `TMDbAPIUtils` class so long as the originally included methods work as required.
#
# Use:
# The `Graph` class  is used to represent and store the data for the TMDb co-actor network graph.  This class must
# also provide some basic analytics, i.e., number of nodes, edges, and nodes with the highest degree.
#
# The `TMDbAPIUtils` class is used to retrieve Actor/Movie data using themoviedb.org API.  We have provided a few necessary methods
# to test your code w/ the API, e.g.: get_move_detail(), get_movie_cast(), get_movie_credits_for_person().  You may add additional
# methods and instance variables as desired (see Helper Functions).
#
# The data that you retrieve from the TMDb API is used to build your graph using the Graph class.  After you build your graph using the
# TMDb API data, use the Graph class write_edges_file & write_nodes_file methods to produce the separate nodes and edges
# .csv files for use with the Argo-Lite graph visualization tool.
#
# While building the co-actor graph, you will be required to write code to expand the graph by iterating
# through a portion of the graph nodes and finding similar artists using the TMDb API. We will not grade this code directly
# but will grade the resulting graph data in your Argo-Lite graph snapshot.
#
#############################################################################################################################
class Graph:
    # Do not modify
    def __init__(self, with_nodes_file=None, with_edges_file=None):
        """
        option 1:  init as an empty graph and add nodes
        option 2: init by specifying a path to nodes & edges files
        """
        self.nodes = []
        self.edges = []
        if with_nodes_file and with_edges_file:
            nodes_CSV = csv.reader(open(with_nodes_file))
            nodes_CSV = list(nodes_CSV)[1:]
            self.nodes = [(n[0], n[1]) for n in nodes_CSV]
            edges_CSV = csv.reader(open(with_edges_file))
            edges_CSV = list(edges_CSV)[1:]
            self.edges = [(e[0], e[1]) for e in edges_CSV]
    def add_node(self, id: str, name: str) -> None:
        """
        add a tuple (id, name) representing a node to self.nodes if it does not already exist
        The graph should not contain any duplicate nodes
        """
        node = (id, name)
        if node not in self.nodes:
            self.nodes.append(node)
        return None
    def add_edge(self, source: str, target: str) -> None:
        """
        Add an edge between two nodes if it does not already exist.
        An edge is represented by a tuple containing two strings: e.g.: ('source', 'target').
        Where 'source' is the id of the source node and 'target' is the id of the target node
        e.g., for two nodes with ids 'a' and 'b' respectively, add the tuple ('a', 'b') to self.edges
        """
        edge = (source, target)
        reverse_edge = (target, source)
        if edge not in self.edges and reverse_edge not in self.edges:
            self.edges.append(edge)
        return None
    def total_nodes(self) -> int:
        """
        Returns an integer value for the total number of nodes in the graph
        """
        return len(self.nodes)
    def total_edges(self) -> int:
        """
        Returns an integer value for the total number of edges in the graph
        """
        return len(self.edges)
    def max_degree_nodes(self) -> dict:
        """
        Return the node(s) with the highest degree
        Return multiple nodes in the event of a tie
        Format is a dict where the key is the node_id and the value is an integer for the node degree
        e.g. {'a': 8}
        or {'a': 22, 'b': 22}
        """
        degree_dict = {}
        for n in self.edges:
            degree_dict[n[0]] = degree_dict.get(n[0], 0) + 1
            degree_dict[n[1]] = degree_dict.get(n[1], 0) + 1
        # default=0 avoids a ValueError when the graph has no edges
        max_degree = max(degree_dict.values(), default=0)
        # max_deg_nodes = {(k, v) for k, v in degree_dict.items() if v == max_degree}
        max_deg_nodes = dict(filter(lambda x: x[1] == max_degree, degree_dict.
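
The posted answer is cut off partway through `max_degree_nodes`. As a minimal sketch of the behaviour its docstring describes (not a reconstruction of the missing lines), the standalone function below counts each node's degree from a list of `(source, target)` tuples and returns every node tied for the highest degree. Taking the edge list as a parameter, rather than reading `self.edges`, is an assumption made so the example runs on its own.

```python
from typing import Dict, List, Tuple


def max_degree_nodes(edges: List[Tuple[str, str]]) -> Dict[str, int]:
    """Return {node_id: degree} for every node tied for the highest degree."""
    degree: Dict[str, int] = {}
    for source, target in edges:
        # Each edge contributes one to the degree of both endpoints.
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    if not degree:
        # A graph without edges has no node of maximum degree to report.
        return {}
    top = max(degree.values())
    return {node: count for node, count in degree.items() if count == top}


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(max_degree_nodes(edges))  # {'c': 3}
```

The explicit empty check matters because calling `max()` on an empty collection raises `ValueError`, which is what happens in a graph that has nodes but no edges yet.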
