"""# HW5: MapReduce While mapreduce is an important concept in large scale parallel systems it is possible to simulate it locally and on small practice problems. To this end we provide a simple...

1 answer below »
"""# HW5: MapReduce
While mapreduce is an important concept in large scale parallel systems it is possible to simulate it locally and on small practice problems. To this end we provide a simple framework that allows you to run and test map/reduce functions below. This assignment asks you to write the code for two, two-stage mapreduce problems.
"""
from collections import defaultdictimport stringimport numpy as np
def simple_mapreduce(map_fn, reduce_fn, kvin): keyvalues = defaultdict(list) for k, v in kvin: for outk, outv in map_fn(k, v): keyvalues[outk] += [outv] kvout = [] for k, vs in keyvalues.items(): for outk, outv in reduce_fn(k, vs): kvout += [(outk, outv)] return kvout
"""
The input given to you describes a dataset of nodes connected to each other via edges. An example of such a network could be the Twitter follower network, in which the users denote nodes and two nodes are connected by an edge if one follows the other.
* The **degree** of a node in a network is the number of connections it has to other nodes. * The **degree distribution** is the distribution of these degrees over the whole network.
As an example, you can assume that the input you are given is an edgelist in the form of a table as shown below. This table shows pairs of edges connected to each other and thus capture the graph shown on the right:
Source | Destination -------|-------------A | BA | CA | D B | C
The degree distribution then is Degree | Count-------|-------1 | 12 | 23 | 1As you can see, we have one nodes with degree 1 (which is D), two node with degree 2 (which is B and C), and one node with degree 3 (which is A). Thus, the degree distribution captures, for each value of degree, the number of nodes that have that degree. Below you can see the network mapped out"""
%matplotlib inline import matplotlib.pyplot as pltplt.axis("off")plt.scatter( [0, 0, 1, 0], [1, 0, 0, -1], s=1000, marker='o')# Annotate with text + Arrowplt.plot([0, 0], [-1, 1], c="r", zorder=-1)plt.plot([0, 1], [0, 0], c="r", zorder=-1)plt.plot([0, 1], [-1, 0], c="r", zorder=-1)
plt.annotate('A', xy=(0, 0), xytext=(0, 0), c="w")plt.annotate('B', xy=(1, 0), xytext=(1, 0), c="w")plt.annotate('C', xy=(0, -1), xytext=(0, -1), c="w")plt.annotate('D', xy=(0, 1), xytext=(0, 1), c="w")
#=-------------------------
network_dataset = [ ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'),]
network_expected = [ (3, 1), (2, 2), (1, 1)]
# you are to write the body of the following functions: (replace the `pass`'s with your code
# k=source node, v=destination nodedef q2_map1(k, v): yield k, len(v) def q2_reduce1(k, values): yield k, sum(values) def q2_map2(k, v): pass
def q2_reduce2(k, values): pass
first_stage = simple_mapreduce(q2_map1, q2_reduce1, network_dataset)network_final = simple_mapreduce(q2_map2, q2_reduce2, first_stage)network_final
sorted(network_expected) == sorted(network_final)
Answered 62 days AfterNov 18, 2021

Answer To: """# HW5: MapReduce While mapreduce is an important concept in large scale parallel systems it is...

Karthi answered on Jan 19 2022
117 Votes
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here