The map-reduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores for each word a list of all document IDs that it appears in (offsets in the...








The map-reduce framework is quite useful for creating inverted indices on a set
of documents. An inverted index stores for each word a list of all document
IDs that it appears in (offsets in the documents are also normally stored, but
we shall ignore them in this question).
For example, if the input document IDs and contents are as follows:
1: data clean
2: data base
3: clean base
then the inverted lists would
data: 1, 2
clean: 1, 3
base: 2, 3
Give pseudocode for map and reduce functions to create inverted indices on a
given set of files (each file is a document).Assume the document IDis available
using a function context.getDocumentID(), and the map function is invoked
once per line of the document. The output inverted list for each word should be
a list of document IDs separated by commas. The document IDs are normally
sorted, but for the purpose of this question you do not need to bother to sort
them.


Jun 10, 2022
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here