The goal of this project is to use the concepts taught in this course to develop an efficient way of working with Big Data.
You should have 2 files in your Linux system: hugefile1.txt and hugefile2.txt, each containing one billion lines. If you do not, please go back to the Module 7 Portfolio Reminder and complete the steps there.
Create a program, using a programming language of your choice, to produce a new file, totalfile.txt, by adding the numbers on corresponding lines of the two files. That is, each line of totalfile.txt is the sum of the corresponding lines in hugefile1.txt and hugefile2.txt.
For example, if the first 5 lines of your files look as follows:
$ head -5 hugefile*.txt
==> hugefile1.txt <==
4131
29929
6483
7659
25003
==> hugefile2.txt <==
8866
19171
11029
4889
27069
then the first 5 lines of totalfile.txt look like this:
$ head -5 totalfile.txt
12997
49100
17512
12548
52072
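As a starting point, here is a minimal sequential sketch of this step, assuming Python, the file names shown above, and one integer per line; any language you prefer is fine. The timing call is included so you can record the baseline run time for later comparison:

import time

start = time.perf_counter()
with open("hugefile1.txt") as f1, open("hugefile2.txt") as f2, \
        open("totalfile.txt", "w") as out:
    # zip() pairs corresponding lines from the two input files
    for a, b in zip(f1, f2):
        out.write(f"{int(a) + int(b)}\n")
print(f"Line-by-line version: {time.perf_counter() - start:.2f} s")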
Because files of this size cannot be read into memory in their entirety at the same time, you need to use concurrency. Reading the files one line at a time will take a long time, so use what you have learned in this course to optimize the process. Be sure to record how long each version of your program takes to complete the task.
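One way to reduce the overhead of line-at-a-time reading is to process the files in large chunks. The sketch below is one possible approach, again assuming Python; CHUNK_LINES is an illustrative parameter you can tune and time, not a value prescribed by the assignment:

import time
from itertools import islice

CHUNK_LINES = 1_000_000  # number of lines held in memory per chunk (tunable)

start = time.perf_counter()
with open("hugefile1.txt") as f1, open("hugefile2.txt") as f2, \
        open("totalfile.txt", "w") as out:
    while True:
        chunk1 = list(islice(f1, CHUNK_LINES))
        chunk2 = list(islice(f2, CHUNK_LINES))
        if not chunk1:
            break
        # sum corresponding lines of the chunk and write them in one call
        out.writelines(f"{int(a) + int(b)}\n" for a, b in zip(chunk1, chunk2))
print(f"Chunked version: {time.perf_counter() - start:.2f} s")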
Create two programs: one that reads the first half of each file, and another that reads the second half. Use the OS to launch both programs simultaneously.
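One way to structure this is a single worker script that is told which range of lines to process and writes its own partial output; the script name, argument convention, and partial-file names below are illustrative, not required by the assignment:

# half_worker.py -- process lines [start, end) of both input files
import sys
from itertools import islice

start, end = int(sys.argv[1]), int(sys.argv[2])

with open("hugefile1.txt") as f1, open("hugefile2.txt") as f2, \
        open(f"partial_{start}_{end}.txt", "w") as out:
    # islice skips ahead to the requested line range in each file
    for a, b in zip(islice(f1, start, end), islice(f2, start, end)):
        out.write(f"{int(a) + int(b)}\n")

You could then launch both halves at the same time from the shell, for example by running python3 half_worker.py 0 500000000 and python3 half_worker.py 500000000 1000000000 in the background with &, waiting for both to finish, and concatenating the two partial files, in order, into totalfile.txt. Note that islice still reads and discards the skipped lines, so the second worker still scans the first half of each file; a more elaborate version could seek to a byte offset instead.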
Now, break up hugefile1.txt and hugefile2.txt into 10 files each, and run your process on all 10 sets in parallel. How do the run times compare to the original process?
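One way to run the 10 pieces in parallel from a single program is Python's multiprocessing module. The sketch below assumes the big files have already been split into pieces named hugefile1_part00 through hugefile1_part09 (and likewise for hugefile2), for example with the split utility using a fixed number of lines per piece so that the piece boundaries of the two files line up; those piece names and the pool size are illustrative:

import time
from multiprocessing import Pool

def sum_piece(i):
    """Sum corresponding lines of one pair of pieces into a partial output."""
    with open(f"hugefile1_part{i:02d}") as f1, \
         open(f"hugefile2_part{i:02d}") as f2, \
         open(f"total_part{i:02d}", "w") as out:
        for a, b in zip(f1, f2):
            out.write(f"{int(a) + int(b)}\n")

if __name__ == "__main__":
    start = time.perf_counter()
    with Pool(processes=10) as pool:
        pool.map(sum_piece, range(10))  # process the 10 pairs in parallel
    print(f"10-way parallel version: {time.perf_counter() - start:.2f} s")

Afterwards, concatenate total_part00 through total_part09, in order, into totalfile.txt, and compare the recorded run time against the earlier versions.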