The assignment below is for Data Science; I did not see an option for Data Science in the Computer Science category.


Homework 3
CS 301 (001), Fall 2022, Introduction to Data Science
Due Date: Nov. 25, 11:59 PM (EST)

This homework will focus on the understanding and implementation of the stream algorithms we have discussed in class.

1 Majority Item Problem

The dataset we will use is hw3a.txt, which consists of a series of integers. Please treat it as a data stream. For a given value j, let $f_j = |\{1 \le i \le n : x_i = j\}|$, i.e., the frequency of value j in the stream. In class, we discussed in detail an algorithm that can output all values j satisfying $f_j > n/k$, where k is a given parameter (known in advance) and n is the size of the data stream (unknown to the algorithm).

1.1 Please run the algorithm as shown in class and output all values j with $f_j > n/k$, where k = 40. Please attach your code as part of your solution in addition to the final output. (30 points)

1.2 Please run the modified version of the algorithm after swapping the two “Ifs”, i.e., first check whether there is an empty box, and then check whether some existing box already stores the currently arriving value. Compare the output here with that in 1.1. Is there any difference between the two outputs? If yes, do we miss some feasible candidate value j with $f_j > n/k$? (10 points)

2 Reservoir Sampling

We will use the file hw3b.txt as the data stream. For both questions 2.1 and 2.2, please attach your code as part of your solution in addition to the final outputs.

2.1 In class, we discussed the original definition of a random sample of a given size without replacement from a data stream. Please follow the definition and output a random sample of size 70 without replacement. (20 points)

2.2 Implement the reservoir sampling algorithm discussed in class to output a random sample of size 70 without replacement. (20 points)

2.3 Run each of the above two methods 1000 times and use the results to calculate the probability of each number being included in the final sample. Show that the results of the two methods are approximately the same. (20 points)

332313 765216 618002 8 309476 812518 919441 727774 509704 782733 360408 125312 646967 81989 8 29 612432 8 85308 655115 609811 353512 158737 400338 798485 29577 950047 947594 125312 535876 508506 330334 502168 85681 817811 718872 91478 323057 341255 37 448316 674456 37 212015 238595 37 385109 755245 351413 8 72649 551082 316289 37 29 684053 29 421733 313431 148395 53893 621085 8 415428 430769 623502 273362 835374 37317 148057 37 908991 660218 8 403751 575234 917922 721633 764542 467104 684323 469466 848965 108878 262071 32331 8 253458 37 598449 43717 363856 5264 979549 158921 954325 916761 29 41 896254 894728 8 898574 537648 703725 47897 100492 446618 401332 496529 954975 42001 7085 829033 586439 840238 755245 8 356679 884723 644909 467104 708886 8 581584 37175 8 557487 591633 705381 371077 8 848965 297919 42527 493913 984711 267247 8 461812 598449 73845 49149 32331 702081 729535 412527 329447 623477 141106 960435 72649 639202 37175 8 650444 37 8 782341 238595 343012 37 286492 8 8 264677 428301 555994 418152 285526 893485 850974 149735 217342 728095 287654 974371 919 603782 313013 37 468581 29 6059 169568 42527 602669 37 348577 48296 89461 704057 53711 271437 800708 24453 8 559294 127923 954975 5264 457242 626204 445525 53594 146244 199797 671421 444111 513782 41 893485 67911 956723 860855 681077 472023 29577 729535 38732 883144 730819 165154 471831 975093 51585 337791 521888 98713 29 986203 37 683427 740889 189898 309527 827491 8 79193 799953 225693 99923 276516 351566 782654 244222 91478 808424 840238 442727 924315 8 893485 960097 791211 937862 212412 403751 521888 703725 67297 33363 898463 6059 8 363856 287452 401332 29 709119 639057 971921 714617 301702 147656 733503 6145 8 30891 238595 8 8 33332 548148 130571 551082 26289 412527 8 147025 703095 567161 919685 509704 823696 525916 646635 16653 640591 606146 8 798485 47897 413414 8 755245 46331 37 265616 37317 721633 625952 644909
37 277381 475518 708886 8 19706 626204 169568 29 328796 896254 645742 7132 33108 150549 8 162385 33812 33108 901389 8 353269 460086 263195 8 5723 8 982529 589315 8 641796 8 286537 497177 8 8 265645 809986 588888 962505 67547 968825 358541 8 212015 238595 261845 619025 529382 300616 4112 29 37 850441 130356 343601 8 69814 21836 8 850441 37 602669 276516 173054 920259 534665 446618 313006 735466 37 683427 212817 113737 588888 486799 631792 8 963225 8 855934 8 384507 823696 541367 811295 400338 865939 646967 962505 34438 836655 273362 650506 364496 8 8 148741 71553 80838 765216 67318 8 319008 686387 640591 835067 72607 7132 787717 844516 246397 681086 992324 541367 39605 706798 114542 8 606146 812518 33471 963628 588888 494471 486799 49149 39605 73845 53222 8 43209 650203 8 199621 453702 40191 8 29 356679 101178 146469 8 956723 158363 400338 37 199797 29 5723 612285 62165 353512 688636 951198 717142 37 53492 8 252533 922463 540407 37 37317 650444 989107 622975 433224 509704 51358 30875 37 460866 800708 252533 994403 190359 508506 10212 32223 918055 37 38732 125718 37 734798 14555 714269 736459 8 319169 274565 511372 640591 460866 392791 540407 57119 265616 764542 53222 37 706798 728095 940814 8 42135 599326 29 40191 906758 8 642698 285526 867882 29 127923 779871 423792 711339 59679 568945 924315 550289 469466 37 66329 406079 26289 982529 8 735466 93643 37 29 792834 8 40191 261995 579072 797421 212412 8 43172 960097 3899 34791 942267 644909 8 721633 53917 729535 178081 37 604905 399428 511372 764542 588888 238949 19706 8 313006 280597 225693 8 509704 78426 428115 777686 846884 453344 564745 37 8 301296 8 476155 260091 147656 29 715906 963628 8 901389 173054 289082 839885 256984 412023 8 765697 489094 981837 43638 81465 17535 34098 644909 151344 486799 765216 280597 37 596835 835374 588182 385109 442059 326848 5744 537648 796896 37 549164 722657 37 384014 721633 29 60432 185628 8 963225 171247 106474 54684 41569 580745 669749 392791 529382 253458 987205 37 37 389322 
296784 8 644652 716812 148741 53479 870745 446618 523528 29 104316 775346 719549 589315 541367 396761 8 369376 609811 91785 619025 809986 281216 829033 870184 723353 297919 645922 523467 753876 79266 8 564489 674496 984711 623477 464495 29 360408 689831 8 407963 619698 225693 29 67318 851586 385265 790064 567161 101943 152891 951198 8 674658 37 313013 954975 908991 572962 770664 5723 497898 252727 29 626647 14705 956723 8 253458 60981 601597 736225 364496 32331 458545 455048 121863 832546 674658 374205 572528 204666 460086 599326 412527 337989 510829 410291 165154 199621 992248 877445 37 53492 317902 37 8 753876 103597 399769 852535 8 462429 29 269803 26289 8 540407 950047 313006 848811 842909 145048 359538 968825 29 423579 269803 29 581584 603782 62165 485947 30875 753876 29 29 356679 37 238949 327639 793351 29 848648 918104 168205 81113 29577 90729 460086 389546 540407 995324 8 781143 319853 626647 225693 8 312711 369376 870184 809986 508982 8 66329 671421 529382 68485 311119 964478 919377 407963 908991 37 37 736225 244222 285514 73845 458545 707594 877642 51358 719549 79266 384014 802666 548232 14347 29 682242 707594 15075 18619 337791 8 472023 51571 56574 374205 238595 42527 460866 835374 150549 199621 598449 989874 434817 721633 332313 78957 508982 143598 206267 817712 271437 641796 299168 880202 237205 29 65333 8 867882 622171 8 410291 575234 923702 125312 417991 650506 740889 29 8 525916 682457 186962 339125 53594 49302 772769 89858 251003 134429 972108 393276 618254 37 497177 921881 459529 124395 29 37 725745 337989 609811 158737 779871 8 8 575129 493913 423792 67547 533013 246397 8 42527 908991 364496 742684 249002 67297 822825 153503 43638 8 261995 717142 586439 730538 224589 877445 8 536752 158921 37 189898 8 558443 730819 288445 535876 37 89858 683427 356679 599326 3882 918055 673031 121863 640591 8 621085 173643 752307 206778 974371 612285 37 80923 190359 548148 183349 502168 141106 8 8 8 619698 145234 8 345776 735466 992248 37 968825 908991 276516 
539212 935997 6091 787728 19706 602037 29 271437 37 8 174854 260327 8 371077 337791 6091 8 614694 341255 348695 462429 8 913757 254073 714269 815936 743694 448316 29 79816 145754 950047 939425 265645 489094 805754 421733 353512 238595 812518 261995 855275 792079 6059 29 643324 54037 537648 351413 553753 37 601597 832546 475518 450824 8 8 150549 8 406079 601597 29 40678 852535 548581 124179 145234 40678 922463 8 472713 84175 8 772152 493913 8 33471 65961 216954 671421 287452 38732 178081 867882 511372 765216 37 371039 8 510829 8 621085 145048 67188 193985 29 287654 29 548232 985948 702081 723353 776544 641796 40678 10212 428301 102156 50221 8 29 8 922463 623477 950047 43638 56432 146469 540407 643138 467104 8 8 655115 418152 80923 186962 8 669749 37 42001 415428 153503 183349 403751 326848 645742 8 539212 268621 249002 988058 914137 48296 434817 29 658382 8 497898 8 450824 733064 65333 8 29 954325 457242 5834 29 508225 420391 8 8 29 43209 971213 619698 742684 37 601597 319169 5264 353269 82402 34098 376209 10212 882721 755245 632606 8 8 813738 8 8 8 579072 351413 29 421733 51358 520536 919441 121863 995268 124179 37 85308 811295 446618 539212 698108 336759 782341 462429 536752 908991 152891 979549 8 981837 73071 913757 641796 8 124179 29 311119 8 850441 25316 29 993752 8 17381 269803 683427 553753 8 461736 533013 73845 29 646967 787717 199797 193985 882112 29 14674 406079 169568 38732 792834 309527 680064 8 722657 515955 883077 216954 13849 34098 539212 548232 33164 60432 15307 34098 621029 947594 285526 37 79193 68485 612285 588888 12666 29 777919 33332 79193 623502 148868 330334 719549 379985 82402 158324 37 204666 705381 34098 8 37 974892 442727 980053 8 8 907858 852535 72607 37 513782 29 169568 843915 918104 53594 337989 8 401332 29 78426 8 4112 8 512262 212817 8 477708 8 51571 37 775346 37317 198195 475243 223164 29 908991 222156 817811 643324 206778 29 89461 428139 8 655115 8 813738 383689 254073 81113 728095 688305 417991 8 8 423792 539212 29 148868 8 740827 
171247 400251 812518 292152 581584 935636 14347 257591 37175 622171 764808 375646 337989 37 876044 37 389322 493913 788696 8 954975 53479 604905 37 32331 6091 6679 416836 410291 802666 8 285514 8 128724 148868 907821 974679 684053 984711 8 797421 577242 564675 8 265616 963225 682457 8 24453 508982 700703 592193 429597 273362 69814 346438 253458 29 420391 8 415428 8 225214 809986 300447 850441 728095 558708 67911 154585 91643 20781 734798 83056 91478 845682 643324 511372 538568 464495 765697 743694 297919 10587 320387 602669 937862 771971 399428 8 399428 644652 674456 763646 575234 80175 782341 645742 603782 646967 34438 7221 787728 33812 72607 935122 37 348695 429597 8 51449 455048 29 794626 29 37 17381 918104 126916 6679 152891 127923 917052 401332 808424 924315 717142 287452 714269 642698 420391 882112 37 19706 163595 832546 662877 957227 154585 882112 445525 763646 772152 244222 345776 71041 158737 296812 328796 359538 8 39605 591633 832546 684053 813738 59679 476155 835067 8 677268 8 534665 256984 732528 8 29 41632 80923 781296 592193 8 37 381709 706798 78957 299168 934082 8 545695 584098 265645 37 8 548148 249002 806116 37 791211 689831 770987 84406 475243 612139 204666 429597 799953 37 507194 8 423579 650203 508077 313013 60432 707808 8 83463 994403 8 840238 104316 260327 489094 674496 824053 919377 323057 14674 320387 531188 557487 37 934082 794197 709741 309527 463015 794626 5744 37 29 8 8 459529 727774 951198 269367 606886 57027 429597 8 851586 466039 282924 37 6091 733064 917052 8 548232 8 99589 29 91923 173054 413414 309321 37 865939 8 14705 548148 399769 37 488804 836843 8 299168 8 650506 618002 808424 428139 37 735466 8 867882 829673 300616 29 8 148395 253458 385875 128132 235536 945749 33332 548148 534665 8 8 814985 298895 313431 8 29 29 147025 8 625287 444111 356679 29 980053 4112 836843 114542 309321 640591 442727 8472 674496 822866 171247 29 37 428301 174466 342017 862329 606146 37 717142 575234 635704 893485 742684 178081 508077 684053 717142 
407963 375646 645922 37 110277 57027 614694 8 175033 495998 328882 622171 53594 67499 428139 646967 49302 14347 525916 626738 645922 938832 165154 37 916761 403751 14347 71742 446618 459529 8 3882 976813 269803 602669 639057 992324 502168 8 835067 812966 326848 622975 91643 37 8 41021 985948 37 551082 57619 704057 581584 798469 842909 296784 108751 29 37 586439 593913 682457 252727 10254 345776 386531 338611 722856 579072 646967 25326 514909 14555 25316 256984 974679 326848 829673 8 313431 919685 121863 721633 37 783235 870258 850777 701814 69814 619025 80838 273362 937657 34098 8 412023 8 212015 934082 8 29 60432 835067 3899 222156 453344 8 320387 870184 285526 907821 91643 592193 37 199621 12666 29 114542 29 703095 17535 599326 158921 510829 884723 876873 975093 8 640591 626647 8 938832 448316 635704 17421 529382 37 8 190359 212098 437858 223824 29 29 537648 963225 37 562125 320387 688305 722963 217342 51358 32727 37 743694 29 66993 15276 919377 794626 8 442059 283096 635704 825494 29 71553 268621 700939 770664 193985 29 37 433224 536752 136852 81904 39929 508225 442727 733503 148057 867882 313007 190359 8 235536 711339 333963 225214 65961 364496 850974 351566 79816 72607 782654 8 640591 806661 353512 212817 246397 51449 851586 658382 29 41632 71553 736225 580745 704057 445525 212412 8 8 883144 572962 50974 313013 765216 61883 29 337791 206778 8 772769 40678 702081 10212 78154 15276 764808 725745 29 626738 37 771971 772152 384507 281216 940814 798469 776544 37 951198 29 825494 971213 442727 865939 252541 882112 363856 29 37 260622 286492 360408 577242 125718 8 824053 709741 619025 512262 78154 919685 317902 29 860855 16653 975093 389322 37 37 779871 896254 241789 99923 29 464495 802666 33471 353512 64462 80923 128724 446618 29 8 806661 29 641509 988058 964478 876873 703095 33108 8 813738 792834 8 267247 525057 79193 29 83056 52694 676933 126916 507194 219816 555505 8 550289 37 282924 806116 51571 817712 836655 433224 29 37 488804 29 797421 8 360408 609754 8 
366641 8 417991 8 684323 8 953977 67911 614694 642698 8 353269 130356 698108 808424 504853 168205 29 212015 37 32331 864095 320387 37317 707594 896254 8 29 564489 61883 244222 271437 25925 763646 216954 286537 297919 460866 682242 802574 753493 143598 297919 28026 520536 8 37 29 674456 145048 3882 235536 735466 212015 513782 885381 29 241789 451222 626647 348695 37 788696 368278 8 29 781143 907858 51449 623223 84175 517622 782341 453702 686387 29 428115 455379 815936 29 918371 8 660218
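Part 2 asks for two samplers and an empirical comparison. Below is a minimal Python sketch under stated assumptions: the function names sample_by_definition, reservoir_sample and inclusion_frequencies are hypothetical; rng.sample (every size-s subset of positions equally likely) is assumed to match "the original definition" from class; the reservoir step is the classic one-pass procedure often called Algorithm R, assumed to be the version discussed in class; and a toy list stands in for hw3b.txt.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence


def sample_by_definition(stream: Iterable[int], s: int, rng: random.Random) -> List[int]:
    """Sketch for 2.1: store the whole stream, then pick s distinct positions.

    rng.sample draws positions without replacement, so every size-s subset of
    positions is equally likely (assumed to match the definition from class).
    """
    items = list(stream)
    positions = rng.sample(range(len(items)), s)  # raises ValueError if s > len(items)
    return [items[p] for p in positions]


def reservoir_sample(stream: Iterable[int], s: int, rng: random.Random) -> List[int]:
    """Sketch for 2.2: classic reservoir sampling (Algorithm R), one pass, O(s) memory."""
    reservoir: List[int] = []
    for i, x in enumerate(stream, start=1):  # i = number of items seen so far
        if i <= s:
            reservoir.append(x)  # keep the first s items unconditionally
        else:
            j = rng.randint(1, i)  # uniform integer in 1..i (inclusive)
            if j <= s:  # true with probability s/i
                reservoir[j - 1] = x  # x replaces a uniformly chosen slot
    return reservoir


Sampler = Callable[[Sequence[int], int, random.Random], List[int]]


def inclusion_frequencies(
    sampler: Sampler, stream: Sequence[int], s: int, trials: int, rng: random.Random
) -> Dict[int, float]:
    """Sketch for 2.3: estimate P(value appears in the sample) from repeated runs."""
    hits: Counter = Counter()
    for _ in range(trials):
        hits.update(set(sampler(stream, s, rng)))  # count each value once per run
    return {v: hits[v] / trials for v in set(stream)}


if __name__ == "__main__":
    rng = random.Random(301)
    stream = list(range(100))  # hypothetical stand-in for the numbers in hw3b.txt
    by_def = inclusion_frequencies(sample_by_definition, stream, 70, 1000, rng)
    by_res = inclusion_frequencies(reservoir_sample, stream, 70, 1000, rng)
    gap = max(abs(by_def[v] - by_res[v]) for v in stream)
    print(f"largest per-value difference: {gap:.3f}")
```

With 100 distinct values and s = 70, each value is included with probability 70/100 = 0.7 under either method, so both estimated dictionaries should hover around 0.7. The reservoir version never stores more than s items, which is why it suits a stream whose length is unknown in advance.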
Answered Same Day, Nov 19, 2022

Aditi answered on Nov 20, 2022
CS 301 - HW 3 Solutions
1.
1.1
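# Read the stream; split() accepts one integer per line or space-separated values.
# The stream is kept in memory so the verification pass can re-scan it.
with open("hw3a.txt", "r") as file:
    data = [int(num) for num in file.read().split()]

length = len(data)  # n, used only in the verification pass
k = 40
boxes = [None] * (k - 1)  # at most k - 1 values can have f_j > n/k
counter = [0] * (k - 1)

# One pass over the stream
for x in data:
    if x in boxes:  # first "If": x already has a box
        counter[boxes.index(x)] += 1
    elif None in boxes:  # second "If": an empty box is available
        n_index = boxes.index(None)
        boxes[n_index] = x
        counter[n_index] += 1
    else:  # every box holds some other value
        for i in range(len(counter)):
            counter[i] -= 1
            if counter[i] == 0:
                boxes[i] = None  # free a box whose counter reached zero

# Verification pass: keep only candidates that really satisfy f_j > n/k
for value in boxes:
    if value is not None and data.count(value) > length / k:
        print(value)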
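The listing allocates k - 1 boxes. The standard counting argument for this kind of counter-based (Misra-Gries style) algorithm, sketched below in the assignment's notation, shows why k - 1 boxes are enough and why the 1.1 order cannot drop a value with f_j > n/k. The symbols D (number of decrement steps) and c_j (final counter of value j) are introduced here for the argument.

```latex
% (i) Only k-1 values can qualify: the frequencies sum to n.
\sum_j f_j = n
\;\Longrightarrow\;
\bigl|\{\, j : f_j > n/k \,\}\bigr| \le k-1,
\quad\text{since } k \text{ such values would give } \sum_j f_j > k\cdot\frac{n}{k} = n.

% (ii) D = number of "else" (decrement) steps. Each one discards the arriving value
%      and removes one unit from each of the k-1 occupied boxes; every other arrival adds 1.
0 \le \sum_{\text{boxes}} \mathrm{counter} = (n - D) - (k-1)\,D = n - kD
\;\Longrightarrow\; D \le \frac{n}{k}.

% (iii) c_j = final counter of value j (0 if j holds no box). A decrement step either
%       discards an occurrence of j or lowers j's box by 1, never both, so
c_j \ge f_j - D \ge f_j - \frac{n}{k} > 0 \quad\text{whenever } f_j > \frac{n}{k}.
```

Hence every value with f_j > n/k still holds a box at the end of the pass, and the data.count check only removes false positives. Step (iii) relies on a value never occupying two boxes at once, which holds in the 1.1 order because a value is placed in an empty box only when it is not already stored.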
1.2
# split() accepts one integer per line or space-separated values
with open("hw3a.txt", "r") as file:
    data = [int(num) for num in file.read().split()]

length = len(data)
k = 40
boxes = ...
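The 1.2 listing stops at "boxes = ...". Independently of it, here is a minimal sketch of the swapped check order that the assignment describes, written over an in-memory list; the names frequent_candidates, verified and the swap_ifs flag are hypothetical. It shows the key difference: with the empty-box check first, a value that is already stored can take a second box, so its count is split across boxes.

```python
from typing import List, Optional, Tuple


def frequent_candidates(
    stream: List[int], k: int, swap_ifs: bool = False
) -> Tuple[List[Optional[int]], List[int]]:
    """One pass of the k-1 box algorithm; returns the final (boxes, counters).

    swap_ifs=False: check "x already has a box" first (the 1.1 order).
    swap_ifs=True:  check "some box is empty" first (the 1.2 order).
    """
    boxes: List[Optional[int]] = [None] * (k - 1)
    counters = [0] * (k - 1)
    for x in stream:
        if swap_ifs and None in boxes:
            # 1.2 order: an empty box is used even if x already sits in another box
            i = boxes.index(None)
            boxes[i], counters[i] = x, 1
        elif x in boxes:
            counters[boxes.index(x)] += 1
        elif None in boxes:
            i = boxes.index(None)
            boxes[i], counters[i] = x, 1
        else:
            # every box holds some other value: decrement all, free boxes at zero
            for i in range(k - 1):
                counters[i] -= 1
                if counters[i] == 0:
                    boxes[i] = None
    return boxes, counters


def verified(stream: List[int], boxes: List[Optional[int]], k: int) -> List[int]:
    """Second pass: distinct candidates with f_j > n/k (the set drops duplicate boxes)."""
    n = len(stream)
    return sorted({b for b in boxes if b is not None and stream.count(b) > n / k})


if __name__ == "__main__":
    toy = [1, 1, 2] * 4  # n = 12, f_1 = 8 > 12/3
    for swap in (False, True):
        boxes, _ = frequent_candidates(toy, k=3, swap_ifs=swap)
        print(swap, boxes, verified(toy, boxes, 3))
```

On this toy stream the 1.1 order ends with boxes [1, 2] and reports 1, while the 1.2 order ends with both boxes empty and reports nothing, even though f_1 = 8 > 12/3 = 4. This only shows that the swapped order can miss a qualifying value; whether it does so on hw3a.txt has to be checked by running both versions on that file.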