Reading and Writing Data

In this assignment, you will be reading and writing data. Yes, finally some data science (or at least some exploratory data analysis)! In the week_10 assignment folder, there are three data files named:


data.csv
data.json
data.pkl
These are three common file formats. You can run the following on the bash command line to see what is in each file (this will not work from a Windows command prompt but will work in Git Bash):


head data.csv
head data.pkl
head data.json
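For anyone without a bash shell, a rough Python equivalent of head is sketched below. The helper name head and the demo file are hypothetical and are not part of the assignment. Note that data.pkl is a binary file, so its first lines will not be readable as text either way.

```python
import os
import tempfile
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file, similar to the `head` command."""
    # errors="replace" keeps the call from failing on bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Hypothetical demo file standing in for data.csv.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(f"row {i}" for i in range(12)) + "\n")

first_lines = head(demo_path, n=3)
print(first_lines)
```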
You'll see that there is some method to the madness but that each file format has its peculiarities. Each file contains a portion of the total dataset, which altogether comprises 100 records, so you need to read in all of the files and combine them into some standard format with which you are comfortable. Aim for something standard where each "row" is represented in the same format. Name the object that contains the combined data from all three files full_data. (Natali: Please convert everything to CSV and answer the questions accordingly.)
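One possible standard format is a list of row dicts. The sketch below assumes that each file loads into a nested dict shaped like {column: {row index: value}}; that shape, the helper names to_rows and combine, and the two sample records are all assumptions made for illustration.

```python
def to_rows(column_dict):
    """Convert {column: {index: value}} into a list of (index, row dict) pairs."""
    columns = list(column_dict)
    indices = sorted(column_dict[columns[0]])
    return [(idx, {col: column_dict[col][idx] for col in columns}) for idx in indices]


def combine(*column_dicts):
    """Merge several column-oriented dicts into one list of row dicts, ordered by index."""
    pairs = []
    for column_dict in column_dicts:
        pairs.extend(to_rows(column_dict))
    pairs.sort(key=lambda pair: pair[0])
    return [row for _, row in pairs]


# Hypothetical two-record example standing in for the real files.
part_a = {"Name": {0: "Ann Lee"}, "Country": {0: "Togo"}}
part_b = {"Name": {1: "Bo Chan"}, "Country": {1: "Nauru"}}
full_data = combine(part_a, part_b)
print(full_data)
```

With every record in the same row-dict shape, each question reduces to a filter or a sort over full_data.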




Questions to answer

After you've standardized all of the data, report the following information:




  • What are the unique countries in the dataset, sorted alphabetically? Write to a new file called question_1.csv.

  • What are the unique complete email domains in the dataset, sorted alphabetically? Write to a new file called question_2.csv.

  • What are the first names of everyone (including duplicates) that do not have a P.O. Box address, sorted alphabetically? Write to a new file called question_3.csv.

  • What are the full names of the first 5 people when you sort the data alphabetically by country? Write to a new file called question_4.csv.

  • What are the full names of the first 5 people when you sort the data numerically ascending by phone number? Write to a new file called question_5.csv.
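Two of the questions above depend on how the raw strings are interpreted. As a sketch under stated assumptions, the helpers below treat a phone number as the integer formed by its digits and treat any address containing "P.O. Box" (case-insensitive) as a P.O. Box address. Both rules and the names phone_key and has_po_box are illustrative choices, not the graders' definitions.

```python
import re


def phone_key(phone):
    """Numeric sort key: drop every non-digit and compare as an integer (an assumption)."""
    return int(re.sub(r"\D", "", phone))


def has_po_box(address):
    """True if the address mentions a P.O. Box (case-insensitive; dots and spaces optional)."""
    return re.search(r"\bP\.?\s*O\.?\s*Box\b", address, flags=re.IGNORECASE) is not None


# Sample values taken from the notebook output further down the page.
phones = ["1-243-669-7472", "155-3483", "123-5058"]
sorted_phones = sorted(phones, key=phone_key)
print(sorted_phones)

addresses = ["P.O. Box 441, 6183 Ligula St.", "7058 Dapibus St."]
street_only = [a for a in addresses if not has_po_box(a)]
print(street_only)
```

Under this rule, seven-digit numbers sort ahead of the longer numbers that start with 1-, because they form smaller integers. A different assumption (for example, comparing the raw strings) gives a different order, so the chosen rule should be written down.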

We will be using a script to examine and grade your .csv files, so please make sure:



The answers are all in one column with one list item per cell, sorted as stated in the question. I.e., looking at the .csv in a spreadsheet editor like Google Sheets, all answers would be in the 'A' column, with the first entry in A1, the second in A2, etc.
Please do not include a header; just the answers to the questions.
It is strongly recommended that you open each .csv file to ensure the answers are there and displayed correctly!
Don't include quotes around the list items. I.e., strip the leading and trailing quotes, if necessary, from items when you write to the .csv files. For example, a list entry should look like Spain rather than "Spain". One exception: Some country names do contain commas and it is ok to have quotes: "" around just those country names so that they will be in one cell in the .csv.
In addition, show all of your work in a Jupyter notebook.
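The file requirements above can be met with the standard csv module. This is a minimal sketch: the helper name write_answers and the demo values are hypothetical. With csv.QUOTE_MINIMAL (the writer's default), only entries that contain a comma or another special character are quoted.

```python
import csv
import os
import tempfile


def write_answers(path, answers):
    """Write one answer per row in a single column, with no header row."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_MINIMAL)
        for answer in answers:
            writer.writerow([answer])


# Hypothetical demo written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "question_1.csv")
write_answers(demo_path, sorted({"Togo", "Palestine, State of", "Nauru", "Togo"}))

with open(demo_path, newline="", encoding="utf-8") as handle:
    written = handle.read()
print(written)
```

Only the country name containing a comma ends up in quotes, which matches the exception described above.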




Assumptions

You might have to make decisions about the data. For example, what to do with ties or how to sort the phone numbers numerically.
Write your assumptions in this Jupyter notebook at the top of your code, under the heading below that says ASSUMPTIONS.
Please do some research before making an assumption (e.g. what is a domain name?); put your notes inside that assumption so we can understand your thought process.
NOTE: If you don't know what an email domain is - do some research and write what you found in your assumptions; there is a correct answer to this question!
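In an email address, the domain is the part after the "@" sign. One reading of "complete" domain is that whole part, including any subdomain and the top-level domain. The sketch below follows that reading; the helper name email_domain and the sample addresses are hypothetical, and lower-casing is an added assumption based on domain names being case-insensitive.

```python
def email_domain(address):
    """Return everything after the last '@', lower-cased."""
    return address.rsplit("@", 1)[1].strip().lower()


# Hypothetical addresses, not values from the dataset.
emails = ["ann@mail.example.com", "bo@Example.org", "cy@mail.example.com"]
unique_domains = sorted({email_domain(e) for e in emails})
print(unique_domains)
```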
Sampad Swarup answered on Nov 02 2021
Order_ID_70656


In [1]:

import re
import csv
import json
import pickle
from time import localtime

In [2]:

# Open and read the CSV file (UTF-8 is assumed so that accented city names decode correctly)
with open('data.csv', 'rt', newline='', encoding='utf-8') as open_csv:
    read_csv = csv.reader(open_csv)
    # Store every row in a list
    csv_file_array = []
    for row in read_csv:
        csv_file_array.append(row)
# Create a nested dict {column name: {row index: value}} from the list
csv_dict = {}
mid_dict = {}
counter = 1
for i in csv_file_array[0][1:]:  # column names from the header row, skipping the index column
    for j in csv_file_array[1:]:  # every row other than the header row
        mid_dict[int(j[0])] = j[counter]  # build the inner dict for this column
    csv_dict[i] = mid_dict  # add the column to the main dict
    mid_dict = {}
    counter += 1
csv_dict

Out[2]:
{'Name': {0: 'Hillary Benton',
1: 'Morgan Y. Little',
2: 'Camden Z. Blair',
3: 'Alexandra E. Saunders',
4: 'Hanae P. Walsh',
5: 'Jescie Sargent',
6: 'Kessie Morgan',
7: 'Bevis M. Santos',
8: 'Flynn Alston',
9: 'Charles F. Crawford',
10: 'Cairo Wolfe',
11: 'Elijah Myers',
12: 'Thane Burch',
13: 'Katelyn Munoz',
14: 'Genevieve Holland',
15: 'Wesley Z. Sharp',
16: 'Tatyana H. French',
17: 'Meredith F. Clayton',
18: 'Rajah Carrillo',
19: 'Gabriel Richmond'},
'Phone': {0: '1-243-669-7472',
1: '155-3483',
2: '123-5058',
3: '1-637-740-7614',
4: '901-2461',
5: '265-1176',
6: '945-0713',
7: '227-9994',
8: '398-8097',
9: '791-5111',
10: '1-930-942-2322',
11: '1-238-336-4864',
12: '1-894-978-3696',
13: '220-5054',
14: '992-6968',
15: '1-960-740-2261',
16: '1-120-782-6047',
17: '425-7583',
18: '1-576-789-5730',
19: '1-387-932-2096'},
'Address': {0: '144-1225 In Road',
1: 'Ap #909-6656 Ac St.',
2: 'P.O. Box 441, 6183 Ligula St.',
3: '305-496 Morbi Rd.',
4: '7058 Dapibus St.',
5: '421-5501 Cursus. St.',
6: 'Ap #481-6631 Vehicula Rd.',
7: 'P.O. Box 575, 4033 Mi St.',
8: 'Ap #763-5990 Nec, Av.',
9: 'Ap #841-1623 Vitae Avenue',
10: '9269 Libero Ave',
11: 'P.O. Box 677, 2311 Aliquet. Road',
12: '7438 Amet, Rd.',
13: 'P.O. Box 432, 9085 Nulla Ave',
14: '1768 Magna. Road',
15: 'P.O. Box 497, 8354 Habitant St.',
16: '217-9163 Lobortis Road',
17: 'Ap #929-9420 Vivamus Rd.',
18: '910-8300 Varius Rd.',
19: '7458 Sapien. St.'},
'City': {0: 'Navsari',
1: 'Kitimat',
2: 'Casanova Elvo',
3: 'Biggleswade',
4: 'Dhuy',
5: 'Tulsa',
6: 'Pedro Aguirre Cerda',
7: 'Saint-Vincent',
8: 'Tirúa',
9: 'Hindupur',
10: 'Whitchurch',
11: 'Port Harcourt',
12: 'Algeciras',
13: 'Requínoa',
14: 'Moose Jaw',
15: 'Bear',
16: 'Salles',
17: 'Friedberg',
18: 'Bertiolo',
19: 'Tropea'},
'Country': {0: 'Togo',
1: 'Nauru',
2: 'Palestine, State of',
3: 'Malawi',
4: 'Qatar',
5: 'Holy See (Vatican City State)',
6: 'Bonaire, Sint Eustatius and Saba',
7: 'Kuwait',
8: 'Romania',
9: 'South Georgia and The South Sandwich Islands',
10: 'Lesotho',
11: 'Kyrgyzstan',
12: 'Anguilla',
13: 'Congo (Brazzaville)',
14: 'Uruguay',
15: 'Cayman Islands',
16: 'Eritrea',
17: 'Czech Republic',
18: 'Afghanistan',
19: 'Cambodia'},
'Email': {0: '[email protected]',
1: '[email protected]',
2: '[email protected]',
3: '[email protected]',
4: '[email protected]',
5: '[email protected]',
6: '[email protected]',
7: '[email protected]',
8: '[email protected]',
9: '[email protected]',
10: '[email protected]',
11: '[email protected]',
12: '[email protected]',
13: '[email protected]',
14: '[email protected]',
15: '[email protected]',
16: '[email protected]',
17: '[email protected]',
18: '[email protected]',
19: '[email protected]'}}
In [3]:

# Load the JSON data
with open('data.json', 'rb') as j:
    load_json = json.load(j)
# Rebuild the dict with int keys; JSON stores the row indices as strings.
json_dict = {}
mid_dict = {}
for i in load_json.keys():
    for j in load_json[i].keys():
        mid_dict[int(j)] = load_json[i][j]
    json_dict[i] = mid_dict
    mid_dict = {}
json_dict

Out[3]:
{'Name': {20: 'Paul Merrill',
21: 'Brynne S. Barr',
22: 'Cyrus Buckley',
23: 'Chloe Burnett',
24: 'Zachery Wilcox',
25: 'Casey Mcgowan',
26: 'Cole X. Hopper',
27: 'Tara Bender',
28: 'Malik Grimes',
29: 'Ulla Russo',
30: 'Colby Moran',
31: 'Maggy Wooten',
32: 'Cameron Guthrie',
33: 'Gail Villarreal',
34: 'Harding Salinas',
35: 'Idona W. Bonner',
36: 'Warren Castillo',
37: 'Clayton Harmon',
38: 'Alana Vasquez',
39: 'Mason R. Trujillo'},
'Phone': {20: '1-313-739-3854',
21: '939-4818',
22: '266-3123',
23: '828-0406',
24: '1-611-756-4723',
25: '1-155-558-4461',
26: '1-328-505-0545',
27: '1-757-378-4079',
28: '793-4359',
29: '662-7778',
30: '1-788-230-1991',
31: '912-7242',
32: '988-2217',
33: '1-405-823-4207',
34: '1-505-843-5401',
35: '283-6921',
36: '1-250-875-9104',
37: '1-609-380-9257',
38: '1-853-288-4269',
39: '172-5777'},
'Address': {20: '916-8087 Vehicula Rd.',
21: '878-2231 Suspendisse Rd.',
22: 'P.O. Box 572, 7680 Ullamcorper Ave',
23: '563-4105 Donec Avenue',
24: '462-2112 In Rd.',
25: '420-7327 Facilisis Street',
26: '561-7476 Eget St.',
27: '1247 Nonummy Rd.',
28: 'Ap #603-3303 Libero. St.',
29: 'P.O. Box 975, 4593 Ante. Street',
30: '3696 Augue Ave',
31: 'P.O. Box 365, 6109 Metus. Rd.',
32: 'Ap #861-8699 Non Ave',
33: '371-7266 Tortor Avenue',
34: '4167 Nunc Ave',
35: 'Ap #302-2966 Cum Av.',
36: 'Ap #275-2917 Curabitur Rd.',
37: '6930 Duis Road',
38: '1511 Lobortis Ave',
39: 'Ap #711-213 Sagittis Avenue'},
'City': {20: 'Le Mans',
21: 'Wilhelmshaven',
22: 'Sangli',
23: 'Wabamun',
24: 'Barddhaman',
25: 'Pfungstadt',
26: 'Saint John',
27: 'Avellino',
28: 'Winnipeg',
29: 'Vitória da Conquista',
30: 'Hualpén',
31: 'Kapuskasing',
32: 'Pontypridd',
33: 'Saint-Remy-Geest',
34: 'Arsimont',
35: 'Nieuwenrode',
36: 'La Baie',
37: 'College',
38: 'Richmond Hill',
39: 'Quinta Normal'},
'Country': {20: 'Somalia',
21: 'Samoa',
22: 'Taiwan',
23: 'Morocco',
24: 'Hong Kong',
25: 'Iran',
26: 'Macao',
27: 'Dominica',
28: 'Congo (Brazzaville)',
29: 'Slovakia',
30: 'France',
31: 'Indonesia',
32: 'Turks and Caicos Islands',
33: 'Marshall Islands',
34: 'Montserrat',
35: 'Faroe Islands',
36: 'Ireland',
37: 'United States',
38: 'Israel',
39: 'Sudan'},
'Email': {20: '[email protected]',
21: '[email protected]',
22: '[email protected]',
23: '[email protected]',
24: '[email protected]',
25: '[email protected]',
26: '[email protected]',
27: '[email protected]',
28: '[email protected]',
29: '[email protected]',
30: '[email protected]',
31: '[email protected]',
32: '[email protected]',
33: '[email protected]',
34: '[email protected]',
35: '[email protected]',
36: '[email protected]',
37: '[email protected]',
38: '[email protected]',
39: '[email protected]'}}
In [4]:

# Load the pickled data
with open('data.pkl', 'rb') as p:
    load_pkl = pickle.load(p)
# Build the dict with its columns in the same order as the other two
pkl_dict = {}
for i in csv_dict.keys():
    pkl_dict[i] = load_pkl[i]
pkl_dict

Out[4]:
{'Name': {40: 'Garrison Lindsey',
41: 'Jenna Mercado',
42: 'Drake Savage',
43: 'Rana Z. Colon',
44: 'Melodie Knox',
45: 'Cooper T. Horton',
46: 'Eaton Nelson',
47: 'Lucian W. Lynn',
48: 'Sydney Anderson',
49: 'Jane Joyner',
50: 'Yen P. Browning',
51: 'Katell Simmons',
52: 'Freya B. Fischer',
53: 'Rama W. Mack',
54: 'Lawrence Z. Carrillo',
55: 'Quyn Serrano',
56: 'Indira L. Mccormick',
57: 'Rina W. Harris',
58: 'Cherokee George',
59: 'Michael Riddle',
60: 'Kay Rice',
61: 'Arden Leonard',
62: 'Chantale Sharpe',
63: 'Calvin Herman',
64: 'Walter R. Gaines',
65: 'Berk Finley',
66: 'Timothy Chambers',
67: 'Ariana M. Olson',
68: 'Mason E. Kelly',
69: 'Keane Stein',
70: 'Ginger Morse',
71: 'Maggy Cotton',
72: 'Talon R. May',
73: 'Devin L. Boone',
74: 'Orli E. Baxter',
75: 'Wing Velazquez',
76: 'Inez Simon',
77: 'Kyle Leonard',
78: 'Selma Christensen',
79: 'Gwendolyn Crosby',
80: 'Gary Alvarez',
81: 'Knox L. Cash',
82: 'Drake P. Guerrero',
83: 'Blossom Chandler',
84: 'Joan O. Ingram',
85: 'Buffy R. Austin',
86: 'Yoko M. Mcgowan',
87: 'Walker Q. Wolfe',
88: 'Blake Cross',
89: 'Naida Guthrie',
90: 'Yardley Singleton',
91: 'Lenore M. Boyer',
92: 'Edan Cortez',
93: 'Quintessa T. Martinez',
94: 'Reuben Skinner',
95: 'Yoshio Leblanc',
96: 'Rebecca French',
97: 'Shana K. Kerr',
98: 'Gemma Leonard',
99: 'Adara Estrada'},
'Phone': {40: '420-1477',
41: '102-2189',
42: '1-790-105-7695',
43: '486-7539',
44: '1-479-861-6093',
45: '768-1000',
46: '746-8562',
47: '1-392-783-0634',
48: '1-610-717-0447',
49: '1-131-574-3183',
50: '473-1433',
51: '1-647-852-3590',
52: '514-9914',
53: '1-849-217-6292',
54: '352-3711',
55: '1-450-807-5530',
56: '1-330-764-3846',
57: '760-1654',
58: '1-722-165-1370',
59: '476-0145',
60: '477-5481',
61: '383-6541',
62: '1-600-834-9076',
63: '1-461-665-6848',
64: '370-5831',
65: '1-765-752-4793',
66: '819-2872',
67: '447-5000',
68: '1-896-767-7525',
69: '457-2683',
70: '1-228-310-1687',
71: '1-541-405-3049',
72: '143-7688',
73: '1-132-242-8605',
74: '371-7491',
75: '354-5776',
76: '461-0691',
77: '179-3944',
78: '978-6407',
79: '692-9172',
80: '1-692-738-4449',
81: '535-9704',
82: '250-6382',
83: '142-2607',
84: '1-889-203-6592',
85: '413-3678',
86: '1-731-637-5890',
87: '1-240-595-6907',
88: '979-7498',
89: '1-138-699-9182',
90: '945-1641',
91: '513-0044',
92: '1-223-433-5209',
93: '1-672-341-8336',
94: '1-790-135-9618',
95: '1-508-613-2127',
96: '397-3408',
97: '354-7392',
98: '175-7956',
99: '1-893-111-1453'},
'Address': {40: 'P.O. Box 466, 7919 In Av.',
41: 'P.O. Box 484, 9648 Sit Avenue',
42: 'P.O. Box 254, 2688 Luctus, Street',
43: 'Ap #682-9992 Neque Rd.',
44: '245-8811 Ut St.',
45: 'P.O. Box 383, 139 A Ave',
46: '7989 Magna Rd.',
47: '7312 Tristique St.',
48: 'P.O. Box 720, 9179 Fermentum Street',
49: '200-5702 Mollis St.',
50: 'Ap #221-1593 Fringilla St.',
51: 'P.O. Box 133, 5382 Enim Ave',
52: 'Ap #869-5869 Neque Avenue',
53: '2992 Vitae Rd.',
54: '6427 Eros Avenue',
55: 'P.O. Box 133, 6862 Diam Road',
56: 'P.O. Box 679, 7373 Mollis Ave',
57: 'P.O. Box 642, 2289 Volutpat. Street',
58: '221-3908 Pellentesque Av.',
59: '581-1223 Aliquam Rd.',
60: '2398 Lectus, Road',
61: '1274 Nullam St.',
62: '1229 Nisl. Av.',
63: '263-4846 Sed St.',
64: '3247 Parturient Ave',
65: '6138 Faucibus Ave',
66: '865-2066 Vel Rd.',
67: '173-4952 Pede, Avenue',
68: '593 Turpis. Av.',
69: '567-6664 Egestas St.',
70: 'P.O. Box...