

Q#1: Data Profiling


Please use Ataccama One Profiling, available on the website, to profile the data sample provided below. Summarize a few key findings (major issues in the data) and propose how to address those issues and improve the data.
Design at least 5 business rules that can be used to correct the issues you found.



Ataccama One Profiling: https://one.ataccama.com/




Please use a browser to download the sample data from https://www.dropbox.com/s/as15rt6h888yzht/party_full.csv



Q#2: Address Validation


Imagine you get a data set from a client that contains addresses from 150 countries all around the world, and your task is to verify them. The data is stored in 3 fields: Address Line, City, ZIP code. You also have an address verification solution available for each country, but the data set does not include the country code. Your task is to design logic that will process the data and find the country for each record, so the records can be run through a validation component. Think of the most efficient way.



Hint: Running all of those 150 address verification components against each record is not considered efficient.



Q#3: Database Design


Imagine you are asked to design a simple database for an airport. The design should accommodate:


- Passengers, their personal and their booking information


- Airplanes, their cargo and their destinations


- All available destinations and their distances.


Please create or sketch the relational model of such a database structure, including the relationships and cardinalities. The model should have at least 5 entities.


Write SQL queries for the designed database which will return:


- All people on the same flight


- All planes flying to the same destination


- Whether the sum of all cargo weight on a plane is under a threshold.
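One possible shape for the three queries is sketched below in Python with the standard-library `sqlite3` module. The schema is purely hypothetical: every table name, column name, and demo row is an assumption made for illustration, not part of the assignment or the posted answer.

```python
import sqlite3

# Hypothetical six-entity schema; all names below are assumptions.
SCHEMA = """
CREATE TABLE destination (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          distance_km REAL NOT NULL);
CREATE TABLE airplane    (id INTEGER PRIMARY KEY, model TEXT NOT NULL);
CREATE TABLE flight      (id INTEGER PRIMARY KEY,
                          airplane_id INTEGER NOT NULL REFERENCES airplane(id),
                          destination_id INTEGER NOT NULL REFERENCES destination(id));
CREATE TABLE passenger   (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE booking     (id INTEGER PRIMARY KEY,
                          passenger_id INTEGER NOT NULL REFERENCES passenger(id),
                          flight_id INTEGER NOT NULL REFERENCES flight(id));
CREATE TABLE cargo       (id INTEGER PRIMARY KEY,
                          airplane_id INTEGER NOT NULL REFERENCES airplane(id),
                          weight_kg REAL NOT NULL);
"""

# Made-up demo rows, only there so the queries return something.
DEMO_DATA = """
INSERT INTO destination VALUES (1, 'DEST_A', 500), (2, 'DEST_B', 1200);
INSERT INTO airplane VALUES (1, 'Plane A'), (2, 'Plane B'), (3, 'Plane C');
INSERT INTO flight VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
INSERT INTO passenger VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO booking VALUES (1, 1, 1), (2, 2, 1), (3, 3, 3);
INSERT INTO cargo VALUES (1, 1, 100), (2, 1, 250), (3, 2, 900);
"""

def build_demo_db():
    """Create an in-memory database with the hypothetical schema and demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(DEMO_DATA)
    return conn

def passengers_on_flight(conn, flight_id):
    """All people booked on the same flight."""
    rows = conn.execute(
        "SELECT p.full_name FROM passenger p "
        "JOIN booking b ON b.passenger_id = p.id "
        "WHERE b.flight_id = ? ORDER BY p.full_name",
        (flight_id,),
    )
    return [row[0] for row in rows]

def planes_to_destination(conn, destination_id):
    """All planes that have a flight to the same destination."""
    rows = conn.execute(
        "SELECT DISTINCT a.model FROM airplane a "
        "JOIN flight f ON f.airplane_id = a.id "
        "WHERE f.destination_id = ? ORDER BY a.model",
        (destination_id,),
    )
    return [row[0] for row in rows]

def cargo_under_threshold(conn, airplane_id, threshold_kg):
    """True when the plane's total cargo weight is below threshold_kg."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(weight_kg), 0) FROM cargo WHERE airplane_id = ?",
        (airplane_id,),
    ).fetchone()
    return total < threshold_kg
```

`COALESCE` makes a plane with no cargo rows count as carrying 0 kg rather than NULL, so the threshold check still returns a plain boolean.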




Q#4: Please choose one of these two tasks below and complete it:



Algorithm Task:


Write in pseudo-code a program/algorithm that takes a number as input and expresses all the different ways the input can be represented as the sum of 1, 3 and 4 simultaneously. An important part of the evaluation is how understandable the solution is.
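A minimal sketch of one reading of this task, in Python rather than pseudo-code. It assumes that the order of the summands does not matter and that each of 1, 3 and 4 may be used any number of times, including zero. If the task means ordered sums, or requires all three numbers to appear, the enumeration would need to change. The function names are hypothetical.

```python
def sums_of_1_3_4(n):
    """Return every (ones, threes, fours) with ones + 3*threes + 4*fours == n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    ways = []
    # Pick how many 4s to use, then how many 3s; the remainder is filled with 1s.
    for fours in range(n // 4 + 1):
        rest_after_fours = n - 4 * fours
        for threes in range(rest_after_fours // 3 + 1):
            ones = rest_after_fours - 3 * threes
            ways.append((ones, threes, fours))
    return ways

def format_way(way):
    """Render a (ones, threes, fours) tuple as text such as '4 + 1'."""
    ones, threes, fours = way
    parts = [4] * fours + [3] * threes + [1] * ones
    return " + ".join(str(part) for part in parts)
```

For an input of 5 this yields three ways: `1 + 1 + 1 + 1 + 1`, `3 + 1 + 1`, and `4 + 1`.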



Linux Task:


- Provide a list of possible commands you would use to achieve the following:


Find a file called "start.sh" on the file system, start searching from root and suppress any errors (mostly resulting from insufficient permissions to read system dirs). At the same time, the file should contain a variable called "ATA_HOME".


- What would you use to create a backup of the found file? (More than one option possible.)


- What command(s) would you use to change the variable "ATA_HOME" in the script so it points to a parent directory?


- Listing on the filesystem shows this information about the script:



`-rwxrw-r-- 1 ataccama atc 1238 Nov 12 10:07 start.sh`


Question: will users from the same group (as the owner) be able to update and execute the script?


- Provide a list of possible commands to start the script in the background and to check the log it produces, which is located at "../logs/server.log"


- In the system, there is a running process "tomcat" with pid 1451. Provide at least one command which will show you on which port it is running.
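For the permission question above, the mode string from the listing (`-rwxrw-r--`) can be decoded position by position. The sketch below is a hypothetical helper that reads only the group triad. It assumes plain POSIX permission bits with no ACLs, and that the user in question is a member of the group but is neither the owner nor root.

```python
def group_permissions(mode_string):
    """Decode the group triad of an `ls -l` mode string such as '-rwxrw-r--'."""
    if len(mode_string) != 10:
        raise ValueError("expected a 10-character mode string")
    # Index 0 is the file type, 1-3 the owner, 4-6 the group, 7-9 others.
    triad = mode_string[4:7]
    return {
        "read": triad[0] == "r",
        "write": triad[1] == "w",
        # A lowercase 's' in this position means setgid plus execute.
        "execute": triad[2] in ("x", "s"),
    }
```

The group triad here is `rw-`, so group members can update the script but cannot execute it directly. Because they can read it, they could still pass it to an interpreter, for example `sh start.sh`.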

Answered 3 days after Mar 19, 2022


Ankur answered on Mar 22 2022
Assignment: write pseudocode and design a database
    Requirement Specification:
1. Write pseudocode and design a database
    Application and Implementation:
To complete the assignment, the following steps were carried out:
· Run profiling on the CSV file in Ataccama
· Create a flight database
· Write a few queries
Observations after profiling the data
· NULL values are allowed in the data. Fields should be checked for missing values.
· In "gender", the standard should be "MALE"/"FEMALE", but many records use "M"/"F" instead.
· In "names", the full first name and last name should be captured instead of abbreviations, to maintain a standard.
· In "birth_date", a valid date in YYYY-MM-DD format should be ensured for all entries.
· The number of digits in a card number should be validated on entry. Many card numbers have fewer than 16 digits.
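The five observations above can each be turned into a simple rule. The following is a minimal sketch in Python, assuming values arrive as strings or `None`. The function names are hypothetical, and the 16-digit card length is taken from the observation above rather than from any card-scheme specification.

```python
import re
from datetime import datetime

GENDER_MAP = {"M": "MALE", "MALE": "MALE", "F": "FEMALE", "FEMALE": "FEMALE"}

def is_missing(value):
    """Rule 1: flag NULL or blank values."""
    return value is None or str(value).strip() == ""

def normalize_gender(value):
    """Rule 2: map 'M'/'F' variants to 'MALE'/'FEMALE'; None if unrecognized."""
    if is_missing(value):
        return None
    return GENDER_MAP.get(str(value).strip().upper())

def has_full_name(value):
    """Rule 3: require at least two name parts, none of them a bare initial."""
    if is_missing(value):
        return False
    parts = str(value).split()
    return len(parts) >= 2 and all(len(part.rstrip(".")) > 1 for part in parts)

def is_valid_birth_date(value):
    """Rule 4: accept only real calendar dates written as YYYY-MM-DD."""
    if is_missing(value):
        return False
    text = str(value).strip()
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

def is_valid_card_number(value):
    """Rule 5: after dropping spaces and hyphens, require exactly 16 digits."""
    if is_missing(value):
        return False
    digits = re.sub(r"[ -]", "", str(value))
    return re.fullmatch(r"[0-9]{16}", digits) is not None
```

Rules 2 to 5 treat a missing value as invalid, so rule 1 does not have to run first for the others to be safe.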
Logic for address validation
Since the address line and ZIP code are available, any geolocation API can be used to fetch and validate the country name for each record.
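A rough sketch of how this lookup could feed the per-country validators, in Python. No particular geolocation service is assumed: `lookup_country` is a hypothetical caller-supplied function that wraps whichever API is chosen and returns a country code or `None`. The record keys (`"city"`, `"zip"`) are also assumptions. The cache assumes that records sharing a city and ZIP code belong to the same country, which may not hold in every case.

```python
from collections import defaultdict

def group_by_country(records, lookup_country):
    """Group address records by country, looking up each distinct (city, zip) once."""
    cache = {}
    grouped = defaultdict(list)
    for record in records:
        # Normalize the key so trivial spelling variants share one lookup.
        key = (record["city"].strip().upper(), record["zip"].strip().upper())
        if key not in cache:
            cache[key] = lookup_country(record)
        country = cache[key] or "UNRESOLVED"
        grouped[country].append(record)
    return dict(grouped)
```

Each group can then be passed to the single validation component for that country, and the `"UNRESOLVED"` group can be set aside for manual review or a fallback strategy.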
Database...