Q#1: Data Profiling
Please use Ataccama One Profiling, available on the website, to profile the data sample provided below. Summarize a few key findings (major issues in the data) and propose how you would address those issues and improve the data.
Design at least 5 business rules that can be used to correct the issues you found.
Ataccama One Profiling:
https://one.ataccama.com/
Please use a browser to download the sample data from
https://www.dropbox.com/s/as15rt6h888yzht/party_full.csv
Q#2: Address Validation
Imagine you get a data set from a client that contains addresses from 150 countries all around the world, and your task is to verify them. The data is stored in 3 fields: Address Line, City, ZIP code. You also have an address verification solution available for each country, but the data set does not include the country code. Your task is to design logic that will process the data and find the country for each record, so that each record can be run through the matching validation component. Think of the most efficient way.
Hint: Running all of those 150 address verification components against each record is not considered efficient.
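One possible direction, sketched here under stated assumptions, is to narrow the candidate countries cheaply before calling any verification component: first by the ZIP code format, then by a city lookup. The pattern table `ZIP_PATTERNS`, the lookup `CITY_COUNTRIES`, and the function `candidate_countries` are hypothetical names introduced for illustration; the patterns are simplified and cover only five countries.

```python
import re

# Hypothetical, simplified postal-code formats; a real table would cover all 150 countries.
ZIP_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "DE": re.compile(r"\d{5}"),
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
}

# Hypothetical city-to-countries lookup (assumed to come from a reference gazetteer).
CITY_COUNTRIES = {
    "berlin": {"DE"},
    "london": {"GB", "CA"},
    "amsterdam": {"NL"},
    "springfield": {"US"},
}


def candidate_countries(city, zip_code):
    """Return the set of countries worth validating this record against."""
    zip_norm = zip_code.strip().upper()
    # Step 1: keep only countries whose ZIP format matches the whole value.
    by_zip = {c for c, p in ZIP_PATTERNS.items() if p.fullmatch(zip_norm)}
    # Step 2: intersect with countries known to have a city of this name.
    by_city = CITY_COUNTRIES.get(city.strip().lower())
    if by_city is None:
        return by_zip  # unknown city: the ZIP format is the only evidence
    # If the two sources disagree completely, fall back to the ZIP candidates.
    return (by_zip & by_city) or by_zip


print(candidate_countries("Berlin", "10115"))
print(candidate_countries("London", "SW1A 1AA"))
```

Only the records that still have several candidates after both steps would need to be run through more than one verification component.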
Q#3: Database Design
Imagine you are asked to design a simple database for an airport. You should accommodate:
- Passengers, their personal and their booking information
- Airplanes, their cargo and their destinations
- All available destinations and their distances.
Please create or sketch the relational model of such a database structure, including the relationships and cardinalities. The model should have at least 5 entities.
Write SQL queries for the designed database which will return:
- All people on the same flight
- All planes flying to the same destination
- Whether the sum of all cargo weight on a plane is under a threshold.
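As a minimal sketch of what such a model and its queries might look like, the following uses Python's built-in `sqlite3` module with an in-memory database. The six tables, their column names, the sample rows, and the three helper functions are all assumptions made for illustration, not a prescribed solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE destination (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          distance_km REAL NOT NULL);
CREATE TABLE airplane    (id INTEGER PRIMARY KEY, model TEXT NOT NULL);
-- One airplane and one destination per flight (N:1 on both sides).
CREATE TABLE flight      (id INTEGER PRIMARY KEY,
                          airplane_id INTEGER NOT NULL REFERENCES airplane(id),
                          destination_id INTEGER NOT NULL REFERENCES destination(id));
CREATE TABLE passenger   (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
-- booking resolves the M:N relationship between passenger and flight.
CREATE TABLE booking     (id INTEGER PRIMARY KEY,
                          passenger_id INTEGER NOT NULL REFERENCES passenger(id),
                          flight_id INTEGER NOT NULL REFERENCES flight(id));
CREATE TABLE cargo       (id INTEGER PRIMARY KEY,
                          airplane_id INTEGER NOT NULL REFERENCES airplane(id),
                          weight_kg REAL NOT NULL);

-- Made-up sample rows.
INSERT INTO destination VALUES (1, 'Toronto', 6900.0), (2, 'London', 1035.0);
INSERT INTO airplane    VALUES (1, 'A320'), (2, 'B737'), (3, 'A380');
INSERT INTO flight      VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
INSERT INTO passenger   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO booking     VALUES (1, 1, 1), (2, 2, 1), (3, 3, 3);
INSERT INTO cargo       VALUES (1, 1, 500.0), (2, 1, 700.0), (3, 2, 3000.0);
""")


def passengers_on_flight(flight_id):
    """All people booked on the given flight."""
    rows = conn.execute(
        "SELECT p.full_name FROM passenger p "
        "JOIN booking b ON b.passenger_id = p.id "
        "WHERE b.flight_id = ? ORDER BY p.full_name", (flight_id,))
    return [r[0] for r in rows]


def planes_to_destination(destination_id):
    """All planes that fly to the given destination."""
    rows = conn.execute(
        "SELECT DISTINCT a.model FROM airplane a "
        "JOIN flight f ON f.airplane_id = a.id "
        "WHERE f.destination_id = ? ORDER BY a.model", (destination_id,))
    return [r[0] for r in rows]


def cargo_under_threshold(airplane_id, threshold_kg):
    """True if the plane's total cargo weight is below the threshold."""
    row = conn.execute(
        "SELECT COALESCE(SUM(weight_kg), 0) < ? FROM cargo "
        "WHERE airplane_id = ?", (threshold_kg, airplane_id)).fetchone()
    return bool(row[0])


print(passengers_on_flight(1))
print(planes_to_destination(1))
print(cargo_under_threshold(1, 2000))
```

Modelling `booking` as its own table is one way to express the many-to-many relationship between passengers and flights; `COALESCE` makes a plane without cargo count as carrying zero weight.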
Q#4: Please choose one of the two tasks below and complete it:
Algorithm Task:
Write, in pseudo-code, a program/algorithm that takes a number as input and expresses all the different ways the input can be represented as the sum of 1, 3 and 4 simultaneously. An important part of the evaluation is how understandable the solution is.
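The task statement leaves some room for interpretation, so the sketch below fixes one reading as an assumption: order matters (1+4 and 4+1 count as different ways), and each of the terms 1, 3 and 4 may be used any number of times, including not at all. The function name `sum_representations` is hypothetical.

```python
def sum_representations(n, parts=(1, 3, 4)):
    """List every ordered way to write n as a sum of the given parts."""
    if n == 0:
        return [[]]  # exactly one way to reach zero: add nothing
    results = []
    for part in parts:
        if part <= n:
            # Pick `part` as the first term, then solve the smaller problem.
            for rest in sum_representations(n - part, parts):
                results.append([part] + rest)
    return results


for way in sum_representations(5):
    print(" + ".join(str(term) for term in way))
```

Under this reading the input 5 has six representations. If order should not matter, one variation is to require the terms in each list to be non-decreasing, which removes the permutations.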
Linux Task:
- Provide a list of possible commands you would use to achieve the following:
Find a file called "start.sh" on the file system, start searching from root and suppress any errors (mostly resulting from insufficient permissions to read system dirs). At the same time, the file should contain a variable called "ATA_HOME".
- What would you use to create a backup of the found file? (More than one option possible.)
- What command(s) would you use to change the variable "ATA_HOME" in the script so it points to a parent directory?
- Listing on the filesystem shows this information about the script:
`-rwxrw-r-- 1 ataccama atc 1238 Nov 12 10:07 start.sh`
Question: will users from the same group (as the owner) be able to update and execute the script?
- Provide a list of possible commands to start the script in the background and to check the log it produces, which is located at "../logs/server.log".
- In the system, there is a running process "tomcat" with pid 1451. Provide at least one command which will show you on which port it is running.