You are required to carry out data processing in Python, using pandas, on the dataset provided, and to submit your work as a single runnable Python file.
The dataset contains the Date of Sale, Price and Address of every residential property purchased in Ireland from 1 January 2010 to September 2017, as declared to the Revenue Commissioners for stamp duty purposes.
The dataset can be read into a pandas DataFrame using the following code:
import pandas as pd
df = pd.read_csv("FinalProjectData.csv", index_col=0).fillna(value='')
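One effect of the fillna(value='') call is worth seeing before you start: every missing cell becomes an empty string rather than NaN, so a blank eircode will later be counted as a value unless it is filtered out. The sketch below reads a small, made-up CSV from a string (via io.StringIO) instead of FinalProjectData.csv; the column names County, Eircode and Price are assumptions and may not match the real file.

```python
import io

import pandas as pd

# A made-up three-row extract; the real file's columns may differ.
csv_text = """,County,Eircode,Price
0,Kerry,V92 X1Y2,200000
1,Limerick,,250000
2,Westmeath,N37 C3D4,400000
"""

# index_col=0 turns the unnamed first column into the row index, and
# fillna(value='') replaces the missing eircode with an empty string.
df = pd.read_csv(io.StringIO(csv_text), index_col=0).fillna(value='')

print(df["Eircode"].tolist())  # ['V92 X1Y2', '', 'N37 C3D4']
print(df.shape)  # (3, 3)
# Blanks have to be excluded explicitly when counting real eircodes.
print((df["Eircode"] != "").sum())  # 2
```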
Specification
Any output to the console should be understandable to someone without any prior knowledge of the program.
All methods below should be called from a main() method.
No output figures should be hardcoded; they should all be calculated by the program.
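A minimal sketch of how the file could be organised to meet these rules, assuming the hypothetical helper names load_data() and report_registration_count() (the latter only stands in for the real Part 1 and Part 2 methods). The data is loaded once in main() and passed to each method as a parameter, the module docstring plays the role of the file header comment, and the only figure printed is calculated from the data.

```python
"""Hypothetical file header: summarises Irish residential property sales
(January 2010 to September 2017) read from FinalProjectData.csv."""
import pandas as pd


def load_data(csv_path):
    """Read the CSV into a DataFrame, replacing missing values with ''."""
    return pd.read_csv(csv_path, index_col=0).fillna(value='')


def report_registration_count(df):
    """Print the number of registrations, computed rather than hardcoded."""
    print(f"The dataset contains {len(df)} property registrations.")


def main(csv_path="FinalProjectData.csv"):
    """Load the data once and pass it to each method as a parameter."""
    df = load_data(csv_path)
    report_registration_count(df)
    # The Part 1 and Part 2 methods would be called here in the same way,
    # each receiving df as an argument instead of reading a global variable.
```

In the submitted file, main() would be run by ending the file with the standard if __name__ == "__main__": main() guard.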
Part 1 Viewing & Inspecting data
Use pandas methods to output the following in a single method, inspecting_data(). Hint: each item can be computed in one line of code before it is output.
I. The number of rows and columns.
II. The two most common eircodes.
III. The top 10 dates properties were registered on.
IV. The top 10 counties by number of properties registered, with the total for each county.
V. The bottom 5 counties by number of properties registered, with the total for each county.
VI. The percentage of houses registered at not full market price and at full market price. There is no need to round the percentages.
VII. The average price across the whole dataset (in format €300,000.00)
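A possible sketch of inspecting_data() covering items I to VII. It assumes the columns are named Date of Sale, County, Eircode, Price and Not Full Market Price, that the market-price flag holds Yes/No, and that Price is numeric. None of these details is confirmed by the brief, so check df.columns first; if Price was read as text, pd.to_numeric() would be needed before taking the mean. The made-up sample at the bottom exists only to make the sketch runnable.

```python
import pandas as pd


def inspecting_data(df):
    """Print an overview of the property data for a non-technical reader."""
    rows, columns = df.shape
    print(f"The dataset has {rows} rows and {columns} columns.")

    # Skip the empty strings that fillna(value='') leaves for missing eircodes.
    eircodes = df.loc[df["Eircode"] != "", "Eircode"]
    top_two = eircodes.value_counts().head(2).index
    print("The two most common eircodes are:", ", ".join(top_two))

    print("The 10 dates with the most registrations:")
    print(df["Date of Sale"].value_counts().head(10).to_string())

    # value_counts() sorts from most to least frequent.
    county_totals = df["County"].value_counts()
    print("The 10 counties with the most registrations:")
    print(county_totals.head(10).to_string())
    print("The 5 counties with the fewest registrations:")
    print(county_totals.tail(5).to_string())

    # normalize=True returns proportions; multiply by 100 for percentages.
    shares = df["Not Full Market Price"].value_counts(normalize=True) * 100
    print(f"Registered at not full market price: {shares.get('Yes', 0)}%")
    print(f"Registered at full market price: {shares.get('No', 0)}%")

    print(f"The average price is €{df['Price'].mean():,.2f}")


# Made-up rows for illustration only; the real column names may differ.
sample = pd.DataFrame({
    "Date of Sale": ["01/03/2015", "01/03/2015", "15/06/2016",
                     "15/06/2016", "15/06/2016", "20/09/2017"],
    "County": ["Kerry", "Kerry", "Limerick", "Kerry", "Westmeath",
               "Westmeath"],
    "Eircode": ["V92 X1Y2", "V92 X1Y2", "", "V92 X1Y2", "N37 C3D4",
                "N37 C3D4"],
    "Price": [200000, 300000, 250000, 150000, 400000, 500000],
    "Not Full Market Price": ["No", "No", "Yes", "No", "No", "Yes"],
})
inspecting_data(sample)
```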
Part 2 Manipulating the data
I. Create a method average_house_price() which uses pandas methods to output the average price grouped by county (in format €300,000.00) for:
a. all houses, in descending order
b. new houses, in descending order
c. second-hand houses, in descending order
II. Create a method total_houses(year, county) which uses pandas methods to return the total number of houses registered for a given county and year. Call the method for the years 2015 and 2016 and the counties Westmeath, Limerick and Kerry.
III. Each eircode consists of seven letters and/or digits, in the format A65 B2CD. The first three characters are called the routing key and represent one of 139 geographical districts or post towns.
IV. Create a method find_routing_keys() which uses pandas methods to return a dictionary mapping each routing key in the dataset to its corresponding county.
V. Create a method split_routing_key_by_size(routing_key) which returns a pandas Series of the total registrations for that routing key, grouped by property size description. Call the method for the routing key N37.
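The Part 2 methods could be sketched as below. Several details are assumptions: the column names (County, Price, Date of Sale, Eircode, Description of Property, Property Size Description), the two property-description labels, and the dd/mm/yyyy date format. The sketch also adds df as the first parameter of each method, following the brief's own advice to pass data in rather than read a global variable; format_euro() and extract_routing_keys() are hypothetical helpers. The routing key is taken as the first three characters of the eircode, as described in item III.

```python
import pandas as pd


def format_euro(amount):
    """Return amount formatted like €300,000.00."""
    return f"€{amount:,.2f}"


def average_house_price(df):
    """Print the mean price per county for all, new and second-hand houses."""
    description = df["Description of Property"]
    # Assumed labels; check df["Description of Property"].unique().
    groups = {
        "all houses": df,
        "new houses": df[description == "New Dwelling house /Apartment"],
        "second-hand houses":
            df[description == "Second-Hand Dwelling house /Apartment"],
    }
    for label, subset in groups.items():
        averages = (subset.groupby("County")["Price"].mean()
                    .sort_values(ascending=False))
        print(f"Average price by county for {label}:")
        print(averages.map(format_euro).to_string())


def total_houses(df, year, county):
    """Return how many properties were registered in county during year."""
    # Assumes 'Date of Sale' holds dd/mm/yyyy strings.
    years = pd.to_datetime(df["Date of Sale"], format="%d/%m/%Y").dt.year
    return int(((years == year) & (df["County"] == county)).sum())


def extract_routing_keys(df):
    """Return the routing key (first three characters) of each eircode."""
    return df["Eircode"].str.strip().str.upper().str[:3]


def find_routing_keys(df):
    """Return a dict mapping each routing key in df to its county."""
    # Blank eircodes are '' because of fillna(value='') and are skipped.
    with_eircode = df[df["Eircode"] != ""]
    keys = extract_routing_keys(with_eircode)
    # Keep the most common county for each routing key.
    counties = with_eircode.groupby(keys)["County"].agg(
        lambda names: names.value_counts().idxmax())
    return counties.to_dict()


def split_routing_key_by_size(df, routing_key):
    """Return the number of registrations per size band for one key."""
    in_area = df[extract_routing_keys(df) == routing_key.upper()]
    return in_area.groupby("Property Size Description").size()


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Date of Sale": ["01/03/2015", "20/11/2015", "15/06/2016",
                     "02/02/2016", "09/09/2016"],
    "County": ["Kerry", "Kerry", "Limerick", "Westmeath", "Westmeath"],
    "Eircode": ["V92 X1Y2", "", "V94 A1B2", "N37 C3D4", "N37 E5F6"],
    "Price": [100000, 320000, 300000, 250000, 200000],
    "Description of Property": [
        "New Dwelling house /Apartment",
        "Second-Hand Dwelling house /Apartment",
        "Second-Hand Dwelling house /Apartment",
        "New Dwelling house /Apartment",
        "Second-Hand Dwelling house /Apartment",
    ],
    "Property Size Description": [
        "", "", "", "less than 38 sq metres",
        "greater than or equal to 125 sq metres",
    ],
})

average_house_price(sample)
for year in (2015, 2016):
    for county in ("Westmeath", "Limerick", "Kerry"):
        print(f"{county} in {year}: {total_houses(sample, year, county)} "
              "properties registered")
print(find_routing_keys(sample))
print(split_routing_key_by_size(sample, "N37").to_string())
```

In the submitted program, the calls at the bottom would live inside main(). One design choice to review: find_routing_keys() keeps the most common county for each routing key, on the assumption that a key could appear with more than one county in the data; taking the first county seen would be a simpler alternative.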
Hints
1. Opening the CSV file in Excel, or examining the DataFrame in the Variable Explorer while debugging, may help you.
2. Alternatively, test with a smaller dataset first, e.g. a copy of the CSV file containing only a subset of rows, say 50.
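Hint 2 can also be done with pandas rather than by hand. A minimal sketch, where make_sample_csv() is a hypothetical helper: it reads only the first n_rows data rows and writes them back out together with the index, so the copy can be loaded with the same read_csv(..., index_col=0) call as the full file.

```python
import pandas as pd


def make_sample_csv(source_path, sample_path, n_rows=50):
    """Write the first n_rows data rows of source_path to sample_path."""
    # nrows limits how many data rows are read; the header row is kept.
    sample = pd.read_csv(source_path, index_col=0, nrows=n_rows)
    # Writing the index back out keeps the first column, so the copy can be
    # read with the same index_col=0 call as the full file.
    sample.to_csv(sample_path)
```

For example, make_sample_csv("FinalProjectData.csv", "SampleData.csv") would create a 50-row copy (the second file name is hypothetical). To test on a random subset instead of the first 50 rows, the full file could be read and DataFrame.sample(n=50) used before writing.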
Programming style
This project must be written in good programming style. As in previous assessments, following the PEP 8 guidelines is recommended as best practice.
Elements of good programming style include:
Meaningful variable names
Proper indentation of code
Blank lines between code sections
Use of methods
In-line commenting
Header comment for the file
Header comments for each method
Before you submit your program, please remove any print statements or unnecessary computations that you added to help debug errors.
Do not leave unnecessary code commented out; it makes the real code harder to read.
Avoid using global variables inside functions: if a function uses a variable, pass it in as a parameter unless the variable is only used locally.