This assignment will be done in a Jupyter notebook.
Read and load the attached JSON files into three or four SQL tables in the notebook, and name each table with your initials at the end (MJ).
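As a minimal sketch of the loading step, the following uses only the Python standard library (`json` and `sqlite3`). The attached files are not shown here, so the field names, the inline `raw_json` sample, and the table name `users_MJ` are all assumptions for illustration; adapt them to the real files.

```python
import json
import sqlite3

# Stand-in for the contents of one attached JSON file.
# The fields below are hypothetical; the real files may differ.
raw_json = """
[
  {"user_id": 1, "age": 23, "gender": "F", "country": "US", "profession": "student"},
  {"user_id": 2, "age": 41, "gender": "M", "country": "BR", "profession": "engineer"},
  {"user_id": 3, "age": 35, "gender": "F", "country": "US", "profession": "teacher"}
]
"""
# With a real file you would read it with json.load on an open file handle.
records = json.loads(raw_json)

# In-memory database for the sketch; pass a file path to keep the data on disk.
conn = sqlite3.connect(":memory:")

# Table name ends with the initials, as the assignment asks (hypothetical schema).
conn.execute(
    """CREATE TABLE users_MJ (
           user_id INTEGER PRIMARY KEY,
           age INTEGER,
           gender TEXT,
           country TEXT,
           profession TEXT)"""
)

# Named placeholders map each dict key to the matching column.
conn.executemany(
    "INSERT INTO users_MJ VALUES (:user_id, :age, :gender, :country, :profession)",
    records,
)
conn.commit()

# Sanity check: the table should hold as many rows as the JSON list.
row_count = conn.execute("SELECT COUNT(*) FROM users_MJ").fetchone()[0]
print(f"Loaded {row_count} of {len(records)} records into users_MJ")
```

Repeating the same pattern for each JSON file gives the three or four tables the assignment mentions.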
TASK 1: Considering the answers you found for the games/news topic in the last assignment (attached), group by topic & age, topic & country, topic & gender, and topic & profession, and find the pattern and breakdown for each of those groups. If you prefer, you can consider just these topics:
a. share w/ media
b. share w/o media
c. fake news (t & f)
d. time spent w/ media
e. time spent w/o media
For each group, formulate a null hypothesis and calculate the p-value. Can you find any pattern in each of those aggregated groups?
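One way to approach a single grouping is sketched below: count rows per group with SQL `GROUP BY`, then test the null hypothesis "sharing is independent of gender" with a Pearson chi-square test. The table `answers_MJ`, its columns, and the counts are invented for illustration. The p-value shortcut via `math.erfc` is only valid for a 2x2 table (1 degree of freedom); for larger tables a library routine such as `scipy.stats.chi2_contingency` would be the usual choice.

```python
import math
import sqlite3

# Hypothetical table: one row per answer, with the respondent's gender
# and whether the item was shared (1) or not (0). Counts are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers_MJ (gender TEXT, shared INTEGER)")
rows = [("F", 1)] * 30 + [("F", 0)] * 20 + [("M", 1)] * 20 + [("M", 0)] * 30
conn.executemany("INSERT INTO answers_MJ VALUES (?, ?)", rows)

# Breakdown by gender and shared flag.
query = "SELECT gender, shared, COUNT(*) FROM answers_MJ GROUP BY gender, shared"
counts = {(gender, shared): n for gender, shared, n in conn.execute(query)}

# 2x2 contingency table: rows = gender (F, M), columns = shared (yes, no).
observed = [
    [counts[("F", 1)], counts[("F", 0)]],
    [counts[("M", 1)], counts[("M", 0)]],
]


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
# H0: sharing is independent of gender. Reject H0 at the 5% level if p_value < 0.05.
```

The same pattern applies to topic & age, topic & country, and topic & profession, although those groupings usually have more than two categories and need a general chi-square routine.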
Create as many regression models as possible and display the results clearly. There are several different targets:
4a. Share w/ media
4b. Share w/o media
4c. Fake news
4d. Share Fake news
4e. Share True news
4f. Time Spent with media (linear)
4g. Time Spent w/o media (linear)
Both 4f & 4g have combinations with fake or not-fake news, all of which can be explored.
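For the linear targets (4f and 4g), a simple least-squares fit can be sketched with plain Python as below. The predictor (age), the numbers, and the helper name `fit_simple_ols` are made up for illustration. In a real notebook a library such as statsmodels or scikit-learn would normally be used, and the yes/no targets (4a to 4e) would presumably call for logistic rather than linear regression.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor model, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: respondent age vs. minutes spent on items with media.
ages = [20, 30, 40, 50]
minutes_with_media = [10, 15, 17, 22]

slope, intercept, r_squared = fit_simple_ols(ages, minutes_with_media)
print(f"minutes = {intercept:.2f} + {slope:.2f} * age  (R^2 = {r_squared:.3f})")
```

Fitting the same model separately on the fake and not-fake subsets is one way to explore the combinations mentioned above.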
Plot graphs for each of the above.
Create SQL tables and use them in this assignment.
In this assignment, I want you to compare all the counts, sums, averages, medians, and correlations you computed from the JSON data sets with the same values computed from the SQL tables. Make sure that both sets of results are displayed in table format, for good visualization/presentation. If your results differ, troubleshoot your code: they must be the same, and this is the proof that you have processed the files correctly in both formats (JSON & SQL).
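A minimal sketch of the JSON-versus-SQL check follows. It computes count, sum, and average once from the parsed JSON and once with SQL aggregates, then prints both side by side. The field `minutes`, the table `time_spent_MJ`, and the sample values are assumptions. Median and correlation are left out because SQLite has no built-in aggregate for them; they can be computed from the rows a query returns.

```python
import json
import math
import sqlite3
import statistics

# Hypothetical sample standing in for one of the attached JSON files.
raw_json = """
[
  {"user_id": 1, "minutes": 12.5},
  {"user_id": 2, "minutes": 7.0},
  {"user_id": 3, "minutes": 20.5},
  {"user_id": 4, "minutes": 10.0}
]
"""
records = json.loads(raw_json)

# Aggregates computed directly from the JSON records.
values = [r["minutes"] for r in records]
from_json = {
    "count": len(values),
    "sum": sum(values),
    "avg": statistics.mean(values),
}

# The same records loaded into a SQL table, aggregated by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_spent_MJ (user_id INTEGER, minutes REAL)")
conn.executemany("INSERT INTO time_spent_MJ VALUES (:user_id, :minutes)", records)
n, total, average = conn.execute(
    "SELECT COUNT(*), SUM(minutes), AVG(minutes) FROM time_spent_MJ"
).fetchone()
from_sql = {"count": n, "sum": total, "avg": average}

# Side-by-side table; isclose tolerates floating-point rounding differences.
matches = {}
print(f"{'metric':<8}{'JSON':>12}{'SQL':>12}{'match':>8}")
for name in from_json:
    matches[name] = math.isclose(from_json[name], from_sql[name])
    print(f"{name:<8}{from_json[name]:>12.2f}{from_sql[name]:>12.2f}{str(matches[name]):>8}")

all_match = all(matches.values())
```

In a notebook, the two dictionaries could also be placed in a DataFrame so the comparison renders as a formatted table.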
Make sure to comment your code well!!