{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "QxDhTyclvzuz"
},
"source": [
"## Assignment 1: MongoDB"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JjUBX185vzu0"
},
"source": [
"This assignment is based on content discussed in Module 2: Introduction to MongoDB"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Y5sVEZYQvzu1"
},
"source": [
"## Learning outcomes "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RNAMrxvpvzu1"
},
"source": [
"The purpose of this assignment is for learners to be able to:\n",
"- Become familiar with JSON document syntax\n",
"- Understand basic MongoDB CRUD operations\n",
"- Understand MongoDB aggregation pipelines for running aggregate queries"
]
},
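{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of JSON document syntax, the sketch below parses a small document with Python's standard `json` module. The field names (`name`, `age`, `skills`, `address`) are hypothetical and are not taken from the Module 2 sample data; they only show how JSON objects, arrays and nested documents map to Python dictionaries and lists, which is the form PyMongo uses for MongoDB documents.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# A hypothetical employee document written as JSON text\n",
"# (field names are illustrative, not from the Module 2 dataset).\n",
"doc_text = '''\n",
"{\n",
"  \"name\": \"Jane Doe\",\n",
"  \"age\": 34,\n",
"  \"skills\": [\"SQL\", \"Python\"],\n",
"  \"address\": {\"city\": \"Toronto\", \"country\": \"Canada\"}\n",
"}\n",
"'''\n",
"\n",
"# json.loads maps a JSON object to a dict and a JSON array to a list.\n",
"doc = json.loads(doc_text)\n",
"print(type(doc).__name__)       # dict\n",
"print(doc['skills'][1])         # Python\n",
"print(doc['address']['city'])   # Toronto\n",
"\n",
"# json.dumps goes the other way, producing JSON text from a dict.\n",
"print(json.dumps({'name': doc['name'], 'age': doc['age']}))\n",
"```\n",
"\n",
"PyMongo's `insert_one` and `insert_many` accept dictionaries like `doc`, and `find_one` returns one; MongoDB adds an `_id` field to a document on insert if it does not already have one."
]
},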
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DB6Mhqwrvzu2"
},
"source": [
"In this assignment, you will make use of the sample data provided in Module 2. \n",
"\n",
"This dataset has 3 collections: Employee, Workplace and Address. You will import this data into your local MongoDB database.\n",
"\n",
"The required imports for this project are given below. Make sure all the required libraries are installed; you may use conda or pip, depending on your setup."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ZPXaQ47onjsR"
},
"source": [
"## Setup Notes\n",
"\n",
"**Please note:** these instructions are duplicated here for your reference; you first encountered them in Module 2 Part 2. You are expected to have already set up and started MongoDB.\n",
"\n",
"We will be using MongoDB Community Edition. The MongoDB database MUST be installed and running locally before continuing with this notebook. We will need to install two packages using the Anaconda package manager:\n",
"\n",
"1. [MongoDB](https://www.mongodb.com/) - this package contains the MongoDB database.\n",
"2. [PyMongo](http://api.mongodb.com/python/current/) - this package contains the Python driver that allows us to communicate with the MongoDB database.\n",
"\n",
"#### Install\n",
"\n",
"1. Open a command-line terminal and run the following command to install MongoDB:\n",
"```console\n",
"conda install -c anaconda mongodb\n",
"```\n",
"2. In the same terminal, run the following command to install the PyMongo package:\n",
"```console\n",
"conda install -c anaconda pymongo\n",
"```\n",
"\n",
"#### Run MongoDB on Windows\n",
"1. MongoDB requires a data directory to store all data. MongoDB’s default data directory path is \\data\\db. Create this folder by running the following command from a Command Prompt:\n",
"```console\n",
"md \\data\\db\n",
"```\n",
"\n",
"2. To start MongoDB, run mongod.exe. For example, from the Command Prompt:\n",
"```console\n",
"\"C:\\Program Files\\MongoDB\\Server\\3.2\\bin\\mongod.exe\"\n",
"```\n",
"\n",
"#### Run MongoDB on Mac\n",
"1. MongoDB requires a data directory to store all data. MongoDB’s default data directory path is /data/db. Create this folder by running the following command from a Terminal. Note that we run the command as the superuser using \"sudo\", and that the `-p` flag also creates the parent `/data` directory if it does not already exist:\n",
"```console\n",
"sudo mkdir -p /data/db\n",
"```\n",
"\n",
"2. To start MongoDB, run mongod. For example, from a Terminal (again as the superuser, using \"sudo\"):\n",
"```console\n",
"sudo mongod --config /usr/local/etc/mongod.conf\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
},
"colab_type": "code",
"id": "7tp6SPkB37og",
"outputId": "e70de7e9-abf5-442f-a8fd-d67f6c9742b1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Computer Name is:269f26643b4d\n",
"Your Computer IP Address is:172.28.0.2\n",
"/bin/bash: mongodb: command not found\n"
]
}
],
"source": [
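"# Print this machine's host name and IP address\n",
"import socket\n",
"hostname = socket.gethostname()\n",
"IPAddr = socket.gethostbyname(hostname)\n",
"print(\"Your Computer Name is: \" + hostname)\n",
"print(\"Your Computer IP Address is: \" + IPAddr)\n",
"\n",
"# Check that the MongoDB server binary (mongod) is installed and on the PATH\n",
"!mongod --version"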
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "Gqxn8n27vzu3",
"outputId": "c745506a-c1c5-401a-eff3-213bb4436a4e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mongo version 3.10.1\n"
]
}
],
"source": [
"# Required imports\n",
"import os\n",
"import json\n",
"import datetime\n",
"import pymongo\n",
"import pprint\n",
"import pandas as pd\n",
"import numpy as np\n",
"from pymongo import MongoClient\n",
"print('PyMongo version', pymongo.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TeBZEImlnjsY"
},
"source": [
"We first need to connect to our locally running MongoDB database (make sure the database is running on your machine). We will use MongoClient to connect to the MongoDB server on localhost at port 27017 (the default port) and select the `assignment1` database."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "UdtVCmGGnjsZ"
},
"outputs": [],
"source": [
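"# Connect to the local MongoDB server on the default port\n",
"client = MongoClient('localhost', 27017)\n",
"# Alternative (commented out): connect to a MongoDB Atlas replica set instead; replace <password> and <first-shard-host> with your own values\n",
"#client = pymongo.MongoClient(\"mongodb://testUser:<password>@<first-shard-host>:27017,cluster0-shard-00-01-3lk6x.mongodb.net:27017,cluster0-shard-00-02-3lk6x.mongodb.net:27017/test?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin&retryWrites=true&w=majority\")\n",
"# Use the assignment1 database (MongoDB creates it on the first write)\n",
"db = client.assignment1"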
]
},
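{
"cell_type": "markdown",
"metadata": {},
"source": [
"`MongoClient` connects lazily, so the cell above can succeed even when `mongod` is not running; connection errors only surface on the first real operation. A minimal sketch for confirming the server is reachable, assuming the same `localhost:27017` setup, is to run the `ping` admin command with a short server-selection timeout. The helper name `server_is_up` and the 2-second timeout are illustrative choices, not part of the assignment.\n",
"\n",
"```python\n",
"from pymongo import MongoClient\n",
"from pymongo.errors import ServerSelectionTimeoutError\n",
"\n",
"\n",
"def server_is_up(host='localhost', port=27017, timeout_ms=2000):\n",
"    \"\"\"Return True if a MongoDB server answers 'ping' within timeout_ms.\"\"\"\n",
"    # Hypothetical helper: the short timeout makes a stopped server fail fast\n",
"    # instead of waiting for PyMongo's 30-second default.\n",
"    probe = MongoClient(host, port, serverSelectionTimeoutMS=timeout_ms)\n",
"    try:\n",
"        probe.admin.command('ping')\n",
"        return True\n",
"    except ServerSelectionTimeoutError:\n",
"        return False\n",
"    finally:\n",
"        probe.close()\n",
"\n",
"\n",
"print('MongoDB reachable:', server_is_up())\n",
"```\n",
"\n",
"If this prints `False`, start `mongod` as described in the setup notes and re-run the connection cell."
]
},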
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2WW6r96Tvzu7"
},
"source": [
"After installing the necessary modules, proceed to import the data into your database."
]
},
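{
"cell_type": "markdown",
"metadata": {},
"source": [
"`insert_many` expects a non-empty list of documents, so a file that holds a single JSON object (or an empty array) will make the import fail. A small pre-check, sketched here as a hypothetical `load_documents` helper, opens each file, confirms it contains a JSON array of objects, and reports how many documents it would insert. It assumes the same `Data/*.json` file layout used in the import cell.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"\n",
"def load_documents(path):\n",
"    \"\"\"Load a JSON file and check that it is a non-empty array of objects.\"\"\"\n",
"    # Hypothetical helper, not part of the assignment template.\n",
"    with open(path) as f:\n",
"        data = json.load(f)  # raises json.JSONDecodeError (a ValueError) on bad JSON\n",
"    if not isinstance(data, list) or not data:\n",
"        raise ValueError(f'{path}: expected a non-empty JSON array')\n",
"    if not all(isinstance(item, dict) for item in data):\n",
"        raise ValueError(f'{path}: every array element must be a JSON object')\n",
"    return data\n",
"\n",
"\n",
"# Report how many documents each collection file would contribute.\n",
"for name in ['Employee', 'Workplace', 'Address']:\n",
"    path = f'Data/{name}.json'\n",
"    try:\n",
"        print(path, '->', len(load_documents(path)), 'documents')\n",
"    except (OSError, ValueError) as err:\n",
"        print('Problem with', path, ':', err)\n",
"```\n",
"\n",
"Catching `OSError` covers a missing file (`FileNotFoundError`), while `ValueError` covers both malformed JSON and a file whose top-level value is not an array of documents."
]
},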
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 375
},
"colab_type": "code",
"id": "5VDnAuj6vzu7",
"outputId": "1f8e0342-4f71-46de-e5cb-eff41c1e49d3"
},
"outputs": [],
"source": [
"# Let's delete any existing collections in our database\n",
"db.workplace.drop()\n",
"db.address.drop()\n",
"db.employee.drop()\n",
"\n",
"# Import our files into our three collections\n",
"with open('Data/Employee.json') as f:\n",
" db.employee.insert_many(json.load(f))\n",
"with open('Data/Workplace.json') as f:\n",
" db.workplace.insert_many(json.load(f))\n",
"with open('Data/Address.json') as f:\n",
"...