RBM_problem_statement.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Restricted Boltzman Machine \n",
"\n",
"A restricted Boltzmann Machine is an \"Energy Based\" generative stochastic model. They were initially invented by Paul Smolensky in 1986 and were called \"Harmonium\". After the evolution of training algorithms in the mid 2000's by Geoffrey Hinton, the boltzman machine became more prominent. They gained big popularity in recent years in the context of the Netflix Prize where RBMs achieved state of the art performance in collaborative filtering and have beaten most of the competition.\n",
"\n",
"RBM's are useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modeling. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Architecture\n",
"RBMs are shallow, two-layer neural nets that constitute the building blocks of deep-belief networks. The first layer of the RBM is called the visible, or input, layer, and the second is the hidden layer. The absense of an output layer is apparent. As we move forward we would learn why the output layer won't be needed.\n",
"\n",
"Figure1: Layers in RBM\n",
"Each circle in the figure above represents a neuron-like unit called a node, and nodes are simply where calculations take place. \n",
"
\n",
"Figure2: Structure of RBM\n",
"The nodes are connected to each other across layers, but no two nodes of the same layer are linked. That is, there is no intra-layer communication – this is the restriction in a restricted Boltzmann machine. Each node is a locus of computation that processes input, and begins by making stochastic decisions about whether to transmit that input or not. Each visible node takes a low-level feature from an item in the dataset to be learned."
]
},
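{
"cell_type": "markdown",
"metadata": {},
"source": [
"Purely as an illustration of the structure described above (the sizes and variable names below are hypothetical and are not the ones used later in this notebook), the parameters of an RBM can be stored as a single weight matrix connecting the two layers plus one bias vector per layer. The \"restriction\" shows up as the absence of any visible-visible or hidden-hidden weight matrices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: toy sizes, not the model trained later in this notebook\n",
"import numpy as np\n",
"\n",
"n_visible_demo, n_hidden_demo = 6, 3\n",
"\n",
"# a single weight matrix connects the two layers; there are no intra-layer weights,\n",
"# which is exactly the restriction in a *restricted* Boltzmann machine\n",
"W_demo = np.random.RandomState(0).normal(0, 0.01, size=(n_visible_demo, n_hidden_demo))\n",
"visible_bias_demo = np.zeros(n_visible_demo)\n",
"hidden_bias_demo = np.zeros(n_hidden_demo)\n",
"\n",
"W_demo.shape, visible_bias_demo.shape, hidden_bias_demo.shape"
]
},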
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by importing the packages that will be required in this project."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from collections import Counter, defaultdict\n",
"import pandas as pd\n",
"from scipy.sparse import coo_matrix, hstack\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"np_rng = np.random.RandomState(1234) #setting the random state"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# graded\n",
"# import data\n",
"\n",
"df = pd.read_excel('amazon.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
"\n",
"\n",
" | \n",
"Text | \n",
"
\n",
"\n",
"\n",
"\n",
"0 | \n",
"So there is no way for me to plug it in here i... | \n",
"
\n",
"\n",
"1 | \n",
"Good case, Excellent value. | \n",
"
\n",
"\n",
"2 | \n",
"Great for the jawbone. | \n",
"
\n",
"\n",
"3 | \n",
"Tied to charger for conversations lasting more... | \n",
"
\n",
"\n",
"4 | \n",
"The mic is great. | \n",
"
\n",
"\n",
"
\n",
"
"
],
"text/plain": [
" Text\n",
"0 So there is no way for me to plug it in here i...\n",
"1 Good case, Excellent value.\n",
"2 Great for the jawbone.\n",
"3 Tied to charger for conversations lasting more...\n",
"4 The mic is great."
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#run this and check if you have got the correct output\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Topic Modelling, you find the best set of topics that describe the document. There are various ways to perform topic modelling one of which is RBM. You train your RBM on a set of documents. \n",
"The visible layers will be the words in the text, the hidden layers will give the Topics. \n",
"To input words into the visible layer, let's convert the train and test data you split above into a bag of words model."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import CountVectorizer\n",
"#graded\n",
"# create bag of words model on train and test data\n",
"\n",
"tf = CountVectorizer() #the final shape should be (number of documents, vocabulary)\n",
"\n",
"# fit tf on the dataframe df\n",
"\n",
"# transform df dataframe\n",
"trainX = tf.fit_transform(df.Text) "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n"
]
},
{
"data": {
"text/plain": [
"(1000, 1847)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#check if you are getting the correct output\n",
"print(sum(trainX.toarray()[1]))\n",
"trainX.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should get:\n",
"
\n",
"\n",
"3
\n",
"(1000, 1825)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have the bag of words model, let's define the number of visible and hidden units."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"#graded\n",
"# define visible units\n",
"visibleUnits = trainX.shape[1] # vocabulary size ~1 line\n",
"\n",
"# assign number of units\n",
"hiddenUnits = 5 # hyperparameter, this means that we are looking for 5 topics"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1847"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"visibleUnits"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"#graded\n",
"# utility Functions\n",
"\n",
"# deine the sigmoid function\n",
"def sigmoid(X):\n",
" return 1. / (1 + numpy.exp(-X)) # ~ 1 line"
]
},
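{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, ungraded sanity check of the utility above: if sigmoid is implemented correctly, it should map 0 to 0.5 and squash any real-valued input into the interval (0, 1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# quick (ungraded) sanity check of the sigmoid utility\n",
"print(sigmoid(0))                        # expected: 0.5\n",
"print(sigmoid(np.array([-2., 0., 2.])))  # all values squashed into (0, 1)"
]
},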
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RBM as a Probabilistic Model\n",
"Restricted Boltzmann Machines are probabilistic. As opposed to assigning discrete values the model assigns probabilities. At each point in time the RBM is in a certain state. The state refers to the values of neurons in the visible and hidden layers v and h. The probability that a certain state of v and h can be observed is given by the following joint distribution:\n",
"
\n",
"\n",
"Eq. 2. Joint Distribution for v and h.\n",
"Here Z is called the ‘partition function’ that is the summation over all possible pairs of visible and hidden vectors."
]
},
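{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference (the equation is not reproduced above, but this is the standard definition for a binary RBM), the energy of a joint configuration $(v, h)$ is\n",
"\n",
"$$E(v, h) = -\\sum_{i} a_i v_i - \\sum_{j} b_j h_j - \\sum_{i, j} v_i \\, w_{ij} \\, h_j$$\n",
"\n",
"where $a_i$ and $b_j$ are the visible and hidden biases and $w_{ij}$ is the weight between visible unit $i$ and hidden unit $j$, and the partition function is $Z = \\sum_{v, h} e^{-E(v, h)}$."
]
},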
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The joint distribution is known as the Boltzmann Distribution which gives the probability that a particle can be observed in the state with the energy E. Unfortunately it is very difficult to calculate the joint probability due to the huge number of possible combination of v and h in the partition function Z. Much easier is the calculation of the conditional probabilities of state h given the state v and conditional probabilities of state v given the state h:\n",
"
\n",
"\n",
"Eq. 3. Conditional probabilities for h and v.\n",
"It should be noticed beforehand (before demonstrating this fact on practical example) that each neuron in a RBM can only exist in a binary state of 0 or 1. The most interesting factor is the probability that a hidden or visible layer neuron is in the state 1 — hence activated. "
]
},
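{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the contrast concrete, here is a small, self-contained toy example (the sizes and parameters below are made up for illustration and are not part of the graded model). With only 3 visible and 2 hidden units we can still enumerate all $2^{3+2} = 32$ joint configurations and compute $Z$ by brute force, whereas the conditional $p(h_j = 1 \\mid v)$ needs nothing more than a sigmoid. For a vocabulary-sized visible layer the brute-force sum over states would be hopeless."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration (ungraded): brute-force partition function vs. a cheap conditional.\n",
"# Reuses np and the sigmoid() utility defined earlier in this notebook.\n",
"import itertools\n",
"\n",
"rng = np.random.RandomState(0)\n",
"nv, nh = 3, 2                       # tiny layers so that full enumeration is feasible\n",
"W_toy = rng.normal(0, 1, size=(nv, nh))\n",
"a_toy = rng.normal(0, 1, size=nv)   # visible biases\n",
"b_toy = rng.normal(0, 1, size=nh)   # hidden biases\n",
"\n",
"def energy(v, h):\n",
"    # standard binary RBM energy: E(v, h) = -a.v - b.h - v.W.h\n",
"    return -np.dot(a_toy, v) - np.dot(b_toy, h) - np.dot(v, np.dot(W_toy, h))\n",
"\n",
"# partition function: a sum over all 2^(nv+nh) joint states, exponential in the layer sizes\n",
"states_v = [np.array(s) for s in itertools.product([0, 1], repeat=nv)]\n",
"states_h = [np.array(s) for s in itertools.product([0, 1], repeat=nh)]\n",
"Z = sum(np.exp(-energy(v, h)) for v in states_v for h in states_h)\n",
"\n",
"# conditional p(h_j = 1 | v): just a sigmoid, no sum over states needed\n",
"v = np.array([1, 0, 1])\n",
"p_h_given_v = sigmoid(np.dot(v, W_toy) + b_toy)\n",
"\n",
"# cross-check p(h_0 = 1 | v) against the brute-force joint distribution\n",
"num = sum(np.exp(-energy(v, h)) for h in states_h if h[0] == 1)\n",
"den = sum(np.exp(-energy(v, h)) for h in states_h)\n",
"\n",
"print('Z =', Z)\n",
"print('sigmoid      p(h_0 = 1 | v) =', p_h_given_v[0])\n",
"print('brute force  p(h_0 = 1 | v) =', num / den)"
]
},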
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Contrastive Divergence\n",
"\n",
"### Gibbs Sampling\n",
"The first part of the training is called Gibbs Sampling. Given an input vector v we are using p(h|v) for prediction of the hidden values h via sampling. Knowing the hidden values we use p(v|h) for prediction of new input values v via sampling. This process is repeated k times. After k iterations, we obtain the visible vector $v_k$ which was recreated from original input values $v_0$.\n",
"
"
]
},
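{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough, ungraded sketch of the flow just described, the k-step chain could be assembled from the two sampling helpers defined in the cells that follow. The function name and signature below are placeholders for illustration, not the notebook's graded implementation, and it should only be called once *sampleHiddenLayer* and *sampleVisibleLayer* have been defined."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of a k-step Gibbs chain (placeholder name; it relies on the\n",
"# sampling helpers defined further below, so call it only after those cells have run)\n",
"def gibbs_chain_sketch(v0_sample, D, k):\n",
"    _, h_sample = sampleHiddenLayer(v0_sample)            # h ~ p(h | v_0)\n",
"    v_sample = v0_sample\n",
"    for _ in range(k):\n",
"        vPdf, v_sample = sampleVisibleLayer(h_sample, D)  # v ~ p(v | h)\n",
"        hPdf, h_sample = sampleHiddenLayer(v_sample)      # h ~ p(h | v)\n",
"    return v_sample, h_sample                             # v_sample is the reconstruction v_k"
]
},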
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gibbs function *gibbs* is divided into subparts:
\n",
"1.*sampleHiddenLayer *
\n",
"2.*sampleVisibleLayer*\n",
"\n",
"Let's look at *sampleHiddenLayer* now.\n",
"\n",
"### Sample Hidden Layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You already know that given an input vector v the probability for a single hidden neuron j being activated is:\n",
"
\n",
"\n",
"Eq. 4\n",
"Here is σ the Sigmoid function.\n",
"\n",
"*sampleHiddenLayer* takes the visible layer as input to calculate the hidden layer using Eq. 4 *h1Pdf* and then samples it to get * h1_sample*\n",
"\n",
" v_sample: given visible layer matrix; matrix because a batch of data points will be trained at one go\n",
" returns a sample vector of hidden layer and its distribution for a batch of data points\n",
" \n",
" hPdf: distribution of hidden layer; a matrix for batch of datapoints = p(h|v)\n",
" h_sample: sampled hidden layer matrix"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"#graded\n",
"def sampleHiddenLayer(v_sample):\n",
" \n",
" # write the code for calculation of hPdf using vectorized implementation of Eq 4\n",
" hPdf = sigmoid(np.dot(v_sample, W) + hiddenBias)# ~ 1 line\n",
" \n",
" # Here, np.random.binomial is used to create the hidden layer sample matrix\n",
" h_sample = np_rng.binomial(size=hPdf.shape, n=1, p=hPdf)\n",
" return [hPdf, h_sample]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Visible Layer\n",
"Similarly, the probability that a binary state of a visible neuron i is set to 1 is:\n",
"
\n",
"Eq. 5\n",
"\n",
"As seen in equation 5, we will be writing a function to sample the Visible Layer.\n",
"This function samples the visible layer based on the sampled data of hidden layer.
\n",
"\n",
"There are some differences in writing the function *sampleVisibleLayer*.
Firstly, we use np.random.multinomial to sample the visible layer *v_sample* from the distribution *vPdf*.
Secondly,elements of *vPdf* needs to sum to 1 as the function np.random.multinomial used to sample the visible layer takes on probability distributions as *pvals*. In other words, you are finding the softmax values.
Thirdly, we also make use of the *D* to sample the visible layer as each document has different word count.\n",
" \n",
" h_sample: given hidden layer matrix; matrix because a batch of data points will be trained at one go\n",
" D: array of the sum of the row of the data vector; vector containing number of words in each document\n",
" \n",
" returns a sample vector of hidden layer and its distribution for a batch of data points\n",
" \n",
" vPdf: distribution of visible layer; a matrix for batch of datapoints = p(v|h)\n",
" v_sample: sampled visible layer matrix\n",
" "
]
},
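{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before completing the function below, here is a small self-contained illustration (toy numbers, ungraded) of the multinomial sampling step: given a probability vector over the vocabulary that sums to 1 and a document's word count $D_i$, a multinomial draw returns a vector of word counts whose entries add up to $D_i$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy illustration of the multinomial sampling used for the visible layer\n",
"demo_rng = np.random.RandomState(42)\n",
"\n",
"vPdf_row = np.array([0.5, 0.3, 0.2])   # p(v | h) for one tiny document; sums to 1\n",
"D_i = 7                                # number of words in that document\n",
"\n",
"v_sample_row = demo_rng.multinomial(n=D_i, pvals=vPdf_row)\n",
"print(v_sample_row, v_sample_row.sum())  # word counts over the vocabulary; the sum equals D_i"
]
},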
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"#graded\n",
"def sampleVisibleLayer(h_sample, D):\n",
" \n",
" # complete the following function such that vPdf has the sum of entries equal to 1 for each of the datapoints in the batch\n",
" # you have to use axis = 1 in writing the denominator\n",
" numerator = sigmoid(numpy.dot(h_sample, W.T) + visibleBias)# ~1 line\n",
"...