Assignment 4: In motif finding, a weight matrix (also referred to as Position Weight Matrix or Position Specific Weight Matrix) is defined as the log-odds matrix whose elements are defined as W(b, j)...

Assignment 4

• In motif finding, a weight matrix (also referred to as a Position Weight Matrix or Position Specific Weight Matrix) is defined as the log-odds matrix whose elements are defined as

W(b, j) = log [ F'(b, j) / F(b, o) ]

where b corresponds to the base and j is the index of the position within the motif. F'(b, j) is the frequency with which each base occurs at a specified position and can easily be calculated from the counts matrix after adjusting for zero values (see below). F(b, o) is the background frequency with which a particular base is known to occur and can be assumed to be 0.25 for all bases at all positions in the motif.

A transcription factor, argR, is known to bind to a motif which can be represented with the following counts matrix built from a total of 27 binding sites documented in the literature (the counts matrix is attached as a text file which should be used by your program).

a | 8 12 21 9 4 2 21 21 3 10 8 5 7 25 4 2 2 25
c|74162333102 0 7 0
g | 3 2 1 0 1 0 1
t | 9 9 4 17211313 38 2 0 154 19 20724 021 2 2 0 1 0 21 1 1 23 16 1 0

Now write a script/program to compute the frequency matrix F(b, j) using the above counts matrix. Since the log-odds matrix is based on the frequency matrix, a revised F'(b, j) is used to avoid taking the logarithm of 0. It is computed by adding 1 to every base count in the counts matrix, thereby artificially increasing the number of sites to 31 (put another way, a pseudocount of +1 is added to each of the real counts for each base at each position, which increases the total count at each position in the matrix to 31). Based on this notion, compute F'(b, j) as well in the same script/program.

• Now use the weight matrix to scan and identify the binding sites in the attached set of upstream regulatory regions of genes by filtering to those with the highest similarity to the PSM, i.e., your program should predict and show only the top 30 scoring gene ids corresponding to these sequences. Upstream regulatory regions of genes, defined as 400 bases upstream and 50 bases after the translational start site, are provided in FASTA nucleotide format along with the gene id to which each sequence corresponds.
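
The steps described in the assignment (counts to F, counts plus pseudocount to F', F' to log-odds W, then scanning sequences and ranking genes) can be sketched roughly as follows. This is only an illustrative sketch, not the posted answer: the function names, the dictionary-based input and the three-position `toy_counts` matrix are all hypothetical, and the toy matrix is not the argR matrix. It assumes natural logarithms and a uniform background of 0.25.

```python
from math import log

BASES = "acgt"
BACKGROUND = 0.25  # uniform background frequency, as stated in the assignment


def frequency_matrix(counts, pseudocount=0.0):
    """Per-position base frequencies; pseudocount is added to every count."""
    length = len(counts["a"])
    freqs = {b: [] for b in BASES}
    for j in range(length):
        total = sum(counts[b][j] + pseudocount for b in BASES)
        for b in BASES:
            freqs[b].append((counts[b][j] + pseudocount) / total)
    return freqs


def weight_matrix(freqs_prime):
    """W(b, j) = log(F'(b, j) / background)."""
    return {b: [log(f / BACKGROUND) for f in col] for b, col in freqs_prime.items()}


def best_window(seq, weights):
    """Return (score, start) of the best window, or None if no window fits."""
    width = len(weights["a"])
    seq = seq.lower()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in weights for ch in window):
            continue  # skip windows containing characters other than a, c, g, t
        score = sum(weights[ch][j] for j, ch in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def top_genes(sequences, weights, n=30):
    """Rank gene ids by their best window score and keep the top n."""
    scored = []
    for gene_id, seq in sequences.items():
        hit = best_window(seq, weights)
        if hit is not None:
            scored.append((hit[0], gene_id))
    scored.sort(reverse=True)
    return [(gene_id, score) for score, gene_id in scored[:n]]


# Hypothetical 3-position counts from 27 sites (NOT the argR matrix).
toy_counts = {
    "a": [20, 0, 5],
    "c": [3, 0, 5],
    "g": [2, 27, 7],
    "t": [2, 0, 10],
}

F = frequency_matrix(toy_counts)                         # raw frequencies, may contain 0
F_prime = frequency_matrix(toy_counts, pseudocount=1.0)  # each column total becomes 31
W = weight_matrix(F_prime)
print(top_genes({"g1": "ttagtcc", "g2": "ttt"}, W, n=30))
```

Computing the column total inside `frequency_matrix` rather than hard-coding 27 or 31 keeps the sketch usable for other counts matrices; for the real data, the counts would be read from the attached text file and the sequences from the FASTA file.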
Answered Same Day, Apr 02, 2021

Answer To: Assignment 4

Kshitij answered on Apr 05 2021
138 Votes
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from bz2 import BZ2File\n",
"from math import log\n",
"from itertools import islice"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def read_count_matrix(file):\n",
"    # Assumes each line looks like 'a | 8 12 21 ...': base, separator, then counts\n",
"    dm = {}\n",
"    with open(file) as f:\n",
"        for line in f:\n",
"            line = line.strip().split()\n",
"            dm[line[0]] = [int(x) for x in line[2:]]\n",
"    return (dm)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
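"def generate_freq_weight_matrices(countm):\n",
"    freqm = {}\n",
"    weightm = {}\n",
"    for k,v in countm.items():\n",
"        # F'(b, j): pseudocount of +1 per base, so 27 sites become 31\n",
"        freqm[k] = [(float(x)+1.0)/31.0 for x in v]\n",
"        # W(b, j) = log(F'(b, j) / 0.25), with 0.25 the background frequency\n",
"        weightm[k] = [log(x/0.25) for x in freqm.get(k)]\n",
"    return (freqm, weightm)"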
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def sliding_window(seq, n=18):\n",
"    # Yield every length-n window of seq as a tuple\n",
"    it = iter(seq)\n",
"    result = tuple(islice(it,n))\n",
"    if len(result) == n:\n",
"        yield result\n",
"    for elem in it:\n",
"        result = result[1:] + (elem,)\n",
"        yield result"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def get_max_score(seq,weightm):\n",
"    maxscore = None\n",
"    for s in sliding_window(seq):\n",
" ...