Answered Same Day, Nov 16, 2021

Answer To: Coding help

Ximi answered on Nov 17 2021
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Homework 3 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this homework you will be performing some analysis with entity extraction. In particular, you will be looking at the Reuters corpus and trying to construct entity profiles of persons and locations. This will require you to iterate through the documents in the Reuters corpus, parse them appropriately, extract entities, and then store the entities along with some surrounding text. Additionally, you will be looking for mechanisms to identify potential relationships between persons and locations.\n",
"\n",
"Throughout this you will need to use NLTK to access the corpus. At the same time, you will need an entity extraction system. You can use either NLTK or spaCy; I strongly suggest spaCy for the entity extraction portion of this assignment.\n",
"\n",
"The basic idea is to build a knowledge base around the entities you will extract in the Reuters corpus. Normally, this would be a first step toward modeling things such as entity resolution across documents. You could also use it as a first step toward analyzing the sentiment towards particular entities, for example people expressing dissatisfaction with a restaurant or brand.\n",
"\n",
"Follow the steps below and read the comments carefully on the types of tasks your code will need to do.\n",
"\n",
"I would expect that some of you might be able to reuse parts of this code for your project..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1) Import necessary libraries "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# This will be the corpus we work from\n",
"from nltk.corpus import reuters\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# I will assume you are using spaCy as the default entity recognizer.\n",
"import spacy\n",
"# Note: the model name depends on your spaCy version. The short name \"en\" only works in older releases;\n",
"# newer releases need the full package name, e.g. \"en_core_web_sm\" (assumed to be installed here).\n",
"# If you run into issues, check the spaCy model page at https://spacy.io/usage/models\n",
"nlp = spacy.load(\"en_core_web_sm\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2) Fill in the following function to extract the entity, document id, and relevant sentence text from the input"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10788\n",
"90\n"
]
},
{
"data": {
"text/plain": [
"['test/14826',\n",
" 'test/14828',\n",
" 'test/14829',\n",
" 'test/14832',\n",
" 'test/14833',\n",
" 'test/14839',\n",
" 'test/14840',\n",
" 'test/14841',\n",
" 'test/14842',\n",
" 'test/14843',\n",
" 'test/14844',\n",
" 'test/14849',\n",
" 'test/14852',\n",
" 'test/14854',\n",
" 'test/14858',\n",
" 'test/14859',\n",
" 'test/14860',\n",
" 'test/14861',\n",
" 'test/14862',\n",
" 'test/14863',\n",
" 'test/14865',\n",
" 'test/14867',\n",
" 'test/14872',\n",
" 'test/14873',\n",
" 'test/14875',\n",
" 'test/14876',\n",
" 'test/14877',\n",
" 'test/14881',\n",
" 'test/14882',\n",
" 'test/14885',\n",
" 'test/14886',\n",
" 'test/14888',\n",
" 'test/14890',\n",
" 'test/14891',\n",
" 'test/14892',\n",
" 'test/14899',\n",
" 'test/14900',\n",
" 'test/14903',\n",
" 'test/14904',\n",
" 'test/14907',\n",
" 'test/14909',\n",
" 'test/14911',\n",
" 'test/14912',\n",
" 'test/14913',\n",
" 'test/14918',\n",
" 'test/14919',\n",
" 'test/14921',\n",
" 'test/14922',\n",
" 'test/14923',\n",
" 'test/14926',\n",
" 'test/14928',\n",
" 'test/14930',\n",
" 'test/14931',\n",
" 'test/14932',\n",
" 'test/14933',\n",
" 'test/14934',\n",
" 'test/14941',\n",
" 'test/14943',\n",
" 'test/14949',\n",
" 'test/14951',\n",
" 'test/14954',\n",
" 'test/14957',\n",
" 'test/14958',\n",
" 'test/14959',\n",
" 'test/14960',\n",
" 'test/14962',\n",
" 'test/14963',\n",
" 'test/14964',\n",
" 'test/14965',\n",
" 'test/14967',\n",
" 'test/14968',\n",
" 'test/14969',\n",
" 'test/14970',\n",
" 'test/14971',\n",
" 'test/14974',\n",
" 'test/14975',\n",
" 'test/14978',\n",
" 'test/14981',\n",
" 'test/14982',\n",
" 'test/14983',\n",
" 'test/14984',\n",
" 'test/14985',\n",
" 'test/14986',\n",
" 'test/14987',\n",
" 'test/14988',\n",
" 'test/14993',\n",
" 'test/14995',\n",
" 'test/14998',\n",
" 'test/15000',\n",
" 'test/15001',\n",
" 'test/15002',\n",
" 'test/15004',\n",
" 'test/15005',\n",
" 'test/15006',\n",
" 'test/15011',\n",
" 'test/15012',\n",
" 'test/15013',\n",
" 'test/15016',\n",
" 'test/15017',\n",
" 'test/15020',\n",
" 'test/15023',\n",
" 'test/15024',\n",
" 'test/15026',\n",
" 'test/15027',\n",
" 'test/15028',\n",
" 'test/15029',\n",
" 'test/15031',\n",
" 'test/15032',\n",
" 'test/15033',\n",
" 'test/15037',\n",
" 'test/15038',\n",
" 'test/15043',\n",
" 'test/15045',\n",
" 'test/15046',\n",
" 'test/15048',\n",
" 'test/15049',\n",
" 'test/15052',\n",
" 'test/15053',\n",
" 'test/15055',\n",
" 'test/15056',\n",
" 'test/15060',\n",
" 'test/15061',\n",
" 'test/15062',\n",
" 'test/15063',\n",
" 'test/15065',\n",
" 'test/15067',\n",
" 'test/15069',\n",
" 'test/15070',\n",
" 'test/15074',\n",
" 'test/15077',\n",
" 'test/15078',\n",
" 'test/15079',\n",
" 'test/15082',\n",
" 'test/15090',\n",
" 'test/15091',\n",
" 'test/15092',\n",
" 'test/15093',\n",
" 'test/15094',\n",
" 'test/15095',\n",
" 'test/15096',\n",
" 'test/15097',\n",
" 'test/15103',\n",
" 'test/15104',\n",
" 'test/15106',\n",
" 'test/15107',\n",
" 'test/15109',\n",
" 'test/15110',\n",
" 'test/15111',\n",
" 'test/15112',\n",
" 'test/15118',\n",
" 'test/15119',\n",
" 'test/15120',\n",
" 'test/15121',\n",
" 'test/15122',\n",
" 'test/15124',\n",
" 'test/15126',\n",
" 'test/15128',\n",
" 'test/15129',\n",
" 'test/15130',\n",
" 'test/15132',\n",
" 'test/15136',\n",
" 'test/15138',\n",
" 'test/15141',\n",
" 'test/15144',\n",
" 'test/15145',\n",
" 'test/15146',\n",
" 'test/15149',\n",
" 'test/15152',\n",
" 'test/15153',\n",
" 'test/15154',\n",
" 'test/15156',\n",
" 'test/15157',\n",
" 'test/15161',\n",
" 'test/15162',\n",
" 'test/15171',\n",
" 'test/15172',\n",
" 'test/15175',\n",
" 'test/15179',\n",
" 'test/15180',\n",
" 'test/15185',\n",
" 'test/15188',\n",
" 'test/15189',\n",
" 'test/15190',\n",
" 'test/15193',\n",
" 'test/15194',\n",
" 'test/15197',\n",
" 'test/15198',\n",
" 'test/15200',\n",
" 'test/15204',\n",
" 'test/15205',\n",
" 'test/15206',\n",
" 'test/15207',\n",
" 'test/15208',\n",
" 'test/15210',\n",
" 'test/15211',\n",
" 'test/15212',\n",
" 'test/15213',\n",
" 'test/15217',\n",
" 'test/15219',\n",
" 'test/15220',\n",
" 'test/15221',\n",
" 'test/15222',\n",
" 'test/15223',\n",
" 'test/15226',\n",
" 'test/15227',\n",
" 'test/15230',\n",
" 'test/15233',\n",
" 'test/15234',\n",
" 'test/15237',\n",
" 'test/15238',\n",
" 'test/15239',\n",
" 'test/15240',\n",
" 'test/15242',\n",
" 'test/15243',\n",
" 'test/15244',\n",
" 'test/15246',\n",
" 'test/15247',\n",
" 'test/15250',\n",
" 'test/15253',\n",
" 'test/15254',\n",
" 'test/15255',\n",
" 'test/15258',\n",
" 'test/15259',\n",
" 'test/15262',\n",
" 'test/15263',\n",
" 'test/15264',\n",
" 'test/15265',\n",
" 'test/15270',\n",
" 'test/15271',\n",
" 'test/15273',\n",
" 'test/15274',\n",
" 'test/15276',\n",
" 'test/15278',\n",
" 'test/15280',\n",
" 'test/15281',\n",
" 'test/15283',\n",
" 'test/15287',\n",
" 'test/15290',\n",
" 'test/15292',\n",
" 'test/15294',\n",
" 'test/15295',\n",
" 'test/15296',\n",
" 'test/15299',\n",
" 'test/15300',\n",
" 'test/15302',\n",
" 'test/15303',\n",
" 'test/15306',\n",
" 'test/15307',\n",
" 'test/15308',\n",
" 'test/15309',\n",
" 'test/15310',\n",
" 'test/15311',\n",
" 'test/15312',\n",
" 'test/15313',\n",
" 'test/15314',\n",
" 'test/15315',\n",
" 'test/15321',\n",
" 'test/15322',\n",
" 'test/15324',\n",
" 'test/15325',\n",
" 'test/15326',\n",
" 'test/15327',\n",
" 'test/15329',\n",
" 'test/15335',\n",
" 'test/15336',\n",
" 'test/15337',\n",
" 'test/15339',\n",
" 'test/15341',\n",
" 'test/15344',\n",
" 'test/15345',\n",
" 'test/15348',\n",
" 'test/15349',\n",
" 'test/15351',\n",
" 'test/15352',\n",
" 'test/15354',\n",
" 'test/15356',\n",
" 'test/15357',\n",
" 'test/15359',\n",
" 'test/15363',\n",
" 'test/15364',\n",
" 'test/15365',\n",
" 'test/15366',\n",
" 'test/15367',\n",
" 'test/15368',\n",
" 'test/15372',\n",
" 'test/15375',\n",
" 'test/15378',\n",
" 'test/15379',\n",
" 'test/15380',\n",
" 'test/15383',\n",
" 'test/15384',\n",
" 'test/15386',\n",
" 'test/15387',\n",
" 'test/15388',\n",
" 'test/15389',\n",
" 'test/15391',\n",
" 'test/15394',\n",
" 'test/15396',\n",
" 'test/15397',\n",
" 'test/15400',\n",
" 'test/15404',\n",
" 'test/15406',\n",
" 'test/15409',\n",
" 'test/15410',\n",
" 'test/15411',\n",
" 'test/15413',\n",
" 'test/15415',\n",
" 'test/15416',\n",
" 'test/15417',\n",
" 'test/15420',\n",
" 'test/15421',\n",
" 'test/15424',\n",
" 'test/15425',\n",
" 'test/15427',\n",
" 'test/15428',\n",
" 'test/15429',\n",
" 'test/15430',\n",
" 'test/15431',\n",
" 'test/15432',\n",
" 'test/15436',\n",
" 'test/15438',\n",
" 'test/15441',\n",
" 'test/15442',\n",
" 'test/15444',\n",
" 'test/15446',\n",
" 'test/15447',\n",
" 'test/15448',\n",
" 'test/15449',\n",
" 'test/15450',\n",
" 'test/15451',\n",
" 'test/15452',\n",
" 'test/15453',\n",
" 'test/15454',\n",
" 'test/15455',\n",
" 'test/15457',\n",
" 'test/15459',\n",
" 'test/15460',\n",
" 'test/15462',\n",
" 'test/15464',\n",
...
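
For Step 2, a minimal sketch of what the extraction function might look like, assuming spaCy is the entity recognizer. The names `extract_entities`, `build_profiles`, and `TARGET_LABELS` are hypothetical, and the label set (`PERSON`, `GPE`, `LOC`) assumes the labels used by spaCy's English models. The function relies only on `doc.ents` and on each entity's `text`, `label_`, and `sent` attributes, so the pipeline must set sentence boundaries for `ent.sent` to work.

```python
from collections import defaultdict

# Assumption: labels follow spaCy's English models, where PERSON marks
# people and GPE / LOC mark locations. Adjust for a different model.
TARGET_LABELS = {"PERSON", "GPE", "LOC"}


def extract_entities(doc, doc_id, labels=TARGET_LABELS):
    """Return one record per entity of interest in a parsed document.

    `doc` is expected to behave like a spaCy Doc: it exposes `doc.ents`,
    and each entity exposes `text`, `label_`, and `sent` (its sentence).
    """
    records = []
    for ent in doc.ents:
        if ent.label_ not in labels:
            continue  # skip dates, organizations, money amounts, etc.
        records.append({
            "entity": ent.text.strip(),
            "label": ent.label_,
            "doc_id": doc_id,
            # Collapse runs of whitespace, since raw corpus text may
            # contain line breaks inside a sentence.
            "sentence": " ".join(ent.sent.text.split()),
        })
    return records


def build_profiles(records):
    """Group records by (entity, label) so each entity gets a profile
    listing every (doc_id, sentence) pair in which it was found."""
    profiles = defaultdict(list)
    for rec in records:
        key = (rec["entity"], rec["label"])
        profiles[key].append((rec["doc_id"], rec["sentence"]))
    return dict(profiles)
```

With the `nlp` object from Step 1, this could be called as `extract_entities(nlp(reuters.raw(fileid)), fileid)` for each `fileid` in `reuters.fileids()`, and the combined records passed to `build_profiles`. Keeping the document id next to each sentence is what later makes it possible to look for persons and locations that appear in the same sentence or document.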