Lab 2: Boundaries 1) a) Using one of the Corpora in the last lab. Calculate the average "Tokens" per sentence. b) Using the same or different corpus, which category has the longest sentences on...



Lab 2: Boundaries
1) a) Using one of the corpora from the last lab, calculate the average number of tokens per sentence.
   b) Using the same or a different corpus, which category has the longest sentences on average, and which has the shortest?
2) Download your own corpus from https://www.gutenberg.org/
   a) How many sentences are in the document (use NLTK to split the sentences)? How does this differ from the number of lines in the file (readlines)?
   b) After tokenizing the sentences, find 3 errors and describe why you think each error might have occurred. What in the algorithm might have gone wrong?
Answered Same Day: Jun 03, 2021

Mani answered on Jun 03 2021
Assignment 2
Lab 2: Boundaries
Problem #1:
a. Corpus used: movie_reviews
import nltk
from nltk.corpus import movie_reviews

# Part a
# The corpus data must be available locally, e.g. via nltk.download("movie_reviews");
# sents() may also need the Punkt sentence tokenizer data.
sum_tokens = 0
sentences = movie_reviews.sents()
total_sents = len(sentences)

# Each sentence is already tokenized, i.e. it is a list of words
for sentence in sentences:
    tokens = len(sentence)
    sum_tokens = sum_tokens + tokens

average_token_per_sent = sum_tokens / total_sents
print("Average tokens per sentence: {}".format(average_token_per_sent))
b. There were 2 categories in this so...
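Since the rest of part b is cut off here, the following is only a sketch of one way the per-category comparison could be done, not the original answer. The helper names average_sentence_length and average_length_by_category are hypothetical. It assumes the corpus object behaves like an NLTK categorized corpus reader, i.e. it offers categories() and sents(categories=...), as movie_reviews does.

```python
def average_sentence_length(sentences):
    """Return the mean number of tokens per sentence.

    `sentences` is any iterable of token lists, such as the
    value returned by an NLTK corpus reader's sents() method.
    """
    total_tokens = 0
    total_sents = 0
    for sentence in sentences:
        total_tokens += len(sentence)
        total_sents += 1
    if total_sents == 0:
        raise ValueError("no sentences to average over")
    return total_tokens / total_sents


def average_length_by_category(corpus):
    """Map each category of a categorized corpus to its average sentence length.

    Assumption: `corpus` provides categories() and sents(categories=...),
    like NLTK's categorized corpus readers.
    """
    return {
        category: average_sentence_length(corpus.sents(categories=category))
        for category in corpus.categories()
    }
```

With the corpus from part a, this could be called as averages = average_length_by_category(movie_reviews). The categories with the longest and shortest sentences would then be max(averages, key=averages.get) and min(averages, key=averages.get).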