1. Preparing Text: For this part, you will start by reading the Income.json file into a DataFrame.
a. Convert all text to lowercase letters.
b. Remove all punctuation from the text.
c. Remove stop words.
d. Apply NLTK’s PorterStemmer.
2. Use a Tf-idf vector instead of the word frequency vector.
3. Complete the 5.3 Encoding Dictionaries of Features examples. Be sure to keep track of how many times a word is used in a document, and be sure to run the sample code in 6.9. Finally, consider tokenizing words or sentences (see 6.4) and tagging parts of speech (see 6.7). Be sure to review how to encode days of the week (see 7.6).
4. You can start with the #1 program and add to it, or you can start a new program. Provide me with an example (besides counting words in a document) of how these techniques could be used. (Just a couple of sentences.)
5. Then implement at least 3 of these text techniques in a program demonstrating how your example could be accomplished. Be sure to include lots of comments.
6. Create a data file or use one from the resources file. You must use DataFrames!
7. The completed task must be in a Jupyter Notebook and returned with the completed data file.

Handling Categorical Data, Text, Dates & Times
Use the data file “Income.json”:

{"# of kids":{"0":5,"1":5,"2":2,"3":2,"4":0,"5":1,"6":1,"7":3,"8":3,"9":3},"Income":{"0":25000,"1":122500,"2":142007,"3":42007,"4":14704,"5":200704,"6":120070,"7":207040,"8":48000,"9":79000},"State":{"0":"CA","1":"NY","2":"TX","3":"TX","4":"TX","5":"TX","6":"CA","7":"NY","8":"NY","9":"NY"}}
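
For step 1 (a to d), a minimal sketch of one possible approach follows. It inlines the Income.json contents shown above so it runs without the file. Because that data has no free-text column, the `Notes` column and the `HYPOTHETICAL_NOTES` sentences are invented purely for illustration, and the short `STOP_WORDS` set is an assumed stand-in for NLTK's stop-word corpus, which needs a one-time `nltk.download("stopwords")`.

```python
import string
from io import StringIO

import pandas as pd
from nltk.stem.porter import PorterStemmer

# Contents of Income.json as given in the assignment, inlined so the sketch
# runs on its own; with the file on disk, use pd.read_json("Income.json").
INCOME_JSON = (
    '{"# of kids":{"0":5,"1":5,"2":2,"3":2,"4":0,"5":1,"6":1,"7":3,"8":3,"9":3},'
    '"Income":{"0":25000,"1":122500,"2":142007,"3":42007,"4":14704,'
    '"5":200704,"6":120070,"7":207040,"8":48000,"9":79000},'
    '"State":{"0":"CA","1":"NY","2":"TX","3":"TX","4":"TX","5":"TX",'
    '"6":"CA","7":"NY","8":"NY","9":"NY"}}'
)

# ASSUMPTION: a tiny hand-written stop-word set, used so no download is needed.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# ASSUMPTION: made-up sentences, one per state, so there is text to clean.
HYPOTHETICAL_NOTES = {
    "CA": "Renting an apartment in the city!",
    "NY": "The families are commuting to offices.",
    "TX": "Owning a house, and working remotely.",
}

stemmer = PorterStemmer()
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def prepare_text(text):
    """Lowercase, strip punctuation, drop stop words, then stem each word."""
    lowered = text.lower()                                        # step a
    no_punct = lowered.translate(PUNCT_TABLE)                     # step b
    words = [w for w in no_punct.split() if w not in STOP_WORDS]  # step c
    return " ".join(stemmer.stem(w) for w in words)               # step d


df = pd.read_json(StringIO(INCOME_JSON))
df["Notes"] = df["State"].map(HYPOTHETICAL_NOTES)   # hypothetical text column
df["Notes_clean"] = df["Notes"].apply(prepare_text)

print(df[["State", "Notes", "Notes_clean"]])
```

Keeping the four steps in one small function makes it easy to reuse the same cleaning on any other text column with `Series.apply`.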
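
For the days-of-the-week review mentioned in step 3, a small sketch with pandas follows. The three dates are hypothetical sample values. The `.dt` accessor provides the day name and a numeric weekday, and `pd.get_dummies` is used here as one possible way to one-hot encode the names.

```python
import pandas as pd

# ASSUMPTION: hypothetical sample dates, not taken from Income.json.
dates_df = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-01", "2024-01-06", "2024-01-07"])}
)

# Day of the week as a name and as a number (Monday is 0, Sunday is 6).
dates_df["day_name"] = dates_df["date"].dt.day_name()
dates_df["weekday"] = dates_df["date"].dt.weekday

# One-hot encode the day names: one indicator column per day that appears.
day_dummies = pd.get_dummies(dates_df["day_name"], prefix="day")

print(dates_df)
print(day_dummies)
```

The numeric `weekday` column suits models that can use an ordered value, while the indicator columns avoid implying that, for example, Sunday is "greater than" Monday.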