This week, we will practice with datasets, regular expressions, and dictionary operations in preparation for next week's assignment using the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Database.
Using the simple regular expression for person names from the slides, the PA8_incomplete.py program replaces names in an input file with the string "**name**" and saves the result to an output file. Improve that regular expression: for example, add an optional middle name or initial, suffixes, more prefixes, etc. You may use multiple regular expressions if you want. The program need not be perfect, i.e., it need not cover every possible way a name can be written; it may miss some names and may incorrectly replace something that is not a name, but it should do a reasonable job. You may assume that a name always starts with a prefix. Next, add code to also replace email addresses with the string "**email**"; again, it need not be perfect, but it should do a reasonable job.
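To make the task concrete, here is a minimal sketch of one way the patterns might be structured. It is not the regular expression from the slides, and it does not reproduce the contents of PA8_incomplete.py. All names in it (`TITLE`, `SUFFIX`, `NAME_RE`, `EMAIL_RE`, `redact`) and the particular lists of prefixes and suffixes are assumptions chosen for illustration.

```python
import re

# Hypothetical building blocks (illustrative lists, not from the slides).
TITLE = r"(?:Dr|Prof|Mrs|Mr|Ms|Miss)"
SUFFIX = r"(?:Jr|Sr|PhD|Phd|MD|MBBS|III|II)"
SP = r"[ \t]+"  # spaces or tabs only, so a name never runs across a line break

# A capitalized word that is not itself a title or a suffix.
WORD = rf"(?!(?:{SUFFIX}|{TITLE})\b)[A-Z][a-z]+"
INITIAL = r"[A-Z]\."

NAME_RE = re.compile(
    rf"\b{TITLE}\.?{SP}{WORD}"        # prefix and first name
    rf"(?:{SP}(?:{INITIAL}|{WORD}))?"  # optional middle name or initial
    rf"(?:{SP}{WORD})?"                # optional last name
    rf"(?:,?{SP}{SUFFIX}\b\.?)?"       # optional suffix such as Jr. or MBBS.
)

# A deliberately loose email pattern: local part, "@", dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text):
    """Replace email addresses, then prefixed names, with placeholders."""
    text = EMAIL_RE.sub("**email**", text)
    return NAME_RE.sub("**name**", text)


if __name__ == "__main__":
    # Made-up sample line; the addresses are placeholders.
    print(redact("Mr. John Q. Public Jr. jq.public@example.com"))
```

This sketch still makes mistakes. For example, it absorbs a capitalized word that happens to follow a short name, and it misses hyphenated or all-caps names. That is within the "reasonable job" standard described above, and your own solution may split the work across patterns differently.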
Submit your .py file in Canvas.
PA8_incomplete.py
Here's a sample input file you can use (or use your own):
Assignment 9 input file.txt
Dr. James Anderson MBBS. [email protected] Prof. Chris Nathan [email protected] Mr. Ryan Lee [email protected] Ms. Gunderson Sten [email protected] Dr. Stepehen Barnet Phd. [email protected]