For this assignment, you are asked to write a Python program for indexing and searching keywords in multiple websites. Your program should use all the material covered in the class (databases, network programming, graphics, etc.).

Specifically, make sure your program meets the following requirements:



Part 1 - Creating / opening a database:



  1. Ask the users to input the name of a database

  2. Check if the database exists, if it can be opened, and if it can be read

  3. If the database does not exist, then:


Create a new database - use input for the file name (the name entered by the users)


Create the following tables and fields:





      1. Table “webpages” with the fields “Id” (a unique id number for each webpage stored), “url” (the url address of the page), “title” (the page’s title)







      2. Table “keywords” with the fields “page_id” (the id number from the previous table), “word” (all the unique words that appear in the page), “count” (the frequency of each word in the page), “significance” (the significance score for each word, calculated as the word frequency count divided by the total number of words on the page)




Output the total number of values in each table (if it’s a new database, the values will be “0”).
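One way Part 1 could be approached is sketched below. It assumes the course's database module is `sqlite3` (the assignment does not name one), and the helper names `open_database` and `table_counts` are hypothetical. If the file exists but is not a readable database, the first statement raises `sqlite3.DatabaseError`, which the caller can catch to report the problem.

```python
import os
import sqlite3


def open_database(name):
    """Open the SQLite file `name`; create the two tables if they are missing.

    Returns (connection, existed), where `existed` says whether the file
    was already on disk before this call. Assumes sqlite3 is the database
    module used in the course.
    """
    existed = os.path.isfile(name)
    conn = sqlite3.connect(name)
    # These statements also act as the "can it be read" check: on a file
    # that is not a valid database they raise sqlite3.DatabaseError.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS webpages (
               Id INTEGER PRIMARY KEY,
               url TEXT UNIQUE,
               title TEXT)"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS keywords (
               page_id INTEGER,
               word TEXT,
               "count" INTEGER,
               significance REAL)"""
    )
    conn.commit()
    return conn, existed


def table_counts(conn):
    """Return the number of rows currently stored in each table."""
    counts = {}
    for table in ("webpages", "keywords"):
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts
```

For a new database, `table_counts` returns 0 for both tables, matching the expected output above.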



Part 2 - Scraping websites:


Ask the user to enter a url address, or “0” to stop


Check if the url address can be opened. If not, display an error message and ask the users to enter a new url. Otherwise, use BeautifulSoup to grab (a) the webpage title (b) All the visible text on a webpage.
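The URL check and the BeautifulSoup step might look roughly like the sketch below. It assumes `urllib.request` is used for the network part (the assignment does not say which module the course covered) and that "visible text" means the body text with `script`, `style`, and `noscript` elements removed. The function names `fetch_html` and `extract_title_and_text` are hypothetical.

```python
import urllib.error
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Return the page source as a string, or None if the URL cannot be opened."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (ValueError, urllib.error.URLError, OSError):
        # ValueError covers malformed addresses; URLError/OSError cover
        # network failures and HTTP error responses.
        return None


def extract_title_and_text(html):
    """Return (title, visible_text) for an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title is not None else ""
    # Assumption: text inside these tags is not visible to the reader.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    root = soup.body if soup.body is not None else soup
    text = root.get_text(separator=" ", strip=True)
    return title, text
```

When `fetch_html` returns `None`, the program can print an error message and prompt for another URL.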


Output the following:


The page title


The total number of words on the page


The total number of keywords / unique words on the page (not case sensitive)


The five most significant keywords on the page (significance = word count on page / total number of words on page)
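The statistics listed above can be computed with a short helper. This is a minimal sketch: the name `word_stats` is hypothetical, and the tokenization rule (lowercase runs of letters and digits) is an assumption, since the assignment does not define what counts as a word.

```python
import re
from collections import Counter


def word_stats(text, top_n=5):
    """Return (total_words, counts, top) for a block of page text.

    `counts` maps each unique lowercase word to its frequency, and `top`
    lists the `top_n` most frequent words as (word, count, significance),
    where significance = count / total_words.
    """
    # Assumption: a "word" is a run of letters and/or digits, case-insensitive.
    words = re.findall(r"[a-z0-9]+", text.lower())
    total = len(words)
    counts = Counter(words)
    top = [(word, count, count / total) for word, count in counts.most_common(top_n)]
    return total, counts, top
```

Because significance divides every count by the same total, the most significant words are simply the most frequent ones, so `Counter.most_common` is enough to rank them.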



Part 3 - Indexing websites:


Check if the url is stored in the database (use the same database from step 1).


If the url is not stored, then add the following values:


Table 1 - unique ID, page url and page title


Table 2 - relevant page ID, all the unique words (keywords), their word count (frequency in text), their page significance (word count / total number of words)


If the url is stored, then:


Delete all the values associated with the url (title, keywords, etc.)


Output how many values were deleted


Add new values to the database (previous step)
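The indexing logic above could be sketched as follows, again assuming `sqlite3`. The helper names `create_tables` and `index_page` are hypothetical, and "how many values were deleted" is interpreted here as the number of rows removed from both tables.

```python
import sqlite3


def create_tables(conn):
    """Create the Part 1 schema (repeated here so the sketch runs on its own)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webpages "
        "(Id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords "
        '(page_id INTEGER, word TEXT, "count" INTEGER, significance REAL)'
    )


def index_page(conn, url, title, counts, total_words):
    """Store one page and its keywords; replace old rows if the URL exists.

    `counts` maps each unique word to its frequency on the page.
    Returns the number of rows deleted (0 when the URL is new).
    """
    deleted = 0
    row = conn.execute("SELECT Id FROM webpages WHERE url = ?", (url,)).fetchone()
    if row is not None:
        # The URL is already indexed: remove its keywords, then the page row.
        deleted += conn.execute(
            "DELETE FROM keywords WHERE page_id = ?", (row[0],)
        ).rowcount
        deleted += conn.execute(
            "DELETE FROM webpages WHERE Id = ?", (row[0],)
        ).rowcount
    cursor = conn.execute(
        "INSERT INTO webpages (url, title) VALUES (?, ?)", (url, title)
    )
    page_id = cursor.lastrowid  # unique id assigned by SQLite
    conn.executemany(
        'INSERT INTO keywords (page_id, word, "count", significance) '
        "VALUES (?, ?, ?, ?)",
        [(page_id, word, n, n / total_words) for word, n in counts.items()],
    )
    conn.commit()
    return deleted
```

Using `?` placeholders rather than string formatting keeps URLs and titles that contain quotes from breaking the SQL statements.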



Repeat parts 2 and 3 until the users enter “0”. Note: run it at least 5 times with different url addresses to make sure it works correctly.



Part 4 - Searching keywords:


Once the users enter “0” to stop indexing url addresses, display a graphic user interface (GUI) with (a) your name (b) The objective of the program (c) An input field to enter a keyword (can be numbers, letters, or both. NOT case sensitive) (d) A button to execute


Once the users enter a keyword and click the button, advance to the next step (Note: do not advance if the users didn’t enter any input!)


In a new graphic window or in the same GUI, display:


How many times the word appears in total (sum of all counts for all pages in the database)?


In how many indexed web pages the keyword appears?


Display all the indexed pages that contain the keyword - page url, page title, and page significance score for the keyword


Allow the users to return to the previous GUI / search more keywords (as many times as they want)
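The database side of the search could be sketched as below. The GUI is left out because the assignment does not name the toolkit used in the course. The function name `search_keyword` is hypothetical, and the sketch assumes keywords were stored in lowercase during indexing, so the input is lowercased before the lookup.

```python
import sqlite3


def search_keyword(conn, keyword):
    """Look up one keyword across all indexed pages.

    Returns None for empty input (so the GUI can refuse to advance),
    otherwise (total_count, page_count, pages), where `pages` is a list
    of (url, title, significance) sorted by significance, highest first.
    """
    word = keyword.strip().lower()
    if not word:
        return None
    rows = conn.execute(
        """SELECT w.url, w.title, k."count", k.significance
           FROM keywords AS k
           JOIN webpages AS w ON w.Id = k.page_id
           WHERE k.word = ?
           ORDER BY k.significance DESC""",
        (word,),
    ).fetchall()
    total_count = sum(row[2] for row in rows)  # sum of counts over all pages
    pages = [(url, title, sig) for url, title, _count, sig in rows]
    return total_count, len(rows), pages
```

The button callback in the GUI can call this function with the text of the input field and display the three returned values, or stay on the same screen when the result is `None`.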



Remember:


Only use the material covered in the course!


Make sure to include comments that explain all your steps (starts with #). Also use a comment to sign your name at the beginning of the program!


Work individually and only submit original work


Run the program a few times to make sure it executes and meets all the requirements


Submit a .py file!



May 18, 2022