Introduction
Surfing the web is a lot of fun. However, it can be frustrating when the author of a page is a professional in a specific field and uses technical jargon. It is often hard for a layperson to understand what the author is talking about, and it takes too much effort to search the web again for other pages that might explain the terms. If you are lucky, you may find them in a few searches; otherwise...
Our servlet, WEBster, provides a solution for this. It links the technical words or phrases in a web page to sites that provide explanations of, and references for, those words or phrases. The user connects to WEBster's HTML page and fills in the URL he is interested in. His browser then receives the document at that URL, but with all the technical words linked. Clicking on any linked word he doesn't understand takes the browser to a site that explains the word or provides relevant information about it. Thus, an explanation is just one click away.
Servlet Behavior
Program Flow
1. WEBster takes its input through an HTML form. The form lets a user enter a URL and specify the categories of words (for example, Computer or Physics) that he or she wants linked. WEBster then passes this information to the responsible components: the URL goes to RETRIEVER and the category information goes to DICTIONARY (both components are explained in detail below).
2. RETRIEVER receives the URL from WEBster. It first checks whether the site at that URL exists. If it does not, RETRIEVER reports this to WEBster. If it does, RETRIEVER retrieves the requested document and saves it to a temporary file. RETRIEVER then checks the MIME type of the file to determine whether the file can be linked; currently only one MIME type, "text/html", is accepted. If the document is suitable for processing, RETRIEVER returns the temporary file handle to WEBster.
3. WEBster passes the file handle to PARSER. PARSER reads the file and tokenizes the document: it scans the document and divides the content into elements, each of which is either an HTML tag or non-tag text (text body). PARSER also gives each element a flag so that TRANSFORMER can later process it appropriately. Finally, PARSER puts these elements into a linked list and returns the list to WEBster.
Below is an example of what PARSER does. Consider the following document:
<HTML>
<HEAD>
<TITLE>Frog</TITLE>
</HEAD>
<BODY>
<H2>Frog</H2>
<IMG SRC="frog.gif">
<P>
A boy frog telephones the Psychic Hotline and his Personal Psychic Advisor tells him: "You are going to meet a beautiful young girl who will want to know everything about you."
<P>
The frog is thrilled, "This is great! Will I meet her at a party?"
<P>
"No," says his Advisor, "in her biology class."
<P>
<A HREF="jokes.html">More Jokes</A>
</BODY>
</HTML>
PARSER transforms the document into a list as follows:
<HTML>\n → <HEAD>\n → <TITLE> → Frog → </TITLE>\n → </HEAD>\n → <BODY>\n → <H2> → Frog → </H2>\n → <IMG SRC="frog.gif">\n → <P>\n → A boy frog telephones the Psychic Hotline and his Personal Psychic Advisor tells him: "You are going to meet a beautiful young girl who will want to know everything about you."\n → <P>\n → ... → <A HREF="jokes.html"> → More Jokes → </A>\n → </BODY>\n → </HTML>\n
4. TRANSFORMER receives the linked list from WEBster and looks at the flag of each element. It passes each element to TEXT or TAGGER accordingly: if an element is not an HTML tag, such as "Frog", it is passed to TEXT for processing; if it is an HTML tag, such as <IMG SRC="frog.gif">, it is passed to TAGGER. TAGGER determines whether the element needs further processing. If so, LINKER receives the element and makes the appropriate modifications; if not, the element remains unchanged. After processing the entire list, TRANSFORMER returns the modified list to WEBster.
5. TAGGER receives an HTML tag from TRANSFORMER and checks whether it needs to be processed. For example, tags such as <H2>, </H2>, and <PRE> do not need processing, while tags like <A HREF="..."> and <IMG SRC="..." ...> must be processed further. In that case, TAGGER passes the tag to LINKER, and the result is returned to TRANSFORMER.
6. LINKER receives an HTML tag from TAGGER, converts relative links to absolute links, and rewrites the URLs of anchor links to point to WEBster. Using the above example, if the document comes from www.jokes.com, then
<IMG SRC="frog.gif">
<A HREF="jokes.html">
will become
<IMG SRC="http://www.jokes.com/frog.gif">
<A HREF="http://www.webster.com/?URL=www.jokes.com/jokes.html& ">
7. TEXT receives a string from TRANSFORMER that may consist of multiple words. The string is first tokenized into a list of single words. TEXT then passes each word to DICTIONARY, which looks up the word in the database. If there is no matching word, DICTIONARY returns no link, and TEXT passes the next word to DICTIONARY. If there is a match, DICTIONARY returns a link, and TEXT inserts HTML anchor tags around the word. For example, if the word "frog" exists in the database with the corresponding link www.dictionary.com/frog.html, the word will be replaced by
<A HREF="javascript:Open('http://www.dictionary.com/frog.html')">frog</A>
which calls a JavaScript function that pops up a small window.
8. DICTIONARY contains the interface to the database, which will be implemented using JDBC. It is given one word at a time and looks up that word in a particular category. If the word exists in the database, DICTIONARY returns the link associated with it. DICTIONARY can also add word-link pairs to, and delete them from, the database.
9. OUTPUT receives the modified token list from WEBster and sends it to the client. It also adds a link at the end of the token list so that the user can choose to get the original document back.
10. UPDATE allows administrators to update and modify the database. The administrator adds a word-link pair by entering the word, the link to the word's definition, and the category the word belongs to. UPDATE checks the syntax of the URL and the validity of the site by passing the URL to URLCHECKER. It also checks whether the database already has an entry for the word in that category. If so, the administrator has to delete that entry before adding the new one.
11. URLCHECKER checks the syntax and validity of the URL provided by UPDATE.
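Step 2 says RETRIEVER only processes documents of type "text/html". A minimal sketch of that check in Java follows; the class and method names are hypothetical. It assumes the Content-Type header value is already available (for example from `URLConnection.getContentType()`), and it ignores parameters such as `; charset=...` because servers commonly append them.

```java
import java.util.Locale;

/** Hypothetical sketch of RETRIEVER's MIME-type check (step 2). */
final class MimeCheck {
    // The only media type WEBster links, per the description.
    static final String LINKABLE_TYPE = "text/html";

    /**
     * Returns true if a Content-Type header value names text/html.
     * Parameters such as "; charset=UTF-8" are ignored, and the
     * comparison is case-insensitive, as media type names are.
     */
    static boolean isLinkable(String contentType) {
        if (contentType == null) {
            return false; // no Content-Type reported: do not process
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return LINKABLE_TYPE.equals(mediaType);
    }
}
```

Rejecting a missing header, rather than guessing from the file extension, is a conservative choice made for this sketch.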
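The tokenizing described in step 3 can be sketched as a single left-to-right scan. This is an illustration under simplifying assumptions, not WEBster's actual PARSER: `HtmlTokenizer`, `Kind`, and `Element` are hypothetical names, the `Kind` enum plays the role of the flag, and the scanner assumes no `>` appears inside attribute values and does not treat comments or scripts specially. A newline right after a tag is kept with that tag, as in the `<HTML>\n` elements of the example.

```java
import java.util.LinkedList;
import java.util.List;

/** Hypothetical sketch of PARSER (step 3): splits HTML into flagged elements. */
final class HtmlTokenizer {
    /** The flag that tells TRANSFORMER how to treat an element. */
    enum Kind { TAG, TEXT }

    static final class Element {
        final Kind kind;
        final String content;

        Element(Kind kind, String content) {
            this.kind = kind;
            this.content = content;
        }

        @Override
        public String toString() {
            return kind + ":" + content;
        }
    }

    /** Anything from '<' to the next '>' is a TAG; everything else is TEXT. */
    static List<Element> tokenize(String html) {
        List<Element> elements = new LinkedList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) { // unterminated tag: treat the rest as text
                    elements.add(new Element(Kind.TEXT, html.substring(i)));
                    break;
                }
                int end = close + 1;
                if (end < html.length() && html.charAt(end) == '\n') {
                    end++; // keep a trailing newline with its tag
                }
                elements.add(new Element(Kind.TAG, html.substring(i, end)));
                i = end;
            } else {
                int next = html.indexOf('<', i);
                if (next < 0) {
                    next = html.length();
                }
                elements.add(new Element(Kind.TEXT, html.substring(i, next)));
                i = next;
            }
        }
        return elements;
    }
}
```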
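Step 4 is essentially a dispatch on each element's flag. The sketch below is one way to express it, assuming TEXT and TAGGER can be modelled as string-to-string functions; all names are hypothetical, and a real TAGGER would itself decide whether to call LINKER.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of TRANSFORMER's flag-based dispatch (step 4). */
final class Transformer {
    enum Kind { TAG, TEXT } // the flag PARSER attaches to each element

    static final class Element {
        final Kind kind;
        final String content;

        Element(Kind kind, String content) {
            this.kind = kind;
            this.content = content;
        }
    }

    private final UnaryOperator<String> text;   // stands in for TEXT
    private final UnaryOperator<String> tagger; // stands in for TAGGER (which may call LINKER)

    Transformer(UnaryOperator<String> text, UnaryOperator<String> tagger) {
        this.text = text;
        this.tagger = tagger;
    }

    /** Returns a new list in which every element has been handled by TEXT or TAGGER. */
    List<Element> transform(List<Element> elements) {
        List<Element> result = new ArrayList<>(elements.size());
        for (Element e : elements) {
            String processed = (e.kind == Kind.TEXT) ? text.apply(e.content) : tagger.apply(e.content);
            result.add(new Element(e.kind, processed)); // keep the flag for later stages
        }
        return result;
    }
}
```

Passing the handlers in through the constructor keeps TRANSFORMER independent of how TEXT and TAGGER are implemented, which also makes it easy to exercise with stubs.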
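Steps 5 and 6 can be approximated with a regular expression over the tag text plus `java.net.URI.resolve` for the relative-to-absolute conversion. This sketch rests on several assumptions: attribute values are double-quoted, only the first `HREF` or `SRC` attribute in a tag is handled, the document's own URL is known (here `http://www.jokes.com/`), and the front-end address `http://www.webster.com/?URL=` is taken from the example above. Unlike the example, it percent-encodes the full absolute URL with `URLEncoder` so that characters such as `&` in the original link cannot break the query string, and it leaves out the parameters that follow `&` in the example. Class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of TAGGER (step 5) and LINKER (step 6). */
final class TagLinker {
    // Front-end address taken from the example in step 6.
    static final String WEBSTER_PREFIX = "http://www.webster.com/?URL=";

    // HREF="..." or SRC="...", case-insensitive; assumes double-quoted values.
    private static final Pattern LINK_ATTR =
            Pattern.compile("\\b(HREF|SRC)\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** TAGGER: a tag needs LINKER only if it carries an HREF or SRC attribute. */
    static boolean needsLinking(String tag) {
        return LINK_ATTR.matcher(tag).find();
    }

    /**
     * LINKER: resolves the first HREF/SRC value against the document's URL.
     * Anchor targets are also routed back through WEBster; other tags
     * (such as IMG) only become absolute.
     */
    static String link(String tag, URI documentUrl) {
        Matcher m = LINK_ATTR.matcher(tag);
        if (!m.find()) {
            return tag; // nothing to rewrite
        }
        String absolute;
        try {
            absolute = documentUrl.resolve(m.group(2)).toString();
        } catch (IllegalArgumentException e) {
            return tag; // malformed link: leave the tag unchanged
        }
        boolean isAnchor = tag.toUpperCase(Locale.ROOT).startsWith("<A ")
                && m.group(1).equalsIgnoreCase("HREF");
        String target = isAnchor
                ? WEBSTER_PREFIX + URLEncoder.encode(absolute, StandardCharsets.UTF_8)
                : absolute;
        return tag.substring(0, m.start(2)) + target + tag.substring(m.end(2));
    }
}
```

For the document `http://www.jokes.com/`, the image tag becomes `<IMG SRC="http://www.jokes.com/frog.gif">`, matching step 6, while the anchor's target is the encoded form of `http://www.jokes.com/jokes.html` appended to the WEBster address.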
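Step 7's word-by-word linking can be sketched as below. The dictionary is modelled as a hypothetical `Function<String, String>` that returns a link or `null`, standing in for DICTIONARY. As assumptions: words are matched case-insensitively by lowercasing them before lookup, letters directly after `&` (character entities such as `&amp;`) are skipped, stored links carry no scheme and so get `http://` prepended, and `Open` is the page's pop-up function from the example above.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of TEXT (step 7): wraps dictionary words in pop-up anchors. */
final class TextLinker {
    // A "word" is a run of letters not preceded by '&' (so entities like &amp; are skipped).
    private static final Pattern WORD = Pattern.compile("\\b(?<!&)[A-Za-z]+\\b");

    /**
     * @param text       a text element from the token list
     * @param dictionary returns the stored link for a lowercased word, or null
     */
    static String linkWords(String text, Function<String, String> dictionary) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy text between words unchanged
            String word = m.group();
            String link = dictionary.apply(word.toLowerCase(Locale.ROOT));
            if (link == null) {
                out.append(word); // no entry: leave the word as it is
            } else {
                // Links in the Link table have no scheme, so http:// is prepended.
                out.append("<A HREF=\"javascript:Open('http://").append(link).append("')\">")
                   .append(word).append("</A>");
            }
            last = m.end();
        }
        out.append(text, last, text.length()); // text after the last word
        return out.toString();
    }
}
```

The original capitalization is kept in the visible link text, so "Frog" in a heading would still display as "Frog".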
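For steps 10 and 11, URLCHECKER's two checks map naturally onto two methods. In this sketch (hypothetical names), the syntax check parses the link with `java.net.URI` and requires an `http` or `https` scheme and a host; because the Link table stores links without a scheme, `http://` is assumed when none is given. The validity check sends an HTTP `HEAD` request and treats a 2xx or 3xx status as valid, which is an assumption about what "valid" means here; some servers reject `HEAD`, so a real checker might fall back to `GET`.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch of URLCHECKER (steps 10 and 11). */
final class UrlChecker {
    /** Links in the Link table omit the scheme, so assume http:// when none is given. */
    static String withScheme(String url) {
        return url.contains("://") ? url : "http://" + url;
    }

    /** Syntax check: parses as a URI with an http(s) scheme and a host. */
    static boolean hasValidSyntax(String url) {
        try {
            URI uri = new URI(withScheme(url));
            String scheme = uri.getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Validity check: the site answers a HEAD request with a 2xx or 3xx status. */
    static boolean isReachable(String url, int timeoutMillis) {
        if (!hasValidSyntax(url)) {
            return false;
        }
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URI(withScheme(url)).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                int status = conn.getResponseCode(); // -1 if the reply is not valid HTTP
                return status >= 200 && status < 400;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | URISyntaxException e) {
            return false; // unreachable host, timeout, or malformed URL
        }
    }
}
```

UPDATE would call `hasValidSyntax` first and only then `isReachable`, so obviously malformed input never causes a network request.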
Database
The database stores words, their categories, and the links to their definitions. Administrators update and modify the database through a front end, and only people with a valid username and password can modify it. WEBster keeps track of the number of requests the servlet is currently processing. When an administrator asks to modify the database, the servlet blocks all incoming requests and waits until all requests currently being processed have finished. It then handles the administrator's request. When the administrator has finished modifying the database, the servlet unblocks user services and starts processing user requests again. This locking keeps each document consistent while it is being processed; without it, a link could be deleted while the servlet is in the middle of processing a document.
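The locking scheme above behaves like a readers-writer lock that gives the administrator priority: once an update is requested, no new user requests are admitted, and the update waits for those already in progress to finish. A minimal sketch of that counter-based gate using Java's built-in monitors follows; `UpdateGate` and its methods are hypothetical names. A fair `java.util.concurrent.locks.ReentrantReadWriteLock` (constructed with `true`) would be a library alternative with similar behavior.

```java
/**
 * Hypothetical sketch of the request/update locking described above,
 * built on Java's intrinsic locks (synchronized, wait, notifyAll).
 */
final class UpdateGate {
    private int activeRequests = 0;        // user requests currently being processed
    private boolean updatePending = false; // an administrator is waiting for requests to drain
    private boolean updateActive = false;  // an administrator is modifying the database

    /** Called at the start of each user request; blocks while an update is pending or active. */
    synchronized void beginRequest() throws InterruptedException {
        while (updatePending || updateActive) {
            wait();
        }
        activeRequests++;
    }

    /** Called when a user request finishes (ideally from a finally block). */
    synchronized void endRequest() {
        activeRequests--;
        if (activeRequests == 0) {
            notifyAll(); // a pending update may now proceed
        }
    }

    /** Stops admitting new requests, then waits for the in-flight ones to finish. */
    synchronized void beginUpdate() throws InterruptedException {
        while (updatePending || updateActive) {
            wait(); // only one administrator update at a time
        }
        updatePending = true;
        try {
            while (activeRequests > 0) {
                wait();
            }
        } catch (InterruptedException e) {
            updatePending = false; // give up cleanly so user requests are not blocked forever
            notifyAll();
            throw e;
        }
        updatePending = false;
        updateActive = true;
    }

    /** Re-opens user services once the administrator is done. */
    synchronized void endUpdate() {
        updateActive = false;
        notifyAll();
    }

    synchronized int activeRequests() {
        return activeRequests;
    }
}
```

Each user request would call `beginRequest()` before processing and `endRequest()` in a `finally` block; UPDATE would bracket its database changes with `beginUpdate()` and `endUpdate()` the same way.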
There are two main tables in the database.
1. Category table. Each entity in the table has two attributes, Category Key and Category Name. Category Key is the primary key of the table. The following is an example of such a table:
Category Key   Category Name
1              General
2              Computer
3              Physics
4              Sports
5              Entertainment
...            ...
2. Link table. Each entity in the table has three attributes: the Category Key, the Word that the user wants to look up, and the definition Link of the word. The primary key of this table is composed of the Category Key and the Word. Below is an example of such a table:
Category Key   Word      Link
1              agent     www.whatis.com/agent.html
...            ...       ...
2              agent     www.dictionary.com/agent.html
...            ...       ...
2              servlet   www.servlet.com/index.html
...            ...       ...
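To make the composite primary key concrete, here is an in-memory stand-in for the Link table. It is an illustration only, since the description says DICTIONARY goes through JDBC; a JDBC version would run the same lookup as a `PreparedStatement` along the lines of `SELECT Link FROM Link WHERE CategoryKey = ? AND Word = ?`, where the table and column names are assumptions. `LinkTable` and its methods are hypothetical. `add` refuses to overwrite an existing (Category Key, Word) pair, mirroring UPDATE's delete-before-add rule, and words are lowercased before use as keys, which is also an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the Link table. */
final class LinkTable {
    /** Composite primary key (Category Key, Word). */
    private static final class Key {
        final int categoryKey;
        final String word;

        Key(int categoryKey, String word) {
            this.categoryKey = categoryKey;
            this.word = word.toLowerCase(Locale.ROOT); // assumption: case-insensitive words
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return categoryKey == other.categoryKey && word.equals(other.word);
        }

        @Override
        public int hashCode() {
            return Objects.hash(categoryKey, word);
        }
    }

    private final Map<Key, String> links = new HashMap<>();

    /** Adds a pair; returns false if (category, word) already exists (delete it first). */
    boolean add(int categoryKey, String word, String link) {
        Objects.requireNonNull(link, "link");
        return links.putIfAbsent(new Key(categoryKey, word), link) == null;
    }

    /** Removes a pair; returns false if there was nothing to remove. */
    boolean delete(int categoryKey, String word) {
        return links.remove(new Key(categoryKey, word)) != null;
    }

    /** Looks up the link for a word within one category. */
    Optional<String> lookup(int categoryKey, String word) {
        return Optional.ofNullable(links.get(new Key(categoryKey, word)));
    }
}
```

Because the key includes the category, "agent" can map to different links under General (1) and Computer (2), exactly as in the example rows.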
Program Flow and Java Class
Below are descriptions of the major classes in the code:
1. WEBster
Input: URL, category.
Process: Acts as the central process.
2. RETRIEVER
Input: Requested URL.
Process: Checks and retrieves the document, and saves it as a temporary file.
Output: Hostname of requested URL and file handle.
3. PARSER
Input: File handle.
Process: Tokenizes the document.
Output: Linked list.
4. TRANSFORMER
Input: Linked list.
Process: Performs the appropriate modifications on the linked list.
Output: Modified linked list.
5. TAGGER
Input: HTML tag.
Process: Checks whether the tag needs to be processed further.
Output: The tag, unchanged or as modified by LINKER.
6. LINKER
Input: HTML tag.
Process: Processes the tag and points its link to the servlet.
Output: Modified HTML tag.
7. TEXT
Input: Text body.
Process: Uses DICTIONARY to look up definition links for the words in the text.
Output: Linked text body.
8. DICTIONARY
Input: (category, word, link) for modification, or a word within a category for lookup.
Process: Interface to the database via JDBC.
Output: Updated database, or the link for the word.
9. OUTPUT
Input: Modified linked list.
Process: Sets the appropriate header fields, and prints the headers and linked list to the user.
Output: Result to client.
10. UPDATE
Input: (word, category, link), action.
Process: Updates the database.
Output: Confirmation of action.
11. URLCHECKER
Input: URL input by administrator.
Process: Checks syntax and validity.
Output: Status.
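The Input/Output entries above suggest how WEBster chains the classes together. The sketch below expresses that chain with small Java interfaces; all interface and method names are hypothetical, elements are plain strings (PARSER's flags are omitted for brevity), and passing the category to TRANSFORMER is an assumption about how it reaches DICTIONARY. OUTPUT returns a string here, whereas a servlet would write headers and body to its response.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.List;

/** Hypothetical sketch of WEBster as the central process wiring the components. */
final class Webster {
    /** What RETRIEVER hands back: the document's hostname and a handle to the saved copy. */
    static final class Retrieved {
        final String hostname;
        final Reader document;

        Retrieved(String hostname, Reader document) {
            this.hostname = hostname;
            this.document = document;
        }
    }

    interface Retriever { Retrieved retrieve(String url) throws IOException; }
    interface Parser { List<String> parse(Reader document) throws IOException; }
    interface Transformer { List<String> transform(List<String> elements, String hostname, String category); }
    interface Output { String render(List<String> elements, String originalUrl); }

    private final Retriever retriever;
    private final Parser parser;
    private final Transformer transformer;
    private final Output output;

    Webster(Retriever retriever, Parser parser, Transformer transformer, Output output) {
        this.retriever = retriever;
        this.parser = parser;
        this.transformer = transformer;
        this.output = output;
    }

    /** Runs one user request: retrieve, parse, transform, output. */
    String handle(String url, String category) throws IOException {
        Retrieved retrieved = retriever.retrieve(url);
        List<String> elements;
        try (Reader document = retrieved.document) { // release the temporary file when done
            elements = parser.parse(document);
        }
        List<String> modified = transformer.transform(elements, retrieved.hostname, category);
        return output.render(modified, url);
    }
}
```

Keeping each stage behind an interface lets the components be developed and tested separately, which matches the one-class-per-component breakdown above.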
Class Structure