[AV100] Method for Intelligent Ranking of Webpages in Filtering of Content

IP.com Disclosure Number: IPCOM000201506D
Original Publication Date: 2010-Nov-12
Included in the Prior Art Database: 2010-Nov-12
Document File: 4 page(s) / 281K

Publishing Venue

Linux Defenders

Related People

Daniel Miller, University of North Carolina at Asheville: AUTHOR

Abstract

Parental control software, and more broadly content filtering software, is widely used to prevent users from accessing various content available on the internet. Currently, the majority of content filtering software relies on either a blacklist method or keyword filtering to determine whether content is suitable for viewing. Web filtering must be effective at blocking unwanted content while avoiding false positives, and should not require constant manual updating by an administrator to remain effective. This invention accomplishes these objectives by using a ranking system that bases the risk of a webpage on keywords and the context around those keywords.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 48% of the total text.

Method for Intelligent Ranking of Webpages in Filtering of Content

Parental control software, and more broadly content filtering software, is widely used to prevent users from accessing various content available on the internet. This is done by determining whether content is suitable based on one or more criteria, and typically occurs without constant management by an administrator. The goal is to prevent access to unwanted content without curtailing access to content that is acceptable for viewing.

Currently, the majority of content filtering software relies on either a blacklist method or keyword filtering to determine whether content is suitable for viewing. The blacklist method compares the URL of each website visited to a list of URLs that have been predetermined to be unsuitable for viewing. For example, a parent may wish to prevent a child from accessing websites that allow them to stream video. When the child attempts to browse a video sharing site, they will find that the site has been blacklisted and is blocked. Keyword blocking differs in that the content of each webpage is reviewed; users may be able to view one page on a site but not another, depending on the content of each page. If a parent wanted to prevent a child from using a social networking site, they could choose to block content using the keywords 'social networking', which would effectively block any webpage that contained the words 'social networking'.
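The two existing approaches described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of any particular product: the host name, the keyword list, and the function names is_blacklisted and contains_blocked_keyword are assumptions made for the example, and the blacklist is matched on the host part of the URL as a simplification.

```python
from urllib.parse import urlparse

# Hypothetical blacklist and keyword list, chosen only for illustration.
BLACKLIST = {"video.example.com"}
BLOCKED_KEYWORDS = ["social networking"]


def is_blacklisted(url: str) -> bool:
    """Blacklist method: block when the URL's host is on the predetermined list."""
    host = urlparse(url).hostname or ""
    return host.lower() in BLACKLIST


def contains_blocked_keyword(page_text: str) -> bool:
    """Keyword method: block when the page text contains any blocked phrase."""
    text = page_text.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)
```

With these definitions, every page on the listed host is blocked regardless of its content, while the keyword check decides page by page and looks only at the text of the page being viewed.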

Both of these methods are ineffective and easy to circumvent for several reasons. Blacklisting does not actively prevent users from viewing content. Instead, it relies on sites being blocked after they have been determined to be unsuitable, which means the party conducting the web filtering is constantly trying to catch up with the many websites that are discovered and created daily. Keyword filtering is easily bypassed because of the very nature of human languages: many different words and euphemisms can be substituted for a blocked word, which prevents this kind of blocking from being effective. Additionally, keyword blocking can very easily block web pages that are actually suitable for viewing, because many words have more than one meaning. Both methods can also very easily be circumvented with a web proxy that fetches web pages on its own server and then displays them to the user.
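The two keyword filtering weaknesses described above can be shown with a short Python sketch. The function name keyword_filter, the blocked word, and both sample page texts are hypothetical and exist only to illustrate a false positive and a euphemism bypass.

```python
from typing import Iterable


def keyword_filter(page_text: str, blocked_words: Iterable[str]) -> bool:
    """Return True when plain substring matching would block the page."""
    text = page_text.lower()
    return any(word in text for word in blocked_words)


blocked = ["drug"]  # hypothetical blocked keyword

# A harmless page that happens to use the blocked word in another sense.
pharmacy_page = "Your local drug store is open until 9 pm."

# An unsuitable page that avoids the blocked word by using slang.
slang_page = "Get your buzz here, buy now."

pharmacy_blocked = keyword_filter(pharmacy_page, blocked)  # blocked in error
slang_blocked = keyword_filter(slang_page, blocked)        # slips through
```

Here the pharmacy page is blocked even though it is suitable for viewing, while the slang page is allowed because it never uses the blocked word.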

Detailed Description of Invention:

Web filtering must be effective at blocking unwanted content while avoiding false positives, and should not require constant manual updating by an administrator to remain effective. This invention accomplishes these objectives by using a ranking system that bases the risk of a webpage on keywords and the context around those keywords. For example, a page containing the word 'drug' would have a lower risk ranking than a page that contains the words 'drug', 'buy', and 'buzz' in the same context. The first page could be a news repo...
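The idea of ranking a page by keywords and their surrounding context could be sketched as follows. The available text does not specify how the ranking is computed, so everything in this example is an assumption made for illustration: the word lists, the weights, the five-word window used to approximate "the same context", and the function name risk_score. It is not the disclosure's own algorithm.

```python
import re

# Hypothetical word lists and window size; not taken from the disclosure.
KEYWORDS = {"drug"}
CONTEXT_WORDS = {"buy", "buzz"}
WINDOW = 5  # assumed number of words on each side treated as the same context


def risk_score(page_text: str) -> int:
    """Score a page: 1 per keyword hit, plus 2 per distinct context word nearby."""
    words = re.findall(r"[a-z']+", page_text.lower())
    score = 0
    for i, word in enumerate(words):
        if word not in KEYWORDS:
            continue
        score += 1  # base risk for the keyword itself
        nearby = words[max(0, i - WINDOW): i + WINDOW + 1]
        # Each distinct risky context word near the keyword raises the score.
        score += 2 * len(CONTEXT_WORDS.intersection(nearby))
    return score


news_page = "Police report a drug arrest downtown."
sales_page = "Buy this drug for a great buzz."
```

Under these assumed weights, news_page receives a lower score than sales_page, which mirrors the 'drug' versus 'drug', 'buy', and 'buzz' comparison in the text. A filter built this way would block a page only when its score exceeds some administrator-chosen threshold, which is likewise an assumption here.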