
System and Apparatus of optimizing scripts language running for web crawler

IP.com Disclosure Number: IPCOM000168318D
Original Publication Date: 2008-Mar-06
Included in the Prior Art Database: 2008-Mar-06
Document File: 3 page(s) / 134K

Publishing Venue

IBM

Abstract

We propose a system and method for optimizing script-language execution in a web crawler. The system extracts the JavaScript from a web page and analyzes it to separate the content-related part from the unrelated part. The content-dependent JavaScript is compiled into a library by a JS compiler. During web content analysis, the compiled JS library is run in place of the original JavaScript to generate the content. The compiled JS library is reused when analyzing other web content that uses the same JavaScript.
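The compile-once, reuse-many idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `ScriptExtractor`, `extract_scripts`, `CompiledScriptCache`, and `fake_compile` are hypothetical names, `fake_compile` merely stands in for a real JS compiler, and the step that separates content-related from unrelated script code is omitted. Scripts are keyed by a SHA-256 hash of their source, so a script shared by several pages is compiled only once.

```python
import hashlib
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects the text of inline <script> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buffer).strip())
            self._in_script = False


def extract_scripts(html):
    """Return the non-empty inline script sources found in the page."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return [s for s in parser.scripts if s]


class CompiledScriptCache:
    """Caches compiled scripts, keyed by a hash of the script source."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn   # assumed: any callable source -> library
        self._libraries = {}
        self.compile_count = 0          # how many times compile_fn actually ran

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._libraries:
            self._libraries[key] = self._compile_fn(source)
            self.compile_count += 1
        return self._libraries[key]


def fake_compile(source):
    # Hypothetical placeholder for a real JS compiler.
    return ("compiled", len(source))


# Two pages that embed the same script but differ in other content.
page_a = "<html><script>render(data)</script><p>A</p></html>"
page_b = "<html><script>render(data)</script><p>B</p></html>"

cache = CompiledScriptCache(fake_compile)
for page in (page_a, page_b):
    for script in extract_scripts(page):
        cache.get(script)

print(cache.compile_count)  # the shared script is compiled once
```

Hashing the source text is only one possible cache key; a crawler could equally key on the script URL for external scripts, at the cost of missing content changes behind a stable URL.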

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 51% of the total text.


1. Background: What is the problem solved by your invention? Describe known solutions to this problem (if any). What are the drawbacks of such known solutions, or why is an additional solution required? Cite any relevant technical documents or references.

In the Web 2.0 era, web content is becoming increasingly rich and interactive. Typical examples are AJAX, XXX. In the current web architecture, many of these rich and interactive features are implemented in a scripting language, specifically JavaScript.

JavaScript can now perform fairly complex computation, and more and more web content is generated by JavaScript on the client side. One example is jQuery SVG: only the text-based SVG file is delivered from the server to the client, and the final graphic is drawn and rendered by the jQuery SVG engine in the client browser.

As a result, a traditional web crawler (such as those operated by search engine companies) cannot obtain such enriched content unless it interprets and executes the JavaScript in the page.

As this kind of enriched web content grows rapidly, traditional web crawlers face a greatly increased computing burden from interpreting and executing large volumes of JavaScript.

JavaScript is a scripting language, so interpreting and executing a JavaScript-based task costs more computing resources than running an equivalent, well-compiled C-based task. For an ordinary client web browser, running a script is not a problem, even when the script is complex. However, for business-oriented web crawlers, such as those of search engine companies, running millions or even billions of scripts becomes a system bottleneck that limits the productivity of the crawler.

2. Summary of Invention: Briefly describe the core idea of your invention (saving the details for questions ...