
A search engine, search webservice or source code indexing system based on simple compression technology

IP.com Disclosure Number: IPCOM000012140D
Original Publication Date: 2003-Apr-11
Included in the Prior Art Database: 2003-Apr-11
Document File: 2 page(s) / 56K

Publishing Venue



A program idea is disclosed that utilises the "character / word / sentence" style index built into compression algorithms such as ZIP / LZHUF to create a text indexing system. Indexed files could be text, program source code or HTML. The index files and the source material will always be in sync: adding a new or updated file to the main repository archive automatically and quickly updates the index.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



Concerns are often raised about the size of source code database indexes, frequently coupled with the need for an offline search facility across numerous source tree builds.

Source code can be searched in a number of ways:
1) using utilities such as find and grep;
2) using an IDE (integrated development environment);
3) using an index held in a back-end database;
4) using the UNIX* vi editor and the ctags command-line utility.

    Options 1 and 2 are very flexible for software developers working in fast-moving code bases, but they are slow when searching large projects.
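To make the speed trade-off of option 1 concrete, here is a minimal grep-style scan in Python. The helper name `grep_tree` is hypothetical and not part of the disclosure; the point it illustrates is that, with no index, every query re-reads every file, so search time grows with the total size of the source tree.

```python
import os
import re
from typing import Iterator, Tuple


def grep_tree(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line under root matching pattern.

    Hypothetical helper: with no index, each call walks and reads the whole tree.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Read every file on every query; this is the cost an index avoids.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

For example, `list(grep_tree("src", r"parse_args"))` returns every matching line, but only after reading all of `src`.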

    Option 3 is very good for fast searching of baselined code that changes little, but it has the following disadvantages: it requires an online connection to the database, which may be stored on a server;

the index quickly becomes out of date or out of sync with the code being used;

creating the index takes a large amount of time;

the database takes up disk space; and

each database query uses CPU resources on the server.

    Option 4 is language-specific (e.g. C only), editor-specific (e.g. vi / emacs) and machine-specific, and may not allow comments to be searched.

    Tags files can also clutter the source tree and become outdated.

    A search engine or source code indexing system based on simple compression technology is therefore proposed.

    A single lightweight compressed file could contain the source code (Ada/C/C++/Java/Lisp, etc.), markup language (HTML/XML) files and a full index.
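A minimal sketch of the single-archive idea, using Python's standard `zipfile` module. The layout is an assumption, not taken from the disclosure: each source member `src/foo.c` gets a hypothetical sidecar member `__index__/src/foo.c.json` that maps each identifier-like word to the lines it appears on. Because a file and its index are written in the same call, they stay in sync, and a search reads only the small index members, never the sources themselves. The names `add_file`, `search` and `INDEX_PREFIX` are illustrative.

```python
import json
import re
import zipfile
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical archive layout (an assumption, not from the disclosure):
#   src/foo.c                 -> the source text
#   __index__/src/foo.c.json  -> {"word": [line numbers]} for that file
INDEX_PREFIX = "__index__/"
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_file_index(text: str) -> Dict[str, List[int]]:
    """Map each identifier-like word to the ascending 1-based lines it appears on."""
    index: Dict[str, List[int]] = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in set(TOKEN.findall(line)):
            index[word].append(lineno)
    return dict(index)


def add_file(archive: str, path: str, text: str) -> None:
    """Append a source file and its index together; mode "a" creates the archive if absent."""
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(path, text)
        zf.writestr(INDEX_PREFIX + path + ".json", json.dumps(build_file_index(text)))


def search(archive: str, word: str) -> List[Tuple[str, int]]:
    """Return sorted (path, line) hits by reading only the index members."""
    hits: List[Tuple[str, int]] = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.startswith(INDEX_PREFIX):
                index = json.loads(zf.read(name))
                source = name[len(INDEX_PREFIX):-len(".json")]
                hits.extend((source, line) for line in index.get(word, []))
    return sorted(hits)
```

Appending never rewrites existing members, so adding a new file is cheap. Replacing an existing file is harder: `zipfile` has no in-place replace and appending the same name leaves a duplicate entry, so a real tool would need to rewrite or compact the archive on update.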

    A "character / word / sentence" style index could be built into the ZIP file format or the LZHUF compression algorithm to create a source indexing system.
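The idea that a compressor's dictionary can double as a search index can be sketched with a word-level LZW coder. This swaps in a different technique from the ones named above: ZIP's DEFLATE uses LZ77 with Huffman coding and LZHUF uses LZSS with adaptive Huffman coding, neither of which keeps an explicit phrase table. LZW (an LZ78-family algorithm) does keep one, which makes the principle easy to show. The class `WordLZWIndex` is hypothetical: while compressing, it records where each phrase code was emitted and which phrases contain each word, so a word can be located from the dictionary without decompressing the stream. For simplicity, words enter the dictionary on first sight and the dictionary is assumed to be stored alongside the code stream.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


class WordLZWIndex:
    """Hypothetical word-level LZW coder whose phrase dictionary doubles as an index."""

    def __init__(self) -> None:
        self.codes: Dict[Phrase, int] = {}                     # phrase -> code
        self.phrases: List[Phrase] = []                        # code -> phrase
        self.output: List[int] = []                            # emitted code stream
        self.starts: Dict[int, List[int]] = defaultdict(list)  # code -> word offsets where emitted
        self.containing: Dict[str, Set[int]] = defaultdict(set)  # word -> codes containing it

    def _add_phrase(self, phrase: Phrase) -> None:
        self.codes[phrase] = len(self.phrases)
        for word in phrase:
            self.containing[word].add(len(self.phrases))
        self.phrases.append(phrase)

    def _emit(self, phrase: Phrase, start: int) -> None:
        code = self.codes[phrase]
        self.output.append(code)
        self.starts[code].append(start)

    def compress(self, words: List[str]) -> List[int]:
        current: Phrase = ()
        start = 0
        for offset, word in enumerate(words):
            if (word,) not in self.codes:
                self._add_phrase((word,))  # single words join the dictionary on first sight
            candidate = current + (word,)
            if candidate in self.codes:
                current = candidate        # keep extending the longest known phrase
            else:
                self._emit(current, start)
                self._add_phrase(candidate)  # learn the longer phrase for later reuse
                current, start = (word,), offset
        if current:
            self._emit(current, start)
        return self.output

    def decompress(self, codes: List[int]) -> List[str]:
        """Expand codes back to words using the stored dictionary."""
        return [word for code in codes for word in self.phrases[code]]

    def find(self, word: str) -> List[int]:
        """Return every word offset of `word` using only dictionary and emission data."""
        hits: List[int] = []
        for code in self.containing.get(word, ()):
            phrase = self.phrases[code]
            for start in self.starts.get(code, ()):
                hits.extend(start + i for i, w in enumerate(phrase) if w == word)
        return sorted(hits)
```

For example, compressing `"the cat sat on the mat the cat sat".split()` emits 8 codes for 9 words, and `find("the")` returns `[0, 4, 6]`, including the occurrence hidden inside the reused phrase "the cat".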

    Benefits include: using ZIP as a file format has the advantage that source trees are usually stored in zip format as a matter of course, so n...