A method and system for enabling search in fragmented files stored across different nodes in a peer-to-peer network

IP.com Disclosure Number: IPCOM000030588D
Original Publication Date: 2004-Aug-18
Included in the Prior Art Database: 2004-Aug-18
Document File: 4 page(s) / 108K

Publishing Venue

IBM

Abstract

This disclosure describes a method and system that enable search in large fragmented files stored across different nodes in a P2P network.

This is the abbreviated version, containing approximately 37% of the total text.

To provide search capability in a P2P network, a distributed search engine creates a local search index for each group of nodes (see http://www.searchtools.com/info/peer-to-peer.html), performs multiple local searches, and collates the search results. If large documents or data files are stored as multiple fragments scattered across several nodes in a P2P network, different fragments of the same file may end up in different local search indices. In this case, the collated search results become invalid, because the distributed search engine cannot correctly merge the multiple entries that relate to different fragments of the same file.
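The problem described above can be illustrated with a minimal sketch. It is not the disclosed system: the function names, the "file#fragment" entry naming, and the sample data are all hypothetical, and collation is assumed to be a plain union of local hits.

```python
from collections import defaultdict


def build_local_index(entries):
    """Map each lowercase word to the set of entry names that contain it."""
    index = defaultdict(set)
    for name, text in entries.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def local_search(index, term):
    """Return the entries of one local index that contain the term."""
    return set(index.get(term.lower(), set()))


def collate(local_results):
    """Naive collation (an assumption): union of the hits from every local index."""
    hits = set()
    for result in local_results:
        hits |= result
    return sorted(hits)


# Hypothetical data: report.txt is split into two fragments that land in the
# local indices of two different node groups.
group_a = build_local_index({
    "report.txt#FS1": "distributed search engine",
    "notes.txt": "search tips",
})
group_b = build_local_index({
    "report.txt#FS2": "search index collation",
})

hits = collate(local_search(ix, "search") for ix in (group_a, group_b))
print(hits)
```

In this sketch the collated list contains one entry per fragment of report.txt instead of a single entry for the file, which is the kind of unmerged result the paragraph above refers to.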

Figure 1: File is cut into File Segments (FS1 to FS6) and stored at devices (diagram labels: File, Store FS1, Store FS6)

Distributed Catalog of Shared Documents and Fragments:

Init Area:

- Defines the virtual volume across peers
- Identifies every peer that is part of the environment
- Defines where other copies of this Init area are located across all peers
- Maintains version of init area

Directory Area:

- Directory structure of all files across peers, including those that are fragmented across peers
- Includes information about files and directories, including version of file, status, date, owner, etc.

Mapping Area:

Area that defines where every fragment of a file resides within the environment

Index Meta Data Area:

Area that contains information on the index location for fragmented files and a synchronization field (both used for variant B)
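The four catalog areas listed above can be modeled as a small data structure. This is only a sketch: the class names, field names, types, and sample values are assumptions, since the disclosure does not specify an on-disk or in-memory layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InitArea:
    volume_id: str                 # the virtual volume defined across peers
    peers: List[str]               # every peer that is part of the environment
    replica_locations: List[str]   # peers holding other copies of this Init area
    version: int = 1               # version of the Init area


@dataclass
class DirectoryEntry:
    path: str
    version: int
    status: str
    date: str
    owner: str
    fragmented: bool = False


@dataclass
class IndexMetadata:
    index_node: str      # peer holding the index for the fragmented file
    synchronized: bool   # synchronization field; its real encoding is not given


@dataclass
class Catalog:
    init: InitArea
    directory: Dict[str, DirectoryEntry] = field(default_factory=dict)
    # Mapping Area: file path -> list of (fragment id, peer) pairs
    mapping: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    index_meta: Dict[str, IndexMetadata] = field(default_factory=dict)

    def peers_holding(self, path: str) -> List[str]:
        """Return the peers that store at least one fragment of the file."""
        return sorted({peer for _, peer in self.mapping.get(path, [])})


# Hypothetical sample data.
catalog = Catalog(init=InitArea("vol0", ["peer1", "peer2"], ["peer2"]))
catalog.directory["report.txt"] = DirectoryEntry(
    "report.txt", 1, "active", "2004-08-18", "alice", fragmented=True
)
catalog.mapping["report.txt"] = [("FS1", "peer1"), ("FS2", "peer2"), ("FS3", "peer1")]
catalog.index_meta["report.txt"] = IndexMetadata(index_node="peer1", synchronized=True)
print(catalog.peers_holding("report.txt"))
```

Keeping the Mapping Area keyed by file path lets a search component resolve a fragment hit back to its parent file and to the peers that hold the remaining fragments.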

Fig 3. Distributed Catalog of Shared Documents (diagram labels: Peer, File system, Init Area, Directory Area, Mapping Area, Index Metadata)

Creating Distributed Textual Index:

The first step in organizing full-text search is creating local textual indices of shared documents and fragments in every node of the P2P network. We use a standard textual indexing technique (see, for example, [1]) for all shared documents, except for the pieces of fragmented documents. One of the following two alternative variants of textual indexing may be used for the pieces of fragmented documents:

A. Separate local indexing -

In this case, each piece is indexed as an individual file in its local node.

B. Combined shared indexing -

In this case, the combined pieces of a fragmented document are indexed in only one node. The index information about the whole document is saved in only one local index and shared among all the nodes containing other pieces of the same document.
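The two variants can be contrasted with a minimal sketch. The function names and sample data are hypothetical, the inverted index is a plain word-to-names map rather than the technique of [1], and fragment boundaries are assumed to fall on whitespace. How the single index is "shared" is modeled here as a per-node pointer to the indexing node, which is also an assumption.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a stand-in for a real analyzer)."""
    return text.lower().split()


def index_separately(fragments_by_node):
    """Variant A: every node indexes its own piece as an individual file."""
    local_indices = {}
    for node, (fragment_id, text) in fragments_by_node.items():
        index = defaultdict(set)
        for word in tokenize(text):
            index[word].add(fragment_id)
        local_indices[node] = dict(index)
    return local_indices


def index_combined(file_name, fragments_by_node, index_node):
    """Variant B: one node indexes all pieces under the whole file's name."""
    index = defaultdict(set)
    for _node, (_fragment_id, text) in fragments_by_node.items():
        for word in tokenize(text):
            index[word].add(file_name)
    # Assumed sharing mechanism: each node records where the index lives.
    pointers = {node: index_node for node in fragments_by_node}
    return {index_node: dict(index)}, pointers


# Hypothetical data: two pieces of report.txt on two nodes.
fragments = {
    "node1": ("report.txt#FS1", "peer to peer"),
    "node2": ("report.txt#FS2", "search index"),
}

variant_a = index_separately(fragments)
variant_b, pointers = index_combined("report.txt", fragments, index_node="node1")

print(variant_a["node1"].get("search"), variant_a["node2"].get("search"))
print(variant_b["node1"]["search"], pointers["node2"])
```

Under variant A a term is found only in the local index of the node that holds the matching piece, and the hit names the piece. Under variant B every term of the document resolves to the whole file in a single index.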

Variant B is associated with the following issues:

- Due to resource limitations, the whole file may not fit into one node for indexing; using an existing stream-based indexing technique (see [2]) addresses this issue.
- Synchronization is needed to prevent duplication of the indexing process in each node containing pieces of the fragmen...