Method for Efficient retrieval and locking of well defined sections of documents in Content Management Systems

IP.com Disclosure Number: IPCOM000174270D
Original Publication Date: 2008-Sep-05
Included in the Prior Art Database: 2008-Sep-05
Document File: 5 page(s) / 116K

Publishing Venue

IBM

Abstract

Disclosed is an efficient method to retrieve, lock and store well-defined sections of a single large document in Content Management Systems by sectionizing it using XML. Current Content Management Systems allow very large documents to be stored and managed. These documents, when inserted as a single file (doc, pdf, mpeg, jpeg or any other format), are treated as independent entities. So, for any operation on such a file (check-in, check-out, modify, delete), the complete file has to be selected for retrieval, viewing, modification or locking, irrespective of whether one needs to view or modify the complete file. In practice, at any point in time most users want to view or refer to only one or a few sections of the document, not the complete document. Because the document is treated as a single entity in content management systems, there is no mechanism to retrieve or act on a small part or section of it.

Known solutions/approaches: A document can be treated as a compound document, with multiple files/documents linked to a parent document id. For example, each section of the document is loaded as a part of the document and treated individually, but the parts are linked together and tied to the document object. The drawbacks of this approach are:
-> Though the document is a single entity, each section needs to be stored and treated as a separate entity.
-> If the document has too many sections, it is not a good idea to store each section separately.
-> If each section in turn has multiple sub-sections, the linking can get complex.
-> Since each section is stored as an independent entity, the metadata can get duplicated at times (though efficient programming could avoid this).
-> Multiple people cannot check out and work on different sections of the document simultaneously. (Say the book has four different authors/editors. If the first author has checked out the book from the CM system, the second author has to wait to check out and work on the book until the first author completes his work and checks the book in.)

This is the abbreviated version, containing approximately 53% of the total text.


Inventors - Palasamudram N Praveena IBM

Let us assume the book "Application programming guide", which is in PDF format, is to be inserted into a Content Management System. The PDF has 18 chapters and an appendix. Most of the time, readers do not require the complete document, only a particular section of the book. When retrieving the book for reading, the user is given a dropdown list to choose whether he needs the entire book or only a particular section (chapter). Depending on the user's choice, the selected section of the book is retrieved and presented to the user. This way retrieval is faster, and the user is presented with only the required section of the document.

Described below are brief steps of implementing the idea.

Sectional retrieval
1) Identify the sections (these could be chapter names, section names etc.)
2) Create XML tags for each of the sections identified
3) Store these XML tags as part of the metadata
4) Each XML tag represents one specific section of the document.
5) When the user requests to retrieve the document, the system retrieves the list of XML tags stored as metadata and lets the user choose a specific section (XML tag)
6) Based on the XML tag selected, retrieve the section of the document pointed to by the XML tag.
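
The steps above can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not define a metadata schema, so the element and attribute names (`sections`, `section`, `name`, `start`, `end`), the page ranges, and the two helper functions are all assumptions made for this example. The document content is modelled as a simple list of pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: one <section> tag per identified section (steps 1-4).
# The schema and the page ranges are assumptions made for this sketch.
METADATA_XML = """
<sections document="Application programming guide">
  <section name="Chapter 1" start="0" end="20"/>
  <section name="Chapter 2" start="20" end="45"/>
  <section name="Appendix" start="45" end="60"/>
</sections>
""".strip()


def list_sections(metadata_xml):
    """Step 5: return the section names stored as metadata."""
    root = ET.fromstring(metadata_xml)
    return [section.get("name") for section in root.findall("section")]


def retrieve_section(metadata_xml, pages, name):
    """Step 6: return only the pages that the chosen tag points to."""
    root = ET.fromstring(metadata_xml)
    for section in root.findall("section"):
        if section.get("name") == name:
            start = int(section.get("start"))
            end = int(section.get("end"))
            return pages[start:end]
    raise KeyError(f"no section named {name!r}")
```

A caller would first show `list_sections(...)` to the user (for instance in the dropdown list mentioned earlier) and then pass the chosen name to `retrieve_section(...)`, so only that slice of the document is fetched.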

Automatic identification of sections:

Documents in formats such as PDF, MS Word, WordPerfect and Lotus follow a standard structure, so their sections can be identified by scanning the document.

XML tags can be framed from these sections and stored as part of the metadata. These XML tags are then used to identify the sections.
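
As an illustration of framing XML tags from scanned sections, the sketch below assumes the document text has already been extracted into a list of lines and that headings start with "Chapter N" or "Appendix". Both the heading pattern and the `sections`/`section` schema are assumptions for this example; real PDF or word-processor files would need a format-specific parser, which is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading convention: a line starting with "Chapter <number>" or "Appendix".
HEADING = re.compile(r"^(Chapter \d+|Appendix)\b.*$")


def build_section_metadata(lines):
    """Scan the lines for headings and return XML tags describing each section."""
    root = ET.Element("sections")
    current = None
    for number, line in enumerate(lines):
        if HEADING.match(line):
            # A new heading closes the previous section at this line.
            if current is not None:
                current.set("end", str(number))
            current = ET.SubElement(
                root, "section", name=line.strip(), start=str(number)
            )
    # The last section runs to the end of the document.
    if current is not None:
        current.set("end", str(len(lines)))
    return ET.tostring(root, encoding="unicode")
```

The returned string is what would be stored as metadata alongside the document; the `start` and `end` attributes here are line indices, chosen only to keep the example small.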

Sectional Retrieval:

This method allows sectional retrieval of documents by sectionizing them using XML tags, so the user can retrieve just the portion of the document that is required.

Sectional locking:

This process enables sectional locking of a document, instead of locking the complete document. The check-out operations are also more efficient, as thi...
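
One way to picture sectional locking is a lock table keyed by document id and section tag, so that two authors can check out different sections of the same document at the same time. The sketch below is an in-memory illustration only; the class `SectionLockTable` and its method names are hypothetical and are not taken from the disclosure or from any particular Content Management System.

```python
import threading


class SectionLockTable:
    """Hypothetical in-memory lock table keyed by (document id, section tag)."""

    def __init__(self):
        self._owners = {}  # (doc_id, section) -> user holding the check-out
        self._guard = threading.Lock()  # protects the table itself

    def check_out(self, doc_id, section, user):
        """Lock one section for a user; fail if another user already holds it."""
        with self._guard:
            key = (doc_id, section)
            if key in self._owners and self._owners[key] != user:
                return False
            self._owners[key] = user
            return True

    def check_in(self, doc_id, section, user):
        """Release a section; only the user who checked it out may do so."""
        with self._guard:
            key = (doc_id, section)
            if self._owners.get(key) != user:
                return False
            del self._owners[key]
            return True
```

With this layout, a check-out of "Chapter 1" by one author does not block a check-out of "Chapter 2" by another, which is the behaviour the compound-document approach was said to lack.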