Mapping external hyperlinks to internal references within a PDF while creating the PDF from webpages
Disclosure Number: IPCOM000218788D
Publication Date: 2012-Jun-07
Document File: 4 page(s) / 168K

Publishing Venue

The Prior Art Database


1) Assume webpages W1, W2, and W3 of a website have hyperlinks hw1, hw2, and hw3, where hw1 points to W1, and so on.
2) Creating a PDF from these webpages produces pages with one bookmark per webpage: bw1, bw2, and bw3.
3) Assume that any PDF page created from a webpage contains certain hyperlinks (in this case hw1, hw2, and hw3).
4) In the existing method, hw1 in a PDF page points to W1, hw2 points to W2, and so on.
5) Under this invention, all hyperlinks are stored in a separate table when the PDF document is created.
6) This gives a table of hyperlinks hw1, hw2, hw3 for each page P1, P2, P3, and so on.
7) In addition, whenever a PDF is created from a webpage with a webpage hyperlink, say hw1, a mapping reference R1 is also created that indicates the location of the content within the PDF document.
8) Similarly, as soon as webpage W2 is parsed and converted into the PDF document using hyperlink hw2, a mapping reference R2 is created and stored in the second table.
9) As a result, any opened page, say P1, contains hyperlinks hw1, hw2, hw3, each of which carries two mappings: one to W1, W2, W3 (the existing solution) and one to R1, R2, R3 (the new solution).
10) Using these two references, the user can choose to follow either the webpage reference or the PDF document reference.
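The two tables described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the disclosure's implementation: the class name `LinkTables`, its methods, the example.com URLs, and the use of a zero-based page index as the mapping reference are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkTables:
    """Two tables built while the PDF is created (all names are illustrative)."""

    # Table 1: PDF page label (P1, P2, ...) -> hyperlinks found on that page.
    page_links: dict[str, list[str]] = field(default_factory=dict)
    # Table 2: hyperlink -> mapping reference (assumed here to be a page index).
    references: dict[str, int] = field(default_factory=dict)

    def add_page(self, page_label: str, source_url: str,
                 page_index: int, links: list[str]) -> None:
        """Record one converted webpage and the hyperlinks it contains."""
        self.page_links[page_label] = list(links)
        self.references[source_url] = page_index

    def resolve(self, hyperlink: str) -> dict[str, object]:
        """Return both targets for a hyperlink: the webpage and the PDF location."""
        return {
            "external": hyperlink,
            # None when the linked webpage was not captured into the PDF.
            "internal": self.references.get(hyperlink),
        }


tables = LinkTables()
# Hypothetical stand-ins for hw1, hw2, hw3.
urls = ["http://example.com/w1", "http://example.com/w2", "http://example.com/w3"]
for index, url in enumerate(urls):
    # Each converted page P1..P3 is assumed to contain all three hyperlinks.
    tables.add_page(f"P{index + 1}", url, index, urls)

print(tables.resolve("http://example.com/w2"))
```

Keeping the external URL alongside the internal reference reflects step 10: the reader can still choose either target.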

This is the abbreviated version, containing approximately 52% of the total text.


Mapping external hyperlinks to internal references within a PDF while creating the PDF from webpages.


Atul Agarwal

Vaibhav Tyagi


PDF documents currently created from webpages by specifying the server name and paths (see Figure 1) do not contain internal references for content in document-level bookmarking operations. The references for content and links still point to the external webpage and not to locations within the PDF document, even though the entire site has been dumped into the PDF document.

Figure 1: Options available in Acrobat to create a PDF dump of an entire site.

Problem and Prior Solutions

Figure 2: Current-day problem and proposed solution

Follow the steps to understand the problem:

• Open Acrobat

• From the toolbar menu, select "Create --> PDF from web pages".

• The Create PDF from Web Page dialog box appears.

• Click Capture Multiple Levels.

• Specify the URL of the site for which you would like to create a PDF.

• Choose the number of webpage levels up to which you would like to extract data, or choose to capture the entire site.

• Optionally, choose "Stay on same path" and "Stay on same server" based on your needs.

• Start creating the document by clicking "Create".

• Acrobat establishes a connection to the webpage and starts creating the PDF document.

• Once the PDF document is ready, you will notice that bookmarks have also been created based on the content parsed through the page.

Problem: A user expects a bookmark in the PDF document to lead to the content within the PDF that it refers to, not to an external webpage. Otherwise, there is little value in creating bookmarks in such cases. There are no existing solutions for this. When PDF documents are created from documents that contain bookmarks, e.g. Word documents, the bookmarks are created successfully and lead to the appropriate reference. However, references in Word documents are passed on as-is into the PDF, so that case does not fall under the same purview. In short, a solution is needed that maps each reference (in this case an href) to a destination within the PDF document.
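The href-to-destination mapping asked for here might look roughly like the following Python sketch, which uses only the standard-library `html.parser` module. It is an illustration under assumptions, not the disclosed method: the names `HrefCollector` and `map_hrefs`, the destination name `dest-W2`, and the example URLs are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a webpage."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case and attrs as (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def map_hrefs(html: str, captured: dict[str, str]) -> dict[str, str]:
    """Map each href to an internal destination name when its target was captured.

    Hrefs whose target is not in `captured` keep pointing at the external URL.
    """
    collector = HrefCollector()
    collector.feed(html)
    return {href: captured.get(href, href) for href in collector.hrefs}


page = '<a href="http://example.com/w2">Next</a> <a href="http://other.example/">Out</a>'
# Hypothetical: W2 was captured into the PDF under the destination name "dest-W2".
captured = {"http://example.com/w2": "dest-W2"}
print(map_hrefs(page, captured))
```

Links to pages that were not captured fall back to their original URL, so they keep behaving as they do today.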

New Solution

The concept behind the new s...