Automatic Correction of Mangled Hyperlinks and Other Document Corruptions, with Optional Prompting

IP.com Disclosure Number: IPCOM000016023D
Original Publication Date: 2002-Sep-15
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue

IBM

Abstract

This invention pertains to the automated correction of the most common forms of corruption in markup language documents, in order to improve the user’s online experience. More particularly, it provides automated correction, with optional user prompting, of mangled URLs, block text with insertion markers, formatted columnar plain text, and so forth, in HTML that was corrupted during conversion from rich text to plain text and/or from plain text to rich text.

There are a variety of ways in which a URL passed from one user to another in the data payload portion of email, quoted replies to email, newsgroup postings, web forums, and so forth can become mangled and therefore unusable without an exacting series of manual operations too tedious for most users. The problem is first illustrated with respect to email. Many email users use plain-text email programs (such as Netscape) that remove or alter formatting when text is cut and pasted from a rich-text source, such as a web page. Such programs frequently insert line breaks into the text in an attempt to regulate line length. The algorithm employed evidently scans for “word” breaks, such as a space or punctuation mark, to select a line-break insertion point that will not disrupt the text too badly. Inserting a line break into a URL, however, splits the URL into fragments that no longer work as a link. (Similarly, inserting a line break into columnar text, such as an unformatted table, makes the table unreadable; there are myriad other examples.)
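The URL case described above can be illustrated with a minimal sketch. It is not the disclosed method, only one plausible heuristic: if a line ends in a URL whose last character is a punctuation mark that a wrapping algorithm might treat as a "word" break, and the next line starts with URL-like characters, the two pieces are rejoined. The function name `rejoin_wrapped_urls`, the set `BREAK_CHARS`, and both regular expressions are assumptions made for illustration. The sketch also assumes no trailing whitespace on the broken line and no quote prefixes such as "> ".

```python
import re

# Assumed set of characters at which a line wrapper may have split a URL.
# The period is left out on purpose, since it usually ends a sentence.
BREAK_CHARS = "/?&=-_%#"

# A URL that runs to the very end of a line (hypothetical pattern).
URL_TAIL = re.compile(r"(https?://\S*)$")
# URL-like characters at the very start of a line (hypothetical pattern).
URL_HEAD = re.compile(r"^[A-Za-z0-9/?&=\-._%#~+:;,@]+")


def rejoin_wrapped_urls(text: str) -> str:
    """Rejoin URLs that a line-wrapping step split across lines."""
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # Keep absorbing continuation lines while the URL looks unfinished.
        while i < len(lines):
            tail = URL_TAIL.search(line)
            if tail is None or tail.group(1)[-1] not in BREAK_CHARS:
                break
            head = URL_HEAD.match(lines[i])
            if head is None:
                break
            line += head.group(0)
            rest = lines[i][head.end():].lstrip()
            if rest:
                # Text after the URL fragment stays on its own line.
                lines[i] = rest
                break
            i += 1  # the whole next line was part of the URL
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    mangled = "See http://www.example.com/docs/\nindex.html for details."
    print(rejoin_wrapped_urls(mangled))
```

A heuristic of this kind can misfire, for example when a complete URL ending in "/" is followed by a line of ordinary text. That ambiguity is the sort of case where the optional user prompting mentioned in the abstract would let the user confirm or reject a proposed repair.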