Context Sensitive Dictionary
Publication Date: 2017-Jun-02
The IP.com Prior Art Database
Disclosed is a system for automatically clarifying the definition of a term, within a given body of text, which has multiple meanings based on the reader’s original language and experience, the time of the writing, and the dialect of the author. The system automatically analyzes the text that a user is reading to determine the context in which it was written, and then upon request provides the reader context-sensitive definitions for selected words or phrases.
Within a body of text, the meanings of some terms can differ depending on the reader’s
original language and experience, the time of the writing, and the dialect of the author.
A single word can have different meanings or multiple dictionary definitions. For example, a
"boot" to an American is a piece of footwear, while to a Briton, it is the trunk of a car.
The novel contribution is a system for automatically identifying the context of the work,
and then using that to provide appropriate definitions for words or phrases on request.
The system automatically analyzes the text that a user is reading to determine the
context in which it was written, and then upon request provides the reader context-
sensitive definitions for selected words or phrases.
The system consists of:
An analyzer, which parses the text and its associated metadata to determine the most likely context of the author and the text. This specifically includes the year in which the text was written and the native language and dialect of the author.
A dictionary with definitions for a broad set of contexts, including historical usages and regional variations. The system tags these definitions with the appropriate context for matching.
An interface by which the reader may select a word or phrase in order to request a definition.
A processor that takes the output of the analyzer and the definitions from the dictionary and uses that information to produce a sorted list of appropriate definitions.
Figure: System components
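The interaction between the analyzer output, the tagged dictionary, and the processor can be illustrated with a minimal sketch. Everything named here (TextContext, Definition, score, rank_definitions, the scoring weights, and the sample entries with their year ranges) is a hypothetical illustration and not part of the disclosed system; the disclosure states only that definitions are tagged with context and returned as a sorted list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextContext:
    """Hypothetical analyzer output: when and in which dialect the text was written."""
    year: int
    dialect: str  # e.g. "en-GB" or "en-US"


@dataclass(frozen=True)
class Definition:
    """Hypothetical dictionary entry tagged with the context in which it applies."""
    term: str
    gloss: str
    dialect: str
    first_year: int
    last_year: int


def score(defn: Definition, ctx: TextContext) -> int:
    """Assumed scoring rule: reward a dialect match and a matching time period."""
    points = 0
    if defn.dialect == ctx.dialect:
        points += 2  # exact dialect match
    elif defn.dialect.split("-")[0] == ctx.dialect.split("-")[0]:
        points += 1  # same language, different dialect
    if defn.first_year <= ctx.year <= defn.last_year:
        points += 2  # the usage was current when the text was written
    return points


def rank_definitions(term: str, ctx: TextContext, dictionary: list) -> list:
    """Return the definitions of `term`, best contextual match first."""
    candidates = [d for d in dictionary if d.term == term.lower()]
    return sorted(candidates, key=lambda d: score(d, ctx), reverse=True)


# Illustrative entries only; the year ranges are placeholders, not lexicographic data.
SAMPLE_DICTIONARY = [
    Definition("boot", "a piece of footwear", "en-US", 0, 9999),
    Definition("boot", "the trunk of a car", "en-GB", 1900, 9999),
]

british_text = TextContext(year=1995, dialect="en-GB")
ranked = rank_definitions("boot", british_text, SAMPLE_DICTIONARY)
print(ranked[0].gloss)
```

In this sketch the ordering depends entirely on the assumed weights; a real processor could weigh additional attributes from the analyzer in the same way.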
When the text is first opened, the analyzer parses it to determine the context. For
providing definitions, the most significant attributes to determine are the year in which
the text was written and the native language (including dialect) of the author. Other
attributes may also help in determining the appropriate definitions, including the
author's identity (e.g., age, gender, socioeconomic status) and other works by the
author. The analyzer determines this information by looking at metadata provided with
the text (e.g., the EPUB/MOBI/PDF e-book file formats can have this information
embedded) and the text itself.
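As one illustration of reading embedded metadata, an EPUB file stores Dublin Core fields (title, creator, language, date) in an XML package document. The sketch below parses such a document from a string using Python's standard library. The function name read_opf_metadata and the sample document are assumptions made for this example, and the date field is only a hint, since it may record the edition's publication date and not the year the text was written.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by EPUB package documents.
DC = "{http://purl.org/dc/elements/1.1/}"


def read_opf_metadata(opf_xml: str) -> dict:
    """Extract a few Dublin Core fields from an EPUB package document (hypothetical helper)."""
    root = ET.fromstring(opf_xml)

    def first(tag: str):
        # Return the text of the first matching element, or None if it is absent or empty.
        element = root.find(f".//{DC}{tag}")
        if element is None or not element.text:
            return None
        return element.text.strip()

    return {
        "title": first("title"),
        "author": first("creator"),
        "language": first("language"),
        "date": first("date"),
    }


# Hypothetical package document for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>A Christmas Carol</dc:title>
    <dc:creator>Charles Dickens</dc:creator>
    <dc:language>en-GB</dc:language>
    <dc:date>1843</dc:date>
  </metadata>
</package>"""

metadata = read_opf_metadata(SAMPLE_OPF)
print(metadata["author"], metadata["date"])
```

Fields that are missing come back as None, so the analyzer can fall back to online lookups or to the text itself.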
The metadata may directly indicate the year in which the text was written via the
copyright date. The system is networked, so if the author's name is provided, it can
query online databases to determine the author's native language. If the metadata
includes the title of the text, the analyzer may try to locate additional metadata from
other online databases. The analyzer also scans the text itself for patterns that indicate
it was written in a certain time and place.
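A simple form of such pattern scanning is counting dialect-specific spellings. The sketch below is a minimal illustration under stated assumptions: the word list is a small hand-picked sample, the function name guess_dialect is hypothetical, and a real analyzer would need far more evidence (vocabulary, phrase frequency over time, and so on) than a spelling count.

```python
import re
from collections import Counter

# Small illustrative sample of British spellings and their American counterparts.
BRITISH_TO_AMERICAN = {
    "humour": "humor",
    "colour": "color",
    "favour": "favor",
    "centre": "center",
    "theatre": "theater",
}


def guess_dialect(text: str):
    """Guess "en-GB" or "en-US" from spelling counts; return None when there is no evidence or a tie."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    british = sum(words[w] for w in BRITISH_TO_AMERICAN)
    american = sum(words[w] for w in BRITISH_TO_AMERICAN.values())
    if british > american:
        return "en-GB"
    if american > british:
        return "en-US"
    return None


print(guess_dialect("The colour of the theatre curtain was a matter of some humour."))  # en-GB
```

Returning None on a tie keeps the guess from overriding stronger signals such as metadata.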
Consider an example line from Charles Dickens' A Christmas Carol: "He shed a few
drops of water on them from it, and their good humour was restored directly." The
phrase "good humour" is rarely used in American English, and the spelling of "humour"
is the British variant. The phrase was more popular in the 19th century than in the 20th,
so using these data points, the analyzer is able...