Inferring screen structure from visual characteristics
Original Publication Date: 2009-Jun-02
Included in the Prior Art Database: 2009-Jun-02
This article presents a novel approach to screen layout detection that relies on visual characteristics alone, rather than on additional information from the application itself or from its communication protocols. It can therefore analyze any type of screen in any application without requiring changes to the original application or to the working environment. The solution can be used in a variety of scenarios, including testing, masking, and changing display styles at runtime.
The solution identifies screen layout features based only on the visual characteristics of the screen. It receives information about the texts that appear on the screen, namely their location (coordinates and size), color and font, and from that information alone deduces which GUI components, such as titles, tables and forms, exist on the screen. The resulting screen layout is then represented in a persistable form that can be used for different purposes.
The only data needed for this solution to work is a list of the texts that appear on the screen and their locations. Colors and fonts can further aid the process but are not compulsory. No additional information is needed.
One possible input for this solution is an XML file containing one element for each text that appears on the screen, with attributes describing the location, color and font of the text. Such a file could be produced by a text recognition component that uses Optical Character Recognition (OCR) technology to automatically discover the texts that appear on a screen from an image (for example, a bitmap) of that screen.
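The article does not specify the XML format, so the following Python sketch assumes a hypothetical one: a `<screen>` root with one `<text>` element per recognized text, carrying `x`, `y`, `width` and `height` attributes plus optional `color` and `font` attributes. It reads that input into simple records; missing color and font attributes become `None`, reflecting that only the texts and their locations are required.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreenText:
    """One text item recognized on the screen (hypothetical record type)."""
    text: str
    x: int
    y: int
    width: int
    height: int
    color: Optional[str] = None  # optional: helps the analysis but is not required
    font: Optional[str] = None   # optional: helps the analysis but is not required


def parse_screen_texts(xml_source: str) -> List[ScreenText]:
    """Parse a hypothetical <screen><text .../></screen> document into ScreenText items."""
    root = ET.fromstring(xml_source)
    items = []
    for elem in root.iter("text"):
        items.append(ScreenText(
            text=elem.text or "",
            x=int(elem.get("x")),
            y=int(elem.get("y")),
            width=int(elem.get("width")),
            height=int(elem.get("height")),
            color=elem.get("color"),  # None when the attribute is absent
            font=elem.get("font"),    # None when the attribute is absent
        ))
    return items


SAMPLE = """<screen>
  <text x="10" y="0" width="120" height="12" color="white" font="bold">CUSTOMER DETAILS</text>
  <text x="10" y="20" width="60" height="12">Name:</text>
  <text x="80" y="20" width="90" height="12" color="green">John Smith</text>
</screen>"""

for item in parse_screen_texts(SAMPLE):
    print(item)
```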
We devised analysis techniques to deduce the structural layout of a screen from its visual information. The screen analysis process goes over the list of texts and tries to associate them with known UI elements, such as titles, labels and tables, based on the relative locations of the texts and the differences in their colors and fonts. This phase can also be assisted by additional application-specific configuration information, such as the expected letter size, the space between adjacent rows, and the separators between labels and their texts. This additional information may help recognize the different components on the screen more accurately, but it is not compulsory.
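As an illustration of position-based association, the sketch below groups texts into visual rows and pairs labels with their values. It assumes two hypothetical configuration hints of the kind mentioned above: a vertical tolerance for texts on the same row and a label separator (`:` by default). It is a simplified example rather than the article's actual analysis algorithm, and it ignores colors and fonts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Text:
    text: str
    x: int
    y: int


@dataclass
class AnalysisConfig:
    """Hypothetical application-specific hints (both optional, with defaults)."""
    row_tolerance: int = 4      # max vertical offset for texts on the same row
    label_separator: str = ":"  # suffix that marks a text as a label


def group_rows(texts: List[Text], cfg: AnalysisConfig) -> List[List[Text]]:
    """Cluster texts into visual rows by y coordinate; each row is sorted left to right."""
    rows: List[List[Text]] = []
    for t in sorted(texts, key=lambda item: (item.y, item.x)):
        # Compare against the first (topmost) text of the current row.
        if rows and abs(t.y - rows[-1][0].y) <= cfg.row_tolerance:
            rows[-1].append(t)
        else:
            rows.append([t])
    return [sorted(row, key=lambda item: item.x) for row in rows]


def find_label_fields(texts: List[Text], cfg: AnalysisConfig) -> List[Tuple[str, str]]:
    """Pair each label with the nearest text to its right on the same row."""
    sep = cfg.label_separator
    pairs = []
    for row in group_rows(texts, cfg):
        for left, right in zip(row, row[1:]):
            if left.text.endswith(sep) and not right.text.endswith(sep):
                pairs.append((left.text[: -len(sep)].strip(), right.text))
    return pairs


screen = [
    Text("CUSTOMER DETAILS", 10, 0),
    Text("Name:", 10, 20),
    Text("John Smith", 80, 21),  # slightly offset, but still on the same row
    Text("City:", 10, 40),
    Text("Boston", 80, 40),
]
print(find_label_fields(screen, AnalysisConfig()))
```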
The same configuration can also enable the identification of more complex types of tables that appear only in certain applications. For example, interleaved tables are common in mainframe applications. If this type of table is enabled in the application-specific configuration, a more complex algorithm designed to locate such tables is triggered.
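The article does not describe its interleaved-table algorithm. The sketch below assumes one plausible definition, in which each logical record spans two screen rows whose texts start at alternating column layouts (A, B, A, B, ...). Each row is represented by the start columns of its texts; exact comparison is reasonable here because mainframe screens are laid out on a fixed character grid. The detector runs only when a hypothetical `interleaved_tables` configuration key is enabled, mirroring the configuration-triggered behavior described above.

```python
from typing import Dict, List, Optional, Tuple

# One screen row, represented by the start columns of the texts on it.
Layout = Tuple[int, ...]


def find_interleaved_table(rows: List[Layout], min_records: int = 2) -> Optional[Tuple[int, int]]:
    """Return (first_row, end_row_exclusive) of the longest run of rows that
    alternate between two different layouts A, B, A, B, ..., or None."""
    best: Optional[Tuple[int, int]] = None
    for start in range(len(rows) - 1):
        a, b = rows[start], rows[start + 1]
        if a == b:
            continue  # identical consecutive layouts suggest an ordinary table
        end = start + 2
        while end < len(rows) and rows[end] == (a if (end - start) % 2 == 0 else b):
            end += 1
        records = (end - start) // 2
        end = start + 2 * records  # drop a trailing half record
        if records >= min_records and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
    return best


def detect_tables(rows: List[Layout], config: Dict[str, bool]) -> List[Tuple[str, int, int]]:
    """Run the interleaved-table detector only when the configuration enables it."""
    tables: List[Tuple[str, int, int]] = []
    if config.get("interleaved_tables", False):  # hypothetical configuration key
        span = find_interleaved_table(rows)
        if span is not None:
            tables.append(("interleaved", span[0], span[1]))
    return tables


SCREEN_ROWS: List[Layout] = [
    (2,),         # title
    (2, 20, 40),  # record 1, line A
    (5, 30),      # record 1, line B
    (2, 20, 40),  # record 2, line A
    (5, 30),      # record 2, line B
    (2, 20, 40),  # record 3, line A
    (5, 30),      # record 3, line B
    (2,),         # message line
]
print(detect_tables(SCREEN_ROWS, {"interleaved_tables": True}))  # [('interleaved', 1, 7)]
print(detect_tables(SCREEN_ROWS, {}))                            # []
```

Requiring at least two records keeps a single pair of differently laid-out rows, such as a title above a header line, from being reported as a table.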
The output of the screen analysis phase is a representation of the screen layout and contents. This may be an internal, in-memory representation used later by the computer program for other purposes, or a textual form used for persistence or for passing messages between components in the system. Optionally, this representation can be an XML file containing a hierarchy of the UI elements, each with its location and textual contents.
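A minimal sketch of serializing an analyzed layout as an XML hierarchy with Python's standard `xml.etree.ElementTree` module follows. The element names (`screen`, `title`, `field`, `label`, `value`) and the `UIElement` class are hypothetical placeholders, not the schema the article proposes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class UIElement:
    kind: str  # used as the XML tag, e.g. "title", "field", "table"
    x: int
    y: int
    text: str = ""
    children: List["UIElement"] = field(default_factory=list)


def to_xml(element: UIElement) -> ET.Element:
    """Recursively convert a UIElement tree into an ElementTree element."""
    node = ET.Element(element.kind, x=str(element.x), y=str(element.y))
    if element.text:
        node.text = element.text
    for child in element.children:
        node.append(to_xml(child))
    return node


layout = UIElement("screen", 0, 0, children=[
    UIElement("title", 10, 0, "CUSTOMER DETAILS"),
    UIElement("field", 10, 20, children=[
        UIElement("label", 10, 20, "Name"),
        UIElement("value", 80, 20, "John Smith"),
    ]),
])

xml_text = ET.tostring(to_xml(layout), encoding="unicode")
print(xml_text)
```

Keeping the in-memory tree and its XML form separate lets the same analysis result serve both uses mentioned above: direct use inside the program, or persistence and message passing as text.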
We propose an XML schema for describing screens which contains all common UI componen...