A method of detecting garbage characters using syntactic analysis of natural language

IP.com Disclosure Number: IPCOM000237004D
Publication Date: 2014-May-27
Document File: 4 page(s) / 66K

Publishing Venue

The IP.com Prior Art Database

Abstract

As an essential phase of the product development life cycle, globalization testing ensures that our products are adapted to all languages and cultures before release. Garbage character detection is an important part of globalization testing and is usually done manually. The testing effort is very high, and at present there is still no good way to automatically detect garbage characters in a GUI or in log files. Syntactic analysis, or parsing, "is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar". With it, we can obtain a valid parse tree for any well-formed sentence of a human natural language, based on the relations between the words of the sentence. In a garbled string, there are no such relations between the characters, so syntactic analysis of a garbled string does not yield a valid parse tree. Since the strings in a GUI are human natural language, and we always know which language the GUI is in while performing globalization testing on it, we disclose here a new method to automatically detect garbage characters using syntactic analysis. The method can be used for all the different types of character garbling: question marks, square boxes and distorted characters.

This is the abbreviated version, containing approximately 52% of the total text.

Globalization testing is an essential phase of the product development life cycle. With it, we can ensure that our products are adapted to all languages and cultures before release. Garbage character detection is an important part of globalization testing and is usually done manually. Apart from manual testing, there is at present still no good way to detect garbage characters in a GUI or in log files automatically. Various patents have been filed for detecting garbled characters, for example by comparing input data with output data (Japanese Patent Application Publication No. 2006-185388), checking output data against registered information (Japanese Patent Application Publication Nos. 2000-82025 and 2006-163578), adding a tag to application data (Japanese Patent Application Publication No. 2002-109475), or adding known non-English characters at specific positions in English strings (US20080181504 A1). However, all of them require much known information or have many prerequisites. They are therefore very difficult for testers to implement and adopt, because the input data of a piece of software is not limited to user input but also comes from other software and from the operating system.

Syntactic analysis or parsing "is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar".
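
To illustrate how parsing can separate well-formed strings from garbled ones, the following is a minimal sketch, not the disclosed implementation. It assumes a tiny hand-written English grammar in Chomsky normal form and uses a CYK-style chart parser; the names BINARY_RULES, LEXICON, parse and is_garbled are hypothetical and chosen only for this example. A real system would need a full grammar or a trained parser for the language of the GUI under test.

```python
from itertools import product

# Toy grammar in Chomsky normal form (an assumption for illustration only).
# Each entry maps a pair of child categories to their parent category.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}

# Toy lexicon: the known words and their part-of-speech categories.
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "user": {"N"},
    "file": {"N"},
    "dialog": {"N"},
    "opens": {"V"},
    "saves": {"V"},
}


def parse(sentence):
    """Return one parse tree (nested tuples) rooted at S, or None."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return None
    # chart[i][j] maps a category to one tree covering words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        # Unknown tokens (e.g. "???" or mojibake) get no category at all.
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] = (cat, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                pairs = product(chart[i][k].items(), chart[k][j].items())
                for (left, ltree), (right, rtree) in pairs:
                    parent = BINARY_RULES.get((left, right))
                    if parent and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, ltree, rtree)
    return chart[0][n].get("S")


def is_garbled(sentence):
    """Flag a string as garbled when no valid parse tree exists for it."""
    return parse(sentence) is None


if __name__ == "__main__":
    for text in ["The user opens a file", "??? ???? ??", "æ–‡ä»¶ opens a file"]:
        print(repr(text), "garbled" if is_garbled(text) else "ok")
```

In this sketch a string is flagged whenever the chart contains no tree rooted at S, which covers tokens that are unknown to the lexicon as well as known words in an order the grammar does not accept. Many GUI strings are labels or fragments rather than full sentences, so a practical grammar would presumably also have to accept phrases as valid top-level results.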

"Within computational linguistics", syntactic analysis "is used to refer to the formal analysis by computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information." "Parse trees are usually constructed according to one of two competing relations, either in terms of the constituency relation of constituency grammars (= phrase structure grammars) or in terms of the dependency relation of dependency grammars."

"In linguistics, phrase structure grammars are all those grammars that are based on the constituency relation", "hence phrase structure grammars are also known as constituency grammars." Phrase structure grammars "view sentence structure in terms of the constituency relation. The constituency relation derives from the subject-predicate division of Latin and Greek grammars that is based on term logic and reaches back to Aristotle in antiquit...