Methods for Crowd-Sourcing Internationalization of Medical Concept Dictionaries
Publication Date: 2015-Aug-25
The IP.com Prior Art Database
Medical concept dictionaries can be used by Natural Language Processing (NLP) technologies to analyze medical data. Their coverage beyond English text is generally limited and manual creation of non-English versions is expensive. Standard statistical machine translation techniques provide a way to translate them into other languages at virtually no cost. However, their current effectiveness is hampered by the scarcity of training material. This paper presents several methods for using cost-efficient "crowd-sourcing" platforms such as Amazon Mechanical Turk to translate concept dictionary entries into other languages and generate training material for machine translation.
A number of medical concept dictionaries are available, including Systematized Nomenclature of Medicine (SNOMED), Logical Observation Identifiers Names and Codes (LOINC), and International Classification of Diseases (ICD). These concept dictionaries are sufficiently mature to be usable in English, but they are relatively incomplete or unavailable for other languages. The cost of hiring experts to translate them is prohibitive, as many contain tens of thousands of concepts (e.g., SNOMED alone contains over 330,000 concepts). Furthermore, the task involves not only translating from another language, but also producing an exhaustive list of synonyms (e.g., common cold, coryza, rhinitis, nasopharyngitis, nasal catarrh) and phrasal paraphrases (e.g., abdominal muscles, abdomen muscles, muscles of abdomen, muscles of the abdomen, etc.), which requires research and/or introspection and makes the task even more difficult.
To obtain accurate translations of medical terminology contained within documentation, specialized technical translators are typically required. However, by exploiting the ontological structure of medical concept dictionaries, dependence on technical expertise may be reduced so that only language competence is required from the translators.
Crowdsourcing, i.e., the use of self-selected workers from the general population, has the potential to significantly reduce translation costs compared to using expert translators. Amazon Mechanical Turk (MTurk) is a popular platform that is ideally suited to crowdsourcing tasks that 1) do not require technical expertise and 2) involve no confidential information. Crowdsourcing is also well suited for fast-moving R&D development that involves a large number of minute manual tasks. The size of payment offered per task may be:
lowered to save costs,
raised to accelerate completion rate, or
chosen to match minimum wage (which is above usual crowdsourcing rates).
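The payment options above come down to simple arithmetic. The following is a minimal sketch, not part of the original disclosure: the function names, the hourly wage, the time per task, and the redundancy factor (how many workers translate the same entry) are all hypothetical values chosen for illustration, and platform fees are ignored.

```python
import math


def reward_per_task_cents(hourly_wage_usd: float, seconds_per_task: float) -> int:
    """Per-task reward in cents that meets a target hourly wage, rounded up.

    Both arguments are assumptions supplied by the requester; the time per
    task would normally be estimated from a small pilot batch.
    """
    if hourly_wage_usd < 0 or seconds_per_task <= 0:
        raise ValueError("wage must be non-negative and task time positive")
    return math.ceil(hourly_wage_usd * 100 * seconds_per_task / 3600)


def batch_cost_usd(num_tasks: int, reward_cents: int, redundancy: int = 1) -> float:
    """Total worker payments for a batch, excluding any platform fees.

    `redundancy` is the (hypothetical) number of independent workers asked
    to translate each dictionary entry.
    """
    if num_tasks < 0 or reward_cents < 0 or redundancy < 1:
        raise ValueError("invalid batch parameters")
    return num_tasks * redundancy * reward_cents / 100


# Illustrative figures only: a 10.00 USD hourly target and 45 seconds per entry.
reward = reward_per_task_cents(hourly_wage_usd=10.0, seconds_per_task=45)
total = batch_cost_usd(num_tasks=1000, reward_cents=reward, redundancy=3)
print(f"reward per task: {reward} cents, batch total: {total:.2f} USD")
```

Lowering or raising the reward then amounts to changing the wage target passed to the first function, and the batch total shows how that choice scales with dictionary size and redundancy.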
Qualification tasks and semi-automated quality control can be used to improve the quality of results. The methods described in this paper for crowdsourcing can also be easily adapted for traditional contractors.