
Apparatus and Method for Chinese Drug Name Recognition
Disclosure Number: IPCOM000252241D
Publication Date: 2017-Dec-30
Document File: 4 page(s) / 137K

Publishing Venue

The Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 56% of the total text.

Apparatus and Method for Chinese Drug Name Recognition

Drug name recognition, which identifies pharmacological substances in biomedical texts and classifies them into predefined categories, is an essential prerequisite for drug information extraction tasks such as detecting drug-drug interactions. The Chinese name of a western drug is translated from English. Doctors may enter a similar name that is not exactly the one announced by the CFDA (China Food and Drug Administration). Such names may contain errors yet still have the same pronunciation as the official name, for example 双氯氛酸钠/双氯芬酸钠/双氯酚酸钠.
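The homophone problem described above can be illustrated with a minimal sketch. The pronunciation table here is a hand-written, toneless stand-in that covers only the characters in the example, and the names `PINYIN` and `pronunciation_key` are hypothetical; a practical system would presumably rely on a complete pronunciation dictionary.

```python
# Hypothetical, hand-written toneless pinyin table covering only the
# characters in the example above; it is not a complete dictionary.
PINYIN = {
    "双": "shuang",
    "氯": "lv",   # the vowel "ü" is commonly typed as "v"
    "氛": "fen",
    "芬": "fen",
    "酚": "fen",
    "酸": "suan",
    "钠": "na",
}


def pronunciation_key(name):
    """Map each character to its pinyin syllable; unknown characters are kept as-is."""
    return tuple(PINYIN.get(ch, ch) for ch in name)


variants = ["双氯氛酸钠", "双氯芬酸钠", "双氯酚酸钠"]
keys = {pronunciation_key(v) for v in variants}

# The three spellings differ in one character but collapse to a single key.
print(keys)
```

Because the three spellings differ only in characters that sound alike, they collapse to one key, which is why a pronunciation-based feature can help where exact character matching fails.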

To recognize drug names in text, general named entity recognition techniques can be used. The state of the art is the deep learning method developed by Xuezhe Ma and Eduard Hovy (End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, 2016). Their method uses character representations as input features in the LSTM-CNNs-CRF framework. They first use convolutional neural networks (CNNs) to encode the character-level information of a word into a character-level representation. They then combine the character- and word-level representations and feed them into a bi-directional LSTM (BLSTM) to model the context of each word. On top of the BLSTM, a sequential CRF jointly decodes the labels for the whole sentence. See Figure 1 below.
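The character-level encoding step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the weights are random and untrained, the dimensions (`dim`, `n_filters`, `width`) are arbitrary, the padding symbol maps to a zero vector, and the function name `char_cnn` is hypothetical. The BLSTM and CRF layers are not shown.

```python
import random


def char_cnn(word, embed, filters, width=3, pad="<pad>"):
    """Character-level representation of `word`: a 1-D convolution over
    character embeddings followed by max pooling over positions."""
    # Pad both ends so that every character is the centre of one window.
    chars = [pad] * (width // 2) + list(word) + [pad] * (width // 2)
    vectors = [embed[c] for c in chars]
    representation = []
    for filt in filters:  # each filter holds `width` rows of `dim` weights
        scores = []
        for start in range(len(vectors) - width + 1):
            window = vectors[start:start + width]
            scores.append(sum(w * x
                              for w_row, x_row in zip(filt, window)
                              for w, x in zip(w_row, x_row)))
        # Max pooling keeps the strongest response of this filter.
        representation.append(max(scores))
    return representation


# Untrained, randomly initialised parameters (assumed sizes).
rng = random.Random(0)
dim, n_filters, width = 4, 5, 3
embed = {c: [rng.uniform(-1, 1) for _ in range(dim)]
         for c in "abcdefghijklmnopqrstuvwxyz"}
embed["<pad>"] = [0.0] * dim
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
           for _ in range(n_filters)]

vec = char_cnn("playing", embed, filters, width)
print(len(vec))  # one value per filter, independent of word length
```

Max pooling is what makes the output length depend only on the number of filters, so words of different lengths yield vectors of the same size that can be concatenated with word embeddings.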










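[Figure: character-level encoding of the word "playing" from the example sentence "I like playing soccer". Labels: padding, P l a y i n g, padding; Max Pooling; Char Representation.]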

Figure 1

Through convolution and max pooling, a char representation is learned that can capture prefixes and suffixes of the words in a sentence. In this disclosure, we extend the char representation to Pinyin so that it can be used for Chinese drug name recognition. The Chinese characters are encoded by