Apparatus and Method for Chinese Drug Name Recognition
Publication Date: 2017-Dec-30
The IP.com Prior Art Database
Drug name recognition, which recognizes pharmacological substances in biomedical texts and classifies them into predefined categories, is an essential prerequisite for drug information extraction tasks such as the extraction of drug-drug interactions. The Chinese name of a Western drug is translated from English. Doctors may enter a similar name that is not exactly the one announced by the CFDA (China Food and Drug Administration). Such names may contain wrong characters but still have the same pronunciation, for example, 双氯氛酸钠/双氯芬酸钠/双氯酚酸钠.
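The claim that these variants share one pronunciation can be checked by mapping each character to its Pinyin, the standard romanization of Mandarin. The sketch below is only an illustration: `TOY_PINYIN` and `pinyin_key` are hypothetical names, the table covers just the characters in the example above, tones are dropped, and "lv" is used as the common ASCII spelling of "lü". A real system would presumably rely on a complete character-to-Pinyin dictionary.

```python
# Hypothetical toy lookup table covering only the characters in the example.
# Tones are omitted; "lv" is the usual ASCII spelling of "lü".
TOY_PINYIN = {
    "双": "shuang",
    "氯": "lv",
    "氛": "fen",
    "芬": "fen",
    "酚": "fen",
    "酸": "suan",
    "钠": "na",
}


def pinyin_key(name):
    """Map each character to toneless Pinyin; unknown characters are kept as-is."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in name)


variants = ["双氯氛酸钠", "双氯芬酸钠", "双氯酚酸钠"]

# All three spellings collapse to a single pronunciation key.
keys = {pinyin_key(v) for v in variants}
print(keys)  # {'shuang lv fen suan na'}
```

Because the three spellings differ only in characters that are all read "fen", they produce the same key even though a character-by-character comparison treats them as different strings.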
To recognize drug names in text, we can use general named entity recognition techniques. A state-of-the-art approach is the deep learning method developed by Xuezhe Ma and Eduard Hovy ("End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF"). In their method, the authors use character representations as input features in the LSTM-CNNs-CRF framework. They first use convolutional neural networks (CNNs) to encode the character-level information of a word into its character-level representation. They then combine the character-level and word-level representations and feed them into a bi-directional LSTM (BLSTM) to model the context information of each word. On top of the BLSTM, they use a sequential CRF to jointly decode the labels for the whole sentence. See Figure 1 below.
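The character-level step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `char_cnn`, the embedding size, the number of filters, and the window width are all hypothetical, the embeddings and filters are random rather than trained, and the BLSTM and CRF layers are left out.

```python
import random


def char_cnn(word, emb, filters, window=3, pad="<pad>"):
    """Return one max-pooled feature per filter for the given word."""
    half = window // 2
    # Pad both ends so that every character can sit at the center of a window.
    chars = [pad] * half + list(word) + [pad] * half
    vectors = [emb[c] for c in chars]
    features = []
    for f in filters:  # each filter holds `window` weight vectors
        responses = []
        for i in range(len(vectors) - window + 1):
            s = 0.0
            for k in range(window):
                s += sum(w * x for w, x in zip(f[k], vectors[i + k]))
            responses.append(s)
        # Max pooling over positions keeps the strongest response per filter.
        features.append(max(responses))
    return features


# Assumed toy sizes and untrained random parameters, for illustration only.
rng = random.Random(0)
dim, n_filters, window = 4, 5, 3
emb = {c: [rng.uniform(-1, 1) for _ in range(dim)]
       for c in "abcdefghijklmnopqrstuvwxyz"}
emb["<pad>"] = [0.0] * dim
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
           for _ in range(n_filters)]

word_vec = char_cnn("playing", emb, filters, window)
print(len(word_vec))  # 5, one feature per filter regardless of word length
```

The pooled vector has a fixed length for any word, which is what allows it to be concatenated with a word-level embedding before the combined representation is passed to the BLSTM.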
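[Figure: example sentence "I like playing soccer"; the word "Playing" is split into the characters P l a y i n g, with padding, as input to the character-level CNN.]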
Figure 1
Through convolution and max pooling, a character-level representation is learned that can capture the prefixes and suffixes of the words in a sentence. In this disclosure, we extend the character representation to Pinyin, which can then be used for Chinese drug name recognition. The Chinese characters are encoded by