Identifying phenotype differentiating/pathway implicating gene expression signatures from Microarray data
Original Publication Date: 2004-Aug-10
Included in the Prior Art Database: 2004-Aug-10
Identifying the mechanism that underlies cancer causation is important for understanding how to treat the disease. The dysregulation of a given cellular pathway may show up in the gene expression profile of the cell. If so, computational techniques that detect this change in profile can be used to identify the pathways that have been dysregulated. The following disclosure demonstrates a classification-based scheme to detect these pathways. The scheme has been applied to publicly available breast cancer data. Our results show that a mechanism that avoids immune surveillance may be implicated in the more aggressive kind of cancer studied.
A method is disclosed that enables the detection/identification of disease-causing pathways from gene expression data using classification techniques. The problem solved and an example of its application are provided in the following sections.
Problem Statement
Given a particular pathway/mechanism, we wish to identify whether that pathway/mechanism behaves differentially between two cells with different phenotypes. A generic statement would be: "Given the details of the genes that participate in a regulatory mechanism and their expression profiles, identify whether a failure in the mechanism is associated with the disease state". In this study we consider human cancer and test whether the disruption of a particular cellular pathway, the FAS-FASL pathway, is associated with invasive cancer. We chose this pathway because of its relevance to the evasion of immune surveillance.
Related Work
Classification techniques such as Artificial Neural Networks (ANNs), Support Vector Machines, and K-Nearest Neighbor methods have been used to distinguish between cell types based on gene expression data. Golub et al. developed methods to distinguish between two cancer sub-types, AML (Acute Myeloid Leukemia) and ALL (Acute Lymphoblastic Leukemia). Khan et al. used ANNs to classify SRBCTs (Small Round Blue Cell Tumors), which are difficult to distinguish using conventional techniques. These studies, and others that followed, clearly demonstrate how classification techniques can be used to differentiate among tissue types using gene expression data. The study by Nigam Shah et al. attempts to integrate biological knowledge into the feature selection process and subsequently derives inferences about the biological processes that are active in the cancer cells being studied.
Solution
In this section the solution to the problem is discussed using cancer as a model disease; however, our techniques can be easily extended to other diseases. Cancer cells differ very significantly from normal cells at the molecular level, and microarray technology lets us measure the expression levels of many genes and compare them across the two cell types. Cancer is a multigenic disease, i.e., changes in more than one gene lead to the causation of cancer. Univariate methods may therefore not be successful in all cases, and we require a multivariate approach that considers many genes simultaneously and is thus better suited to the problem at hand. Our proposed solution consists of the following two steps.
Step 1: Build a classifier that can differentiate between cancer and normal cells (or two types of related cancers) based on the expression values of the genes that are known to participate in the given pathway.
Step 2: Check classifier performance in terms of its accuracy. If the classifier shows goo...
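As a rough illustration of the two steps above, the following Python sketch restricts an expression matrix to the genes of one pathway, classifies the samples, and reports accuracy. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the nearest-centroid classifier, the leave-one-out accuracy estimate, the gene names (PATHWAY_GENE_1, PATHWAY_GENE_2, OTHER_GENE), the phenotype labels, and the expression values, which are invented toy numbers rather than real measurements.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def loo_accuracy(X, y):
    """Step 2: leave-one-out estimate of classifier accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def pathway_accuracy(expression, labels, gene_names, pathway_genes):
    """Step 1: keep only the pathway's genes as features, then run Step 2."""
    idx = [gene_names.index(g) for g in pathway_genes]
    X = [[row[j] for j in idx] for row in expression]
    return loo_accuracy(X, labels)


# Hypothetical toy data: rows are samples, columns follow gene_names.
gene_names = ["PATHWAY_GENE_1", "PATHWAY_GENE_2", "OTHER_GENE"]
expression = [
    [1.0, 5.0, 2.0], [1.2, 4.8, 7.0], [0.8, 5.2, 2.1], [1.1, 5.1, 6.9],
    [4.0, 1.0, 2.0], [4.2, 0.8, 7.1], [3.8, 1.2, 1.9], [4.1, 1.1, 7.0],
]
labels = ["type_A"] * 4 + ["type_B"] * 4  # hypothetical phenotype labels
pathway_genes = ["PATHWAY_GENE_1", "PATHWAY_GENE_2"]

accuracy = pathway_accuracy(expression, labels, gene_names, pathway_genes)
print(f"Accuracy using pathway genes only: {accuracy:.2f}")
```

In this toy data the two pathway genes separate the phenotypes cleanly, so the accuracy estimate is high. With real microarray data the same comparison would use measured expression values and whichever classifier and validation scheme the study actually adopts.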