Abstract
There is considerable interest in methods for extracting associations between diseases and genes from the literature, such as MEDLINE abstracts. The strength of association between a gene and a disease can be measured by the number of articles in which the gene and the disease co-occur. However, this measure cannot identify genes that are specific to a particular disease, because a highly ranked gene may also be associated with other diseases. In this paper, we propose an algorithm that extracts a group of genes associated with a given disease and prioritizes them by their specificity to that disease. This enables the identification of associated genes that are expected to have fewer side effects, which contributes to efficient drug development. The proposed method incorporates transitive associations between the disease and genes based on the frequency of co-occurrence of gene terms. Furthermore, we evaluate the precision of the ranking algorithm using a public dataset.
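To illustrate why raw co-occurrence counts alone do not capture specificity, the following is a minimal sketch on made-up toy data. The gene and disease names, the `ARTICLES` structure, and the ratio used as a "specificity" score are all assumptions for illustration. The ratio is not the scoring function proposed in this paper, and the sketch does not model transitive associations.

```python
from collections import Counter

# Hypothetical toy corpus: each article is (diseases mentioned, genes mentioned).
ARTICLES = [
    ({"disease_x"}, {"GENE_A", "GENE_B"}),
    ({"disease_x"}, {"GENE_B"}),
    ({"disease_x"}, {"GENE_B"}),
    ({"disease_x"}, {"GENE_A"}),
    ({"disease_y"}, {"GENE_B"}),
    ({"disease_z"}, {"GENE_B"}),
    ({"disease_z"}, {"GENE_B"}),
    ({"disease_y"}, {"GENE_B"}),
]

def cooccurrence_counts(articles, disease):
    """Per gene, count the articles that mention both the gene and `disease`."""
    counts = Counter()
    for diseases, genes in articles:
        if disease in diseases:
            counts.update(set(genes))
    return counts

def disease_article_totals(articles):
    """Per gene, count the articles that mention the gene with any disease."""
    totals = Counter()
    for diseases, genes in articles:
        if diseases:
            totals.update(set(genes))
    return totals

def specificity_ratio(articles, disease):
    """Illustrative score (an assumption): share of a gene's disease-related
    articles that concern `disease`."""
    raw = cooccurrence_counts(articles, disease)
    totals = disease_article_totals(articles)
    return {gene: raw[gene] / totals[gene] for gene in raw}

raw = cooccurrence_counts(ARTICLES, "disease_x")
ranked_by_count = [gene for gene, _ in raw.most_common()]

spec = specificity_ratio(ARTICLES, "disease_x")
ranked_by_specificity = sorted(spec, key=spec.get, reverse=True)
```

In this toy corpus, `GENE_B` ranks first by raw count because it co-occurs with `disease_x` in three articles, but most of its mentions concern other diseases. `GENE_A` appears only with `disease_x` and therefore ranks first under the illustrative ratio.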