Because a protein carries out its function by interacting with other molecules, a database of protein interactions is essential. Information on protein interactions is scattered across thousands of published papers, so extracting all of it manually is nearly impossible. Extraction systems based on template matching have already been developed, but sentence complexity prevents templates from matching every sentence that contains interaction information.
We propose a method for extracting sentences that contain interaction information, independent of sentence structure. In a protein-compound complex structure, an interacting residue lies close to its partner. The distance between them can be calculated from the structure data in the PDB; a short distance suggests that sentences mentioning the residue and its partner may describe an interaction. For a free-protein structure, the distance cannot be calculated because the coordinates of the protein's partner are not registered in the structure data. In that case, we use the structure data of a homologous protein that is complexed with the protein's partner.
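The distance criterion described above can be illustrated with a minimal sketch. It assumes atom coordinates have already been read from a PDB entry into plain (x, y, z) tuples in angstroms; the function names, the toy coordinates, and the 4.0 angstrom cutoff are hypothetical choices for illustration and are not taken from the method itself.

```python
import math

def min_distance(residue_atoms, partner_atoms):
    """Smallest Euclidean distance between any residue atom and any partner atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)

def residues_near_partner(residues, partner_atoms, cutoff=4.0):
    """Names of residues whose closest atom lies within `cutoff` of the partner.

    `cutoff` is an assumed threshold, not a value stated in the abstract.
    """
    return [
        name
        for name, atoms in residues.items()
        if min_distance(atoms, partner_atoms) <= cutoff
    ]

# Toy coordinates (hypothetical, not taken from any real PDB entry).
residues = {
    "HIS57": [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "GLY200": [(20.0, 0.0, 0.0)],
}
partner = [(4.0, 0.0, 0.0)]

near = residues_near_partner(residues, partner)
```

Sentences that mention a residue returned by `residues_near_partner` would then be candidates for describing an interaction. For a free protein, the same calculation would be run on the coordinates of the homologous complex instead.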
The proposed method was applied to seven papers on protein-compound complexes and four papers on free proteins, yielding F-measures of 71% and 72%, respectively.
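For reference, an F-measure of this kind is commonly computed as the harmonic mean of precision and recall over the extracted sentences. The sketch below assumes the balanced form (often written F1); the abstract does not state which variant was used, and the counts in the example call are hypothetical, not the study's data.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure: harmonic mean of precision and recall."""
    extracted = true_pos + false_pos   # sentences the method flagged
    relevant = true_pos + false_neg    # sentences that truly describe an interaction
    precision = true_pos / extracted if extracted else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only.
score = f_measure(true_pos=5, false_pos=5, false_neg=5)
```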