Interdisciplinary Information Sciences
Online ISSN : 1347-6157
Print ISSN : 1340-9050
ISSN-L : 1340-9050
Regular Papers
A Hybrid Method for Open Information Extraction Based on Shallow and Deep Linguistic Analysis
Vahideh RESHADAT, Maryam HOORALI, Heshaam FAILI
JOURNAL FREE ACCESS

2016 Volume 22 Issue 1 Pages 87-100

Abstract

Open Information Extraction (Open IE) is a relation-independent extraction paradigm that extracts assertions from massive, heterogeneous corpora such as the Web. Lightweight relation extractors favor efficiency by restricting analysis to shallow linguistic tools such as part-of-speech tagging. Although these methods are fast and scalable, they rely only on shallow syntactic features and therefore cannot handle complex sentences, such as those containing complicated or long-distance relations. This paper presents two novel hybrid methods, TextRunner-DepOE (TR-DOE) and ReVerb-DepOE (RV-DOE), which combine a high-performance subset of shallow Open IE systems with the strengths of a deep Open IE system. We find the best trade-off between precision and recall by tuning two combination parameters: sentence length and confidence measure. Because time efficiency is a priority, we use a fast and robust deep extractor. Experiments indicate that the proposed hybrid methods achieve significantly higher performance than their constituent systems. The best result was obtained by TR-DOE, whose F-measure was almost twice that of TextRunner.
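
As a rough illustration of how two combination parameters of this kind could steer a hybrid extractor, the Python sketch below routes each sentence either to a shallow extractor or to a deep one. The routing rule, the threshold values, and every name in it (`Extraction`, `hybrid_extract`, `max_shallow_len`, `min_confidence`) are assumptions made for illustration only; the abstract does not state how TR-DOE or RV-DOE actually combine their constituent systems. The `f_measure` helper is the standard balanced F-measure, the harmonic mean of precision and recall.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Extraction:
    """A binary assertion (arg1, relation, arg2) with a confidence score."""
    arg1: str
    relation: str
    arg2: str
    confidence: float


# Hypothetical extractor interface: a sentence in, a list of assertions out.
Extractor = Callable[[str], List[Extraction]]


def hybrid_extract(
    sentence: str,
    shallow: Extractor,
    deep: Extractor,
    max_shallow_len: int = 20,    # assumed sentence-length threshold (tokens)
    min_confidence: float = 0.5,  # assumed confidence threshold
) -> List[Extraction]:
    """Keep confident shallow extractions for short sentences, else go deep.

    This routing rule is an assumption, not the method from the paper.
    """
    tokens = sentence.split()  # naive whitespace tokenization
    if len(tokens) <= max_shallow_len:
        confident = [e for e in shallow(sentence) if e.confidence >= min_confidence]
        if confident:
            return confident
    # Long sentence, or no sufficiently confident shallow extraction.
    return deep(sentence)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In a sketch like this, raising `max_shallow_len` or lowering `min_confidence` sends more sentences through the fast shallow path, while the opposite settings rely more heavily on the deep extractor; tuning both against a labeled sample is one way to look for a precision and recall trade-off.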

© 2016 by the Graduate School of Information Sciences (GSIS), Tohoku University

This article is licensed under a Creative Commons Attribution 4.0 International license.
https://creativecommons.org/licenses/by/4.0/