Semantic Labeling for Numerical Values: Distribution-Based Similarities

Phuc Nguyen; Hideaki Takeda

doi:10.11517/jsaisigtwo.2019.SWO-047_12

Abstract

In recent years, there has been increasing interest in numerical semantic labeling, in which an unknown numerical attribute is assigned the label of the most relevant attribute in a predefined knowledge base. Previous methods used the p-value of a statistical hypothesis test to estimate relevance, and thus depend strongly on the distribution and type of the data domain. In other words, p-value based similarity is unstable in general cases, where such knowledge is not available. In this paper, we first point out the limitations of p-value based similarity. Second, we propose Distribution-Based Similarities, in which the similarities are derived from norms computed on the inverse transform sampling of the attribute distributions. Our experiments on City Data and Open Data show that the Distribution-Based Similarities outperform the p-value based approaches in the task of semantic labeling for numerical values.
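To make the idea of a distribution-based similarity concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not say which norms are used or how the inverse transform sampling is discretized, so the even probability grid, the choice of norm order, and the function names (`inverse_cdf_samples`, `distribution_distance`, `label_attribute`) are all assumptions made for illustration. The sketch evaluates the empirical inverse CDF of each attribute on the same grid, takes a norm of the difference, and labels the unknown attribute with the closest knowledge-base attribute.

```python
import math
from typing import Dict, List, Sequence


def inverse_cdf_samples(values: Sequence[float], n_points: int = 100) -> List[float]:
    """Evaluate the empirical inverse CDF (quantile function) on an even grid in (0, 1)."""
    if not values:
        raise ValueError("attribute has no values")
    ordered = sorted(values)
    n = len(ordered)
    samples = []
    for i in range(n_points):
        p = (i + 0.5) / n_points               # midpoints, so p is never 0 or 1
        idx = max(math.ceil(p * n) - 1, 0)     # smallest x whose empirical CDF >= p
        samples.append(ordered[min(idx, n - 1)])
    return samples


def distribution_distance(a: Sequence[float], b: Sequence[float],
                          order: float = 2.0, n_points: int = 100) -> float:
    """Norm of the difference between two empirical inverse CDFs (assumed form)."""
    qa = inverse_cdf_samples(a, n_points)
    qb = inverse_cdf_samples(b, n_points)
    diffs = [abs(x - y) for x, y in zip(qa, qb)]
    if math.isinf(order):
        return max(diffs)                      # L-infinity norm
    # Averaged L-p norm, so the value does not grow with the grid size.
    return (sum(d ** order for d in diffs) / n_points) ** (1.0 / order)


def label_attribute(unknown: Sequence[float],
                    knowledge_base: Dict[str, Sequence[float]],
                    order: float = 2.0) -> str:
    """Return the label of the knowledge-base attribute closest to `unknown`."""
    return min(
        knowledge_base,
        key=lambda label: distribution_distance(unknown, knowledge_base[label], order),
    )


if __name__ == "__main__":
    # Toy, made-up values purely for illustration.
    kb = {"age": [20, 25, 30, 35, 40], "height_cm": [150, 160, 170, 180, 190]}
    print(label_attribute([22, 27, 33, 38], kb))
```

Because both attributes are resampled on the same probability grid, attributes with different numbers of values can be compared directly, and no hypothesis test (and hence no p-value) is involved. A smaller distance means a higher similarity.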

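Notice from the publisher

All articles of the Type II SIGs (第二種研究会) can be accessed without authentication. In principle, the copyright of each article belongs to its authors.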
