2016 Volume 23 Issue 1 Pages 37-58
This paper reports the results of an error analysis on the task of extracting product attribute values. We built a system that extracts attribute values from product descriptions by simply matching the descriptions against entries in an attribute value dictionary. The dictionary is constructed automatically by parsing semi-structured data, such as tables and itemized lists, in product descriptions. We ran the extraction system on a corpus in which product attribute values had been annotated by a single annotator, and then investigated the false positives and false negatives. We carried out this error analysis on 100 product pages drawn from five different product categories of an actual e-commerce site, and designed error type categories based on the results. In addition to presenting the error type categories and instances of each, we discuss the processing and data resources required to reduce the number of errors.
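
The dictionary-matching extraction and the comparison against annotations described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary contents, the product description, the gold annotations, and the function names `extract` and `compare` are all hypothetical, and case-insensitive substring matching is only one assumed reading of "simply matching".

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (attribute name, attribute value)

# Hypothetical attribute value dictionary (toy data, not from the paper).
ATTRIBUTE_VALUES: Dict[str, Set[str]] = {
    "color": {"red", "navy blue"},
    "material": {"cotton", "leather"},
}


def extract(description: str, dictionary: Dict[str, Set[str]]) -> Set[Pair]:
    """Return every (attribute, value) pair whose value occurs in the text.

    Assumption: matching is a case-insensitive substring test.
    """
    text = description.lower()
    return {
        (attribute, value)
        for attribute, values in dictionary.items()
        for value in values
        if value.lower() in text
    }


def compare(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Split disagreements into (false positives, false negatives)."""
    return predicted - gold, gold - predicted


# Toy product description and toy gold annotations (both invented).
description = "Navy blue jacket with a red lining. Not leather."
gold = {("color", "navy blue"), ("material", "wool")}

predicted = extract(description, ATTRIBUTE_VALUES)
false_positives, false_negatives = compare(predicted, gold)
```

In this toy case, "leather" is matched even though the description negates it, and "wool" is missed because it is absent from the dictionary. These two outcomes stand in for the kinds of false positives and false negatives that such an analysis would inspect.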