Abstract
Tables are an efficient way to express relational information, and most product information, such as specifications, is written in tabular form. Extracting these tables (specification extraction) is therefore an important step in handling product information. We are developing a system that summarizes multiple specifications. In HTML documents, specifications are written inside ‹TABLE› tags, but the presence of a ‹TABLE› tag does not necessarily indicate a specification: in one particular domain, fewer than 30% of ‹TABLE› tags are real tables. In this paper, we propose a method for specification extraction using support vector machines (SVMs). To reduce the amount of labeled training data required, we also evaluate this task with transductive SVMs. We evaluate the performance of SVMs and transductive SVMs on PC, digital still camera, and printer specifications. Experimental results show the effectiveness of our methods.
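As a rough illustration of treating specification extraction as binary classification of ‹TABLE› elements, the sketch below trains a linear SVM with a Pegasos-style stochastic subgradient solver. It is a stand-in, not the method of this paper: the feature set (scaled row and column counts, share of cells containing digits, share of two-cell attribute/value rows), the helper names `table_features`, `train_linear_svm` and `predict`, and the toy tables are all hypothetical.

```python
import random

def table_features(rows):
    """Hypothetical feature vector for one TABLE element, given as rows of cell strings."""
    n_rows = len(rows)
    n_cols = max((len(r) for r in rows), default=0)
    cells = [c for r in rows for c in r]
    # Share of cells containing a digit (spec values such as "12.1 MP" or "8 GB").
    digit_ratio = sum(any(ch.isdigit() for ch in c) for c in cells) / len(cells) if cells else 0.0
    # Share of rows with exactly two cells (attribute/value layout).
    pair_ratio = sum(len(r) == 2 for r in rows) / n_rows if n_rows else 0.0
    # Scaled sizes plus a constant 1.0 that acts as the bias term.
    return [min(n_rows, 20) / 20, min(n_cols, 10) / 10, digit_ratio, pair_ratio, 1.0]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge-loss SVM objective.

    Labels are +1 (specification) or -1 (other). The bias is folded into the
    last feature, so it is regularized too; acceptable for a sketch.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin check uses w before the update
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the L2 regularizer
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical toy data: two specification tables and two layout tables.
train_tables = [
    [["Resolution", "12.1 MP"], ["Optical zoom", "3x"], ["Weight", "150 g"], ["Battery", "NB-6L"]],
    [["CPU", "2.4 GHz"], ["Memory", "8 GB"], ["Storage", "256 GB SSD"]],
    [["Home | Products | Support | Contact"]],
    [["Welcome to our store", "Login", "Cart"], ["Latest news and offers", "Sign up", "Help"]],
]
labels = [1, 1, -1, -1]
w = train_linear_svm([table_features(t) for t in train_tables], labels)

printer_spec = [["Print speed", "20 ppm"], ["Resolution", "4800 x 1200 dpi"]]
footer = [["About us"], ["Privacy policy"]]
print(predict(w, table_features(printer_spec)), predict(w, table_features(footer)))
```

Transductive SVMs, which additionally use unlabeled ‹TABLE› elements during training, are not covered by this sketch.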