IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Softcomputing, Learning>
A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures
Takahiro Yamasaki, Kin-ichiroh Tokiwa
JOURNAL FREE ACCESS

2012 Volume 132 Issue 9 Pages 1524-1532

Abstract

This paper describes a method of readability assessment for web documents. Readability is the ease with which text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp the structure of the text. The impression and the complexity of the text are significant factors. We extract features related to impression and complexity from the plain text and from additional data such as HTML tags.
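As an illustration of drawing features from both the plain text and the HTML tags, the following is a minimal sketch using Python's standard `html.parser`. The feature names (`n_words`, `avg_sentence_len`, `n_headings`, and so on) and the choice of tags are hypothetical and are not taken from the paper; whitespace tokenization and punctuation-based sentence splitting are simplifying assumptions that suit English text only.

```python
import re
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collects visible text and counts a few structural tags (illustrative choice)."""

    STRUCTURAL = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table"}

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag in self.STRUCTURAL:
            self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def extract_features(html):
    """Return a dict of hypothetical text and structure features for one page."""
    parser = FeatureParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace in the visible text.
    text = " ".join(" ".join(parser.text_parts).split())
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_words": len(words),
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "n_headings": sum(parser.tag_counts.get(h, 0) for h in ("h1", "h2", "h3")),
        "n_paragraphs": parser.tag_counts.get("p", 0),
        "n_list_items": parser.tag_counts.get("li", 0),
    }
```

Calling `extract_features(page_html)` on a page's markup yields one feature dictionary per document, which could then be passed to a learner.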
To compare the effect of the extracted features, we estimate readability ranks by machine learning. We conduct 5-fold cross-validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross-validation experiments confirm that our method achieves high performance. This shows the effectiveness of extracting features related to impression and complexity for readability assessment.
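The evaluation protocol described here, k-fold cross-validation scored by root mean squared error, can be sketched as follows. The abstract does not name the learner, so a one-feature least-squares line (`fit_line`) stands in as an assumption; averaging the per-fold RMSE values is likewise an assumption, since the paper may pool errors differently. All function names are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def fit_line(xs, ys):
    """Least-squares slope and intercept (stand-in learner, not the paper's)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Mean RMSE over k held-out folds; assumes len(xs) >= k."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return sum(scores) / k
```

For a per-domain evaluation as in the abstract, `k_fold_rmse` would be called once per domain on that domain's feature values and actual ranks.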

© 2012 by the Institute of Electrical Engineers of Japan