IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Soft Computing and Learning>
A Method for Assessing the Readability of Web Documents Using Text Features and HTML Structure
山崎 高弘, 常盤 欣一朗

2012, Volume 132, Issue 9, pp. 1524-1532

Abstract

This paper describes a method of readability assessment for web documents. Readability is the ease with which a text can be read and understood. We hypothesize that readability is determined by whether a reader can easily grasp the structure of a text, and that the impression and the complexity of the text are significant factors. We extract features related to impression and complexity from the plain text and from additional data such as HTML tags.
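As an illustration of the kind of feature extraction described above, the following is a minimal sketch that derives a few text and HTML-structure features from a page. The abstract does not list the actual features, so the feature names (`n_chars`, `mean_sentence_chars`, `n_headings`, and so on), the class `FeatureExtractor`, and the function `extract_features` are assumptions made for this example, not the authors' feature set.

```python
import re
from html.parser import HTMLParser


class FeatureExtractor(HTMLParser):
    """Collects visible text and start-tag counts from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.tag_counts = {}
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_features(html):
    """Return a dict of hypothetical complexity and structure features."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Split on Western and Japanese sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?。!?]+", text) if s.strip()]
    mean_len = sum(len(s) for s in sentences) / len(sentences) if sentences else 0.0
    counts = parser.tag_counts
    return {
        "n_chars": len(text),
        "n_sentences": len(sentences),
        "mean_sentence_chars": mean_len,
        "n_headings": sum(counts.get(f"h{i}", 0) for i in range(1, 7)),
        "n_paragraphs": counts.get("p", 0),
        "n_list_items": counts.get("li", 0),
    }
```

Calling `extract_features(page_source)` on the HTML of a page returns one feature dictionary per document, which could then be fed to whatever learner is used for rank estimation.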
To compare the effect of the extracted features, we estimate readability ranks by machine learning. We conduct 5-fold cross-validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross-validation experiments confirm that our method achieves high performance, which shows that extracting features about impression and complexity is effective for readability assessment.
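The evaluation protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the learning algorithm, so a one-feature least-squares line (`fit_line`) stands in for it, and averaging the per-fold errors is also an assumption, since the abstract does not say whether errors are averaged per fold or pooled. All function names here are hypothetical.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Least-squares line; a stand-in for the unspecified learner."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Mean RMSE over k folds; requires at least k samples."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        predicted = [slope * xs[i] + intercept for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predicted))
    return sum(scores) / k
```

Here `xs` would hold one feature value per document and `ys` the actual readability ranks; running `cross_validated_rmse` separately on each domain's documents mirrors the per-domain evaluation in the abstract.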

© 2012 The Institute of Electrical Engineers of Japan (電気学会)