Data Science Journal
Online ISSN : 1683-1470
Contributed Papers
Detecting Family Resemblance: Automated Genre Classification
Yunhyong Kim, Seamus Ross
JOURNAL FREE ACCESS

2007 Volume 6 Pages S172-S183

Abstract

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the roles of visual layout, stylistic features, and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
