Prof. Dr. Walt Detmar Meurers
Publications
On The Applicability of Readability Models to Web Texts

  

Sowmya Vajjala and Detmar Meurers

  

Proceedings of the Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at ACL 2013, Sofia, Bulgaria.

  

An increasing range of features is being used for automatic readability classification. The impact of the features is typically evaluated using reference corpora containing graded reading material. But how do the readability models and the features they are based on perform on real-world web texts? In this paper, we take a step towards understanding this aspect on the basis of a broad range of lexical and syntactic features and several web datasets we collected.

Applying our models to web search results, we find that the average reading level of the retrieved web documents is relatively high. At the same time, documents at a wide range of reading levels are identified and even among the Top-10 search results one finds documents at the lower levels, supporting the potential usefulness of readability ranking for the web. Finally, we report on generalization experiments showing that the features we used generalize well across different web sources.

  


  

Bibtex entry:


@InProceedings{Vajjala.Meurers-13,
  author    = {Vajjala, Sowmya and Meurers, Detmar},
  title     = {On The Applicability of Readability Models to Web Texts},
  booktitle = {Proceedings of the Second Workshop on Predicting and 
               Improving Text Readability for Target Reader Populations},
  month     = {August},
  year      = {2013},
  address   = {Sofia, Bulgaria},
  publisher = {Association for Computational Linguistics},
  pages     = {59--68},
  url       = {http://purl.org/dm/papers/Vajjala.Meurers-13.html},
  file      = {http://www.aclweb.org/anthology/W13-2907}
}