Prof. Dr. Walt Detmar Meurers
Exploring the Data-Driven Prediction of Prepositions in English


Anas Elghafari, Detmar Meurers, Holger Wunsch


Proceedings of COLING 2010, the 23rd International Conference on Computational Linguistics. Beijing, China.


Prepositions in English are a well-known challenge for language learners, and correspondingly the computational analysis of preposition usage has attracted significant attention. Such research generally starts out by developing a model of preposition usage for native English which is based on a range of features, from shallow surface evidence to deep linguistically-informed properties.


While we agree that a combination of shallow and linguistically informed features is ultimately needed, balancing the precision of exemplars against the usefulness of generalizations that avoid data sparsity problems, in this paper we explore the limits of a purely surface-based prediction of prepositions.


Using a web-as-corpus approach, we investigate a classification setup based solely on the relative number of occurrences obtained for target n-grams varying in the preposition used. We show that such a surface-based approach is competitive with the published state-of-the-art results. Where enough data is available, it is thus possible in a surprising number of cases to obtain sufficient information from the relatively narrow context window of n-grams that are small enough to occur frequently but large enough to carry predictive information about preposition usage.
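The count-based setup described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function name `predict_preposition`, the candidate list, and the toy counts are all hypothetical, and a small in-memory table stands in for the web occurrence counts. The idea shown is only the core one: build one n-gram per candidate preposition, compare the relative counts, and pick the most frequent variant.

```python
from collections import Counter

# Hypothetical toy counts standing in for web occurrence counts.
# These numbers are invented for illustration and are not data from the paper.
TOY_COUNTS = Counter({
    "interested in learning": 900,
    "interested on learning": 15,
    "interested at learning": 5,
    "interested for learning": 80,
})

# Illustrative candidate set; the set used in the paper is not assumed here.
PREPOSITIONS = ["in", "on", "at", "for"]


def predict_preposition(left, right, candidates=PREPOSITIONS, counts=TOY_COUNTS):
    """Pick the preposition whose n-gram variant has the highest relative count.

    left and right are lists of context tokens around the preposition slot.
    Returns (best, shares); best is None when no variant was observed at all.
    """
    # One n-gram per candidate, differing only in the preposition used.
    raw = {p: counts[" ".join(left + [p] + right)] for p in candidates}
    total = sum(raw.values())
    if total == 0:
        # No evidence for any variant: make no prediction.
        return None, {}
    shares = {p: c / total for p, c in raw.items()}
    best = max(shares, key=shares.get)
    return best, shares


best, shares = predict_preposition(["interested"], ["learning"])
print(best, round(shares[best], 2))  # prints: in 0.9
```

Returning `None` when no variant has been observed makes the data-availability condition explicit: such a sketch can only predict where the n-gram window has actually been seen.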



Note: The electronic versions of the publications linked on this page are the last versions I had the copyright for. Where a publisher copyedited and/or typeset the papers, the electronic copies linked here are NOT identical to the officially published version, which should be used for any quotes, references to page numbers, etc.



Bibtex entry:

   author =      {Anas Elghafari and Detmar Meurers and Holger Wunsch},
   title =       {Exploring the Data-Driven Prediction of Prepositions in English},
   booktitle =   {Proceedings of COLING 2010, the 23rd International
                  Conference on Computational Linguistics},
   address =     {Beijing, China},
   pages =       {267--275},
   year =        {2010},
   url =         {}