Hauptseminar
Winter Semester 2015/2016

Natural Language Processing with Python: A hands-on introduction using NLTK

Abstract:

This course provides a hands-on introduction to programming in Python using NLTK. The Natural Language Toolkit (NLTK) is an open-source platform offering transparent access to a broad range of algorithms and resources for computational linguistics.

Instructors:

Course meets:

Language:

Moodle page: https://moodle02.zdv.uni-tuebingen.de/course/view.php?id=1289

Syllabus (this file):

Nature of course and our expectations: This Hauptseminar intends to provide an overview of the concepts and issues involved in research in this domain. Participants are expected to

  1. regularly and actively participate in class and read/prepare the material assigned by any of the presenters (20% of grade)
  2. prepare and present a topic (30% of grade)
  3. write and submit a term paper in Moodle (50% of grade)
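
As an illustration of how the three weights above could combine, here is a minimal Python sketch. It assumes the final result is a plain weighted average of the component scores; the syllabus states only the percentages, so the combination rule, the function name `combined_score`, and the example scores are all hypothetical.

```python
# Weights taken from the list above. The weighted-average rule is an
# assumption; the syllabus does not say how the components are combined.
WEIGHTS = {"participation": 0.20, "presentation": 0.30, "term_paper": 0.50}


def combined_score(scores):
    """Return the weighted average of the three component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, all on the same (unspecified) scale.
example = {"participation": 1.0, "presentation": 2.0, "term_paper": 1.3}
print(round(combined_score(example), 2))
```

Because the term paper carries half of the weight, a change in that component moves the combined score more than an equal change in either of the other two.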

Credits: After successful completion of the course, a Hauptseminar Schein in Core Computational Linguistics is issued, with the following credit point options:

Academic conduct and misconduct: Research is driven by discussion and free exchange of ideas, motivations, and perspectives. So you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you’ve worked on something as a team – and keep in mind that being part of a team always means sharing the work.

For text you write, you always have to provide explicit references for any ideas or passages you reuse from somewhere else. Note that this includes text “found” on the web, where you should cite the URL of the website if no more official publication is available.

Topics:

We will generally follow the NLTK book (http://www.nltk.org/book) with materials added by the presenters wherever useful.

  1. Introduction. Language Processing and Python
  2. Accessing Text Corpora and Lexical Resources
  3. Nov 4, 6: QITL (http://www2.sfs.uni-tuebingen.de/qitl)
  4. Processing Raw Text (Roshanak Hamidi, Mei-Shin Wu, Julia Koch)
  5. Writing Structured Programs (Andreas Daul, Eduard Schaf, Haywood Shannon, Martina Stama-Kirr)
  6. Categorizing and Tagging Words (Natalie Clarius, Kevin Mann, Yevgen Karpenko)
  7. Learning to Classify Text (Aria Omidvar, Christian Adam, Niklas Schulze, Melika Azimi)
  8. Extracting Information from Text (Alina Ladygina, Kathrin Adlung, Anastasia Gorbunova, Kanghyun Yu)
  9. Analyzing Sentence Structure (Asia Deinekina, Lisa Verena Hiller, Luis Ibargüen, Samuel Solzin)
  10. Building Feature Based Grammars (Ben Campbell, Valentin Pickard, Zarah Weiß)
  11. Analyzing the Meaning of Sentences (Mihael Simonic, Alina Allakhverdieva, Olga Sozinova, Eyal Schejter)
  12. Managing Linguistic Data (Vivian Fresen, Sabrina Galasso, Holger Muth-Hellebrandt, David Bausch)
  13. Discussion of project ideas for term papers

Note: The syllabus is subject to change as we progress through the semester, so check the online version regularly.

Last update: February 3, 2016