ISCL Proseminar (Summer Semester 2016)

Statistical Natural Language Processing


First, the course introduces some general statistics relevant in the given context, such as variability, elementary probability theory, distributions and hypothesis testing. Further, it provides a brief introduction to the field of Machine Learning. Second, based on that background, the course covers some core techniques in statistical natural language processing (NLP), such as Markov chains and hidden Markov models, as well as applications such as collocation discovery, language models, part-of-speech tagging, speech recognition and text categorization.

Instructor: Serhiy Bykh

Tutor: Björn Rudzewitz

Course meets:

Credits for ISCL BA: 9 CP
For other degree programs, contact us for requirements and credits.

Moodle: We will be using the university Moodle site for the course, primarily for the discussion forum and to access course materials. Our course is accessible under Moodle at

To log into this specific Moodle site, use your general ZDV university account ID and password. The first time you access the course Moodle site, you need a course subscription password, which you get in class. Moodle and privacy: Note that Moodle generally keeps detailed logs of your interaction with the system, e.g., when you log in.

Email: In the Moodle system, everyone in the course can send messages to other participants in the class, and we will use this to contact you about class-related matters. Such email is sent to your regular ZDV account. So please register in Moodle during the first week of the semester, and read your university email regularly.

Grading: The course will be graded based on participation and:

  1. Two homework assignments (2 × 20% = 40%)
  2. Machine Learning lab (10%)
  3. Final exam (50%).

The final exam will presumably be held on July 18. To pass the exam, you have to obtain at least 60% of the points.
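As a rough illustration of how the weights above combine, here is a minimal sketch in Python. Only the weights and the 60% exam threshold come from the grading scheme; the component names, the 0-100 percentage scale, and the example scores are assumptions made for illustration, and participation is not modeled because the syllabus does not say how it is counted.

```python
# Weights taken from the grading list; all names and scores are hypothetical.
WEIGHTS = {
    "homework_1": 0.20,
    "homework_2": 0.20,
    "ml_lab": 0.10,
    "final_exam": 0.50,
}
EXAM_PASS_THRESHOLD = 60.0  # percent of exam points needed to pass the exam


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one weighted percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def exam_passed(scores):
    """Check the exam threshold on its own, independent of the weighted total."""
    return scores["final_exam"] >= EXAM_PASS_THRESHOLD


# Made-up example scores, in percent of the points available per component.
example = {"homework_1": 80.0, "homework_2": 90.0, "ml_lab": 100.0, "final_exam": 70.0}
print(round(weighted_score(example), 1), exam_passed(example))
```

The exam threshold is checked separately from the weighted total because a strong homework record does not compensate for an exam score below 60%.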

Academic conduct and misconduct: Learning and research are driven by discussion and free exchange of ideas, motivations, and perspectives. So you are encouraged to work in groups, discuss, and exchange ideas. At the same time, the foundation of the free exchange of ideas is that everyone is open about where they obtained which information. Concretely, this means you are expected to always make explicit when you’ve worked on something as a team – and keep in mind that being part of a team means sharing the work! For text you write, you always have to provide explicit references for any ideas or passages you reuse from somewhere else. This includes text “found” on the web, where you should cite the URL of the website if no more official publication is available. Failure to follow these important guidelines is academic misconduct, which will be sanctioned by failing you on the assignment, the exam, or the entire class, depending on the severity of the violation.

Class etiquette: Please come to class on time, do not pack up early, and do not read or work on materials for other classes during our class. When in the computer lab, only use the computers when you are asked to do a specific activity; do not read email or browse the web. If you have to leave early or miss class for an important reason, please let me know before class. Note: Following the standard rules, missing more than two seminar meetings unexcused automatically results in failing the class. Attendance in the tutorial is not obligatory, but it is highly recommended (!): The graded homework assignments will be based on the tasks and exercises discussed in the tutorial, and some of the material will be covered exclusively in the tutorial.

Course Readings: