Project Description

Language resources such as corpora, dictionaries, lexicons, grammars, computer programs, experimental data or collections of results are of central importance in linguistic research. Their construction is often very complex, and the information they contain can be permanently lost or become impossible to process. The NaLiDa project for Sustainability of Linguistic Data at the Department of Linguistics at the University of Tübingen is mainly concerned with preserving the kind of data that is needed to conduct research in the long run.

The sustainable provision of data includes the following aspects:

  • Finding and announcing data in the scientific community
  • Reuse in other contexts
  • Citation of fundamental data in publications
  • Cooperation with other researchers
  • Citation within the context of a project's funding

The accomplishment of these goals requires further development and research as well as support by researchers. Therefore, the research project deals with the following aspects:

  • Gathering data
  • Collecting resources/data
  • Accessing data
  • Legal and ethical aspects of long-term archiving processes
  • Standards for language resources
  • Advice for all aspects of dealing with digital texts

These research activities are intended to add value for the research community. To this end, the NaLiDa project works independently with existing resources while also supporting other resource creators. This support comprises the following areas:

  • Documentation: Support for the creation of resource descriptions
  • Catalogue: Search function based on resource descriptions for locating resources
  • Portal: Information on language resources and metadata (Blog, Catalogue, Glossary, Publications, Tutorials, Workshops)

Within the project, the focus is on data created by German research projects. Here, "data" refers to linguistic data, i.e. corpora, dictionaries and grammars, but also software tools and services as well as lists containing the results of various studies. Unlike in traditional libraries and archives, resources can be found by content-related criteria and not only with the help of bibliographic information. This "depth-first search mechanism" (for example with a Faceted Browser) is an essential starting point for ensuring sustainability and is based on Semantic Web concepts. This accessibility rests on the structured keywording, description and classification of resources, all of which are represented as metadata.
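
To illustrate how faceted browsing over metadata can work in principle, here is a minimal sketch in Python. It is not the NaLiDa implementation: the records, the facet names (type, language, keywords) and the helper functions are all hypothetical and chosen only for illustration. Each selected facet value narrows the result set, and the remaining values of any facet can be counted to offer further refinements.

```python
from collections import Counter

# Hypothetical resource descriptions; field names and values are
# illustrative assumptions, not the project's actual metadata schema.
RESOURCES = [
    {"title": "Corpus A", "type": "corpus", "language": "German",
     "keywords": ["syntax", "newspaper"]},
    {"title": "Lexicon B", "type": "lexicon", "language": "German",
     "keywords": ["morphology"]},
    {"title": "Corpus C", "type": "corpus", "language": "English",
     "keywords": ["syntax"]},
    {"title": "Tool D", "type": "tool", "language": "German",
     "keywords": ["syntax", "parsing"]},
]


def facet_values(record, facet):
    """Return the values of one facet as a list (single values are wrapped)."""
    value = record.get(facet, [])
    return value if isinstance(value, list) else [value]


def filter_resources(resources, selection):
    """Keep only records that match every selected facet value."""
    return [
        record for record in resources
        if all(value in facet_values(record, facet)
               for facet, value in selection.items())
    ]


def facet_counts(resources, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(
        value for record in resources for value in facet_values(record, facet)
    )


# Narrow by content-related criteria, then list what can still be refined.
hits = filter_resources(RESOURCES, {"type": "corpus", "keywords": "syntax"})
print([record["title"] for record in hits])
print(facet_counts(hits, "language"))
```

In this sketch a search starts from content-related descriptions such as keywords and resource type instead of bibliographic fields alone, which is the property the paragraph above attributes to the catalogue.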

Second Project Phase

In the second funding phase of the NaLiDa project, the collection, description and archiving of resources were continued. However, the scope was broadened to take into account more generic solutions for the sustainable archiving of research data that are not specific to one area. In that phase, the 'Seminar für Sprachwissenschaft' (department of linguistics) was included as junior partner. Overall responsibility for the second project phase was given to the 'Informations-, Kommunikations- und Medienzentrum' (centre for information, communication and media, IKM) of the University of Tübingen. The IKM is composed of the university library and the centre for data processing.