The Typological Database System (TDS)

Summary

The Typological Database System (TDS) provides integrated access to a collection of independently developed typological databases. Unified querying is supported with the help of an integrated ontology. The component databases of the TDS are cross-linguistic databases, developed for research in language typology and linguistics. Together they contain some 1200 different descriptive properties, with information about more than 1000 languages. (Because the collection is heterogeneous, most properties are filled in for only a fraction of the languages.) Most of the data is in the form of high-level “analytical” properties, but there are also a few collections of example sentences (with glosses) illustrating particular phenomena.

Description

Language typology, the study of the range of language variation and universals, is a data-intensive discipline that increasingly relies on electronic databases. Improved availability of the data collected in the TDS will enhance its potential to support linguistic research.

The TDS can be used to help answer questions such as “which languages have the basic word order Verb-Object-Subject”, “what kind of phonological stress systems are common”, and “are languages with subject-verb agreement more likely to allow null subjects than languages without it”. The system is not an oracle: in all cases, only partial information is returned, as collected and deposited in the system by the creators of the component databases. But this information can be invaluable to other researchers, either as a complete answer to a specific question or as the starting point for further research.
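The point about partial information can be made concrete with a small sketch. It does not use the TDS interface or its data: the language names (LangA to LangD), the property names, and the values are all invented placeholders, and the two helper functions are hypothetical. It only shows why an answer to a question such as the word-order one above should be read together with the number of languages for which the property was actually recorded.

```python
from typing import Optional

# Invented placeholder records, NOT taken from the TDS.
# None marks a property for which no value was recorded.
RECORDS: dict[str, dict[str, Optional[str]]] = {
    "LangA": {"basic_word_order": "VOS", "subject_verb_agreement": "yes", "null_subjects": "yes"},
    "LangB": {"basic_word_order": "SOV", "subject_verb_agreement": "yes", "null_subjects": None},
    "LangC": {"basic_word_order": None, "subject_verb_agreement": "no", "null_subjects": "no"},
    "LangD": {"basic_word_order": "VOS", "subject_verb_agreement": None, "null_subjects": "yes"},
}


def languages_with(prop: str, value: str) -> list[str]:
    """Return the languages whose recorded value for `prop` equals `value`."""
    return sorted(name for name, props in RECORDS.items() if props.get(prop) == value)


def coverage(prop: str) -> tuple[int, int]:
    """Return (languages with a recorded value for `prop`, total languages)."""
    filled = sum(1 for props in RECORDS.values() if props.get(prop) is not None)
    return filled, len(RECORDS)


if __name__ == "__main__":
    matches = languages_with("basic_word_order", "VOS")
    filled, total = coverage("basic_word_order")
    # A language missing from `matches` may simply have no recorded value.
    print(f"VOS languages: {matches} (property recorded for {filled} of {total} languages)")
```

In this toy data, LangC does not appear among the VOS languages, but only because its word order was never recorded, not because it is known to have a different order.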

Given that the collected data represents linguistic analysis and often novel theoretical approaches, it is impossible to map it to a single “consensus” standard. While in some limited cases it is possible to completely reconcile data from different sources, the system places a premium on preserving the theoretical orientations and analyses of the component databases, which are presented side by side as alternative datasets in the same topical area.

History

The TDS project was carried out by a research group of the Netherlands Graduate School of Linguistics (LOT), with members representing the University of Amsterdam, Leiden University, Radboud University Nijmegen, and Utrecht University. It was financed by NWO (Netherlands Organization for Scientific Research) grant 380-30-004 / INV-03-12, and by the participating universities. The initial phase of the project was started in September 2000, and the project entered the implementation phase on 1 May 2004. Originally scheduled to run for three years, it was extended until 31 December 2007. The TDS server and data collections continued to be augmented until 2009.

Thanks to the “TDS Curator” project (Utrecht University, DANS, and Meertens), supported by a CLARIN-NL Call 1 grant, the TDS has migrated to a new platform hosted by the Data Archiving and Networked Services (DANS). Both versions of the system continue to be operational.

Resources

  • The original TDS server is available at http://languagelink.let.uu.nl/tds/. The site includes a tutorial on how to use the database’s search functions.
  • The new interface (TDS Curator) can be found at http://tds2.dans.knaw.nl/. It is simpler to use but less feature-rich; for example, only the original server allows geographic display of language properties. Note that one or all of the databases must be selected before the browser and querying functions can be used.
  • The component databases and software of the TDS are also statically archived and available for download from the EASY repository of DANS (registration required). They are provided in IDDF (Integrated Data and Documentation Format), an XML-based format developed by the project.