PML-TQ - Tool for Querying Treebanks

These pages are obsolete! Please go to https://ufal.mff.cuni.cz/pmltq instead.

The PML-TQ is a powerful open-source search tool for all kinds of linguistically annotated treebanks, with several client interfaces and two search backends (one based on an SQL database and one based on Perl and the TrEd toolkit). The tool works natively with treebanks encoded in the PML data format; conversion scripts are available for many established treebank formats.


Screenshots: PML-TQ at the LINDAT/Clarin web service; PML-TQ in TrEd.

Getting Started

Search various treebanks using our server:

ÚFAL is hosting a PML-TQ search service for PDT 2.0, PDT 2.5, PDT 3.0 and many other treebanks, including Penn Treebank 3, Penn Chinese Treebank, Penn Arabic Treebank, Tiger Corpus 1.0, Universal Dependencies treebanks, and HamleDT treebanks. The server is accessible from several clients, including modern web browsers and the tree editor TrEd (see clients).

Search your local files:

Use the client-side PML-TQ search engine, which is part of the pmltq extension to the tree editor TrEd (see the Clients section below).

Search any treebank on your own PML-TQ server:

Download and install the PML-TQ server (Linux, UNIX, Mac OS X) on your computer/server.


User Documentation

Additional Resources

More or less general presentations/tutorials about the PML-TQ

Introductions/tutorials focused on the PML-TQ used for various language phenomena


Clients

Web Browser

Any web browser with good support for SVG rendering, CSS, and JavaScript can be used as a client to a PML-TQ server (Firefox, Google Chrome, Opera browser, Safari, IE >=11).

A PML-TQ server hosted at ÚFAL can be accessed via the LINDAT/Clarin web service. Many of the treebanks are freely accessible; others require a login name and password (contact Matyáš Kopp for information on how to obtain access).

TrEd

A fully graphical client for the PML-TQ with client-side searching capability is part of the tree editor TrEd (GPL-licensed software available separately) as an extension called pmltq. Several other extensions provide PML schemas and visualization stylesheets for various treebanks.

To install this extension, start TrEd, select Setup -> Manage Extensions -> Get New Extensions and select 'pmltq'. When done, press Shift+F3 to start the search. Select 'Treebank (server)' to search using a PML-TQ server (contact Matyáš Kopp to get access to the PML-TQ server hosted at ÚFAL), or 'Files (local)' to search local files using the client-side search engine built into the client.

Command-line

Under development!


Server

The distribution (see the download section below) contains a fast and efficient implementation of the PML-TQ powered by an SQL database with a client-server architecture (client -> REST API -> PML-TQ server -> SQL database backend).
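To make the client end of this chain concrete, here is a minimal sketch of a client preparing an HTTP request for the REST API layer. It is only an illustration: the server address, the URL path, the JSON field names and the treebank identifier are all hypothetical placeholders (the real PML-TQ REST API is not described here and may differ), and the query string is only an illustrative example of a query. The request is built but never sent.

```python
import json
import urllib.request

# Hypothetical values: the real PML-TQ REST API paths and field names are
# not given in this document and may differ.
BASE_URL = "https://example.org/pmltq/api"  # placeholder server address
TREEBANK = "my_treebank"                    # placeholder treebank identifier


def build_query_request(query, limit=10):
    """Build (but do not send) a POST request carrying a query as JSON."""
    # "query" and "limit" are assumed field names, chosen for illustration.
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/treebanks/{TREEBANK}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative query text only; consult the user documentation for real syntax.
request = build_query_request('a-node [ afun = "Sb" ]')
print(request.get_method(), request.full_url)
```

In this layering the client only ever talks HTTP; the translation of the query into SQL happens on the server side, behind the REST API.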

The server is intended for searching large static data sets (complete treebanks). For individual files or small treebanks of up to, say, 10K trees (your mileage may vary), the client-side PML-TQ implementation in TrEd is usually sufficient.

The most important dependencies for running a PML-TQ server are:

  • the PML-TQ server distribution, see Download below,
  • an HTTP server,
  • PostgreSQL (>=8.4),
  • Perl (>=5.14),
  • the tree editor TrEd.

The treebank must be encoded in or converted to the PML format.
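The version minimums listed above (PostgreSQL >= 8.4, Perl >= 5.14) can be checked mechanically before attempting an installation. The following is a minimal sketch, not part of the PML-TQ distribution: the helper names `parse_version` and `meets_minimum` are hypothetical, and the sample strings merely imitate the kind of text that `psql --version` and `perl -v` typically print.

```python
import re

# Minimum versions taken from the dependency list above.
MINIMUMS = {"PostgreSQL": (8, 4), "Perl": (5, 14)}


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no dotted version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(name, reported):
    """Return True if the reported version string satisfies MINIMUMS[name]."""
    return parse_version(reported) >= MINIMUMS[name]


if __name__ == "__main__":
    # Sample strings for illustration; on a real system they would come from
    # running the respective tools.
    samples = {
        "PostgreSQL": "psql (PostgreSQL) 9.6.24",
        "Perl": "This is perl 5, version 14, subversion 2 (v5.14.2)",
    }
    for name, reported in samples.items():
        status = "ok" if meets_minimum(name, reported) else "too old"
        print(f"{name}: {status}")
```

Versions are compared as integer tuples rather than as strings, so that, for example, 5.8 is correctly treated as older than 5.14.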

The server has been tested on Linux.

Download the server

Current version

The current version of the PML-TQ server can be downloaded from the Git repository: https://github.com/ufal/perl-pmltq-server.

Old versions

Previous versions of the PML-TQ server (up to version 0.7.10 (beta), released in 2013) were each published as a single .tar.gz archive.
The most recent of them, now outdated, can still be downloaded here: pmltq-0.7.10.tar.gz (PML-TQ distribution package).

Installation of the PML-TQ Server

To install the server, please contact its current developer, Matyáš Kopp.


Bibliography

General bibliography about the PML-TQ and the PML

Štěpánek Jan, Pajas Petr: Querying Diverse Treebanks in a Uniform Way, in Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), European Language Resources Association (ELRA), Valletta, Malta, pp. 1828-1835, 2010

Pajas Petr, Štěpánek Jan: System for Querying Syntactically Annotated Corpora, in Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Association for Computational Linguistics, Suntec, Singapore, pp. 33-36, 2009

Pajas Petr, Štěpánek Jan: Recent Advances in a Feature-Rich Framework for Treebank Annotation, in The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, Manchester, pp. 673-680, 2008

Case studies bibliography

Onambélé Christophe, Kopp Matyáš, Passarotti Marco, Mírovský Jiří: Converting Latin Treebank Data into an SQL Database for Query Purposes, in Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, ACM, New York, NY, USA, ISBN 978-1-4503-5265-9, pp. 117-122, 2017

Mírovský Jiří, Poláková Lucie, Štěpánek Jan: Searching in the Penn Discourse Treebank Using the PML-Tree Query, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), Paris, France, pp. 1762-1769, 2016

Zikánová Šárka, Hajičová Eva, Hladká Barbora, Jínová Pavlína, Mírovský Jiří, Nedoluzhko Anna, Poláková Lucie, Rysová Kateřina, Rysová Magdaléna, Václ Jan: Discourse and Coherence. From the Sentence Structure to Relations in Text, Chapter 8. ÚFAL, Praha, Czechia, ISBN 978-80-904571-8-8, 274 pp., Dec 2015 (focused on searching for discourse relations in the Prague Dependency Treebank)


Authors

© 2008-2010 Petr Pajas and Jan Štěpánek
© 2011-2013 Jan Štěpánek
© 2013-2015 Michal Sedlák
© 2015-2018 Matyáš Kopp (kopp at ufal.mff.cuni.cz)

Acknowledgement

The development of the PML-TQ has been supported by the following projects:


License

This software is published under the GPL (General Public License).