CATMA, CLÉA & heureCLÉA


CATMA (Computer Aided Textual Markup and Analysis) is an easy-to-use, intuitive software tool for text markup and analysis. CATMA is already in use in many international research projects as well as in teaching. The software is freely available and does not require any expert knowledge, while guaranteeing that any annotation produced automatically conforms to the usual standards (TEI/XML). CATMA was initially developed for PC and Mac OS; the latest release, however, is fully browser-based and integrates a web repository. CATMA’s overall architecture as an integrated annotation and analysis tool was designed in my son Malte Meister’s 2008 BSc thesis, submitted at CTI Cape Town/London Metropolitan University (‘child labor’, as he nevertheless put it. That’s what you get for sharing an idea with your son…).

Malte’s original brief was to re-implement TACT (Textual Analysis Computing Tools) for Windows. TACT itself was a suite of DOS-based tools used by many DH pioneers and programmed by John Bradley (in Modula-2) back in the early 1980s; it had never been made available in any of the modern post-DOS environments. However, we soon realized that, rather than porting TACT to Windows, a re-implementation from scratch was the way to go. Malte started redeveloping the core analyzer functions of TACT in C++ in 2009, and Marco Petris then added the markup functions (in Java). From version 2.0 onwards Marco took over the entire programming and integrated and developed everything in Java, up to the final CATMA 3.0 desktop release. From version 4 onwards, CATMA became a web service.

In 2010 we were granted the first of two Google Digital Humanities Awards (many thanks!) and started CLÉA (short for Collaborative Literature Éxploration and Annotation. In case you wonder about the é: yes, we’re aware there’s no accent there. It’s intended as a subtle reminder of the needs of those of us who work with non-English alphabets and characters…).

CLÉA’s aim was to develop the web- and browser-based version CATMA 4.0, which enabled users to work directly with web-based digital text collections, such as Google Books, and to annotate texts and corpora in collaboration with other researchers.

The third development phase was heureCLÉA, a joint project funded by the BMBF (German Federal Ministry of Education and Research) with a team of computer scientists led by Michael Gertz at Heidelberg University. Starting in 2013, we investigated how automated markup functions could be derived through a machine learning approach that analyses collaboratively produced manual markup. Our test cases were exemplary narratological markup functions that focus on specific features of narrative texts. In this pilot project we managed to augment CATMA’s manual markup functionality with an automated markup heuristic that identifies certain low-to-medium level phenomena computationally and then suggests likely annotations for final verification by a human user.

 

CATMA 5.0 is the current web-based, open-source version for collaborative annotation; it is available at https://catma.de

CATMA 6.0 will be released in the first quarter of 2019; it will feature new project management functionality based on GitLab’s architecture and a Neo4j database, as well as a completely redesigned UI.

Current information on the CATMA, CLÉA and heureCLÉA projects can be found on the project website.


CATMA & CLÉA project members (grazie mille!):

Marco Petris, Dipl.Comp.Sc.; Janina Jacke, MA; Dr. Evelyn Gius