EDICTOR 3

Home

Welcome to EDICTOR 3 (EDICTOR 3.0)

EDICTOR 3 is an interactive tool for computer-assisted language comparison. Based on multilingual wordlists stored in TSV files, the tool allows you to edit and compute cognate sets, to align cognates both manually and automatically, and to infer and annotate correspondence patterns. To get started, just click on the GET STARTED box below or on OPEN A DEMO. You can also check out several examples that have already been edited (go to the Examples tab). To learn about the tool, navigate to the Help tab.

GET STARTED

OPEN A DEMO

About

About EDICTOR 3

EDICTOR has been around since 2017, when it was first presented in a system demonstrations paper (List 2017). This early version already allowed users to code cognates (both partial and full cognates) and to align cognate sets. In 2021, a new major version was released that improved the annotation of morpheme glosses (List 2021). EDICTOR 3 is the next major version and adds several new features.

Citing EDICTOR 3

If you use EDICTOR 3 in your work, please cite it as shown below.

List, Johann-Mattis and Kellen Parker van Dam (2024): EDICTOR 3. A web-based tool for Computer-Assisted Language Comparison [Software Tool, Version 3.0]. MCL Chair at the University of Passau: Passau. URL: https://edictor.org.

The BibTeX representation looks as follows.

@book{EDICTOR3,
  author = {List, Johann-Mattis and van Dam, Kellen Parker},
  year = {2024},
  title = {EDICTOR 3. A web-based tool for Computer-Assisted Language Comparison},
  publisher = {MCL Chair at the University of Passau},
  address = {Passau},
  url = {https://edictor.org},
}

News

2024-08-14 EDICTOR 3 Published

Having fixed some final bugs in the correspondence patterns panel and added GUI tests with Selenium that pass in Firefox, we have now finally published EDICTOR 3.

2024-08-08 EDICTOR 3 Published in Beta Version

We have now published EDICTOR 3 in a Beta version for final testing over the weekend. If no further problems pop up, EDICTOR 3 will be officially published on Monday next week, right before the keynote talk of Johann-Mattis List during the LChange workshop at ACL.

2024-08-03 Publication of EDICTOR 3 is Approaching

We are working hard to publish EDICTOR 3 on time. In less than two weeks, Johann-Mattis List will present EDICTOR 3 in a virtual keynote at the LChange Workshop, organized as part of the ACL conference in Bangkok. By then, we hope to have released EDICTOR 3.0 on PyPI. Before we do so, however, we want to add as many tests and checks as possible.

2024-06-02 New Test Version of EDICTOR 3 Published

A first test version of EDICTOR 3 has now been published in alpha stage. Features for automated alignments, automated cognate detection, and automated inference of correspondence patterns are now all implemented.

User

User Data

Access data through links defined in your configuration file.

{USERDATA}

Files

Open Local Files

Open files in your current folder.

{DATASETS}

Examples

Check Out Example Datasets

Several datasets have been prepared to illustrate or test editing data with EDICTOR. To explore them, click on the individual examples below.

Germanic Wordlist (List 2014)

Bai Wordlist (Wang 2006)

Tujia Wordlist (Starostin 2005)

Help

1 Basic Note on Help

Since the first version was published in 2017, the use of EDICTOR has been illustrated in various forms. Although it would be nice to have a complete tutorial that explains all major features in one place, it seems unrealistic to provide this, not least because writing such a tutorial would take away time needed to develop new features. For now, users interested in the tool should check out the resources mentioned here, contact us via GitHub in case of bugs, or simply test the tool and see how it works in action.

Testing EDICTOR in action should be fairly easy, since you cannot break anything when you work with the tool. The recommendation is to click a lot and to check which buttons can be clicked. Left and right mouse clicks play an important role, so you should test what happens when you left-click or right-click a certain field. If you open a panel in EDICTOR and do not know what it does, there is always a little question mark at the top right of the panel. Clicking this question mark displays basic information on the panel. In this way, it should be possible to get acquainted with the tool rather quickly.

2 Tutorials

The following tutorials have been published over the last years and shared freely on preprint servers or larger repositories.

List, J.-M. (2017): Historical Language Comparison with LingPy and EDICTOR. Department of Linguistic and Cultural Evolution: Max-Planck Institute for the Science of Human History. URL: https://github.com/digling/edictor-tutorial/raw/master/list-2017-edictor-tutorial.pdf

3 Getting Started with Examples

3.1 Basic File Types

EDICTOR expects TSV files as input. The columns in these files should be separated by tabs, and the first line should be a header indicating the content of the individual columns. The first column should provide a numerical identifier for each row in your data. Each row corresponds to one word. One of the remaining columns should be called DOCULECT and contain the names of the individual languages in your sample. Another column should be called CONCEPT and contain the concepts (or glosses) for the individual words. The word form should be provided in segmented form (with spaces separating individual sounds) in a column called TOKENS.

Note that EDICTOR accepts some alternative column names, and you do not have to write them in capital letters, but we strongly recommend adhering to these basic guidelines to make sure the tool works properly.

A sample file with just a few lines for inspection can be downloaded here and opened in EDICTOR 3 by clicking on the box below.
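The format described above can be illustrated with a small Python sketch that writes a minimal wordlist and checks it against the stated guidelines. The three rows are toy data invented for this illustration, and the helper names to_tsv and check_wordlist are hypothetical; they are not part of EDICTOR.

```python
import csv
import io

# Column names recommended in the text above.
REQUIRED = ("DOCULECT", "CONCEPT", "TOKENS")

# Hypothetical toy rows: numeric ID, language, concept, space-segmented sounds.
ROWS = [
    ("ID", "DOCULECT", "CONCEPT", "TOKENS"),
    (1, "German", "hand", "h a n t"),
    (2, "English", "hand", "h æ n d"),
    (3, "Dutch", "hand", "h ɑ n t"),
]


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def check_wordlist(text):
    """Return a list of problems found in TSV wordlist text (empty if none)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    problems = [f"missing column {name}" for name in REQUIRED if name not in header]
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if not row[0].isdigit():
            problems.append(f"line {line_number}: first column is not a numeric ID")
    return problems


wordlist = to_tsv(ROWS)
print(wordlist)
print(check_wordlist(wordlist))
```

Saving the printed text to a file with the extension .tsv should give a file that follows the guidelines above.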

Illustration of File Formats

3.2 Opening Files in EDICTOR

EDICTOR offers two main ways to open a file and manipulate the data. You can open a file by starting from an empty EDICTOR instance, clicking into the BROWSE FILE field at the top left, and selecting the TSV file stored on your system. This will not upload your data to the server, but only make it accessible to the application in the browser on your system. You can test this by downloading this file and then selecting it after opening a fresh instance.

The alternative way to open files is to pass the filename via the URL. This works, however, only if you use the local version of EDICTOR that runs with Python on a local server. If you pass filenames to the standard URL (https://edictor.org/edictor.html), only files that have been uploaded to the server in the folder data are accessible. If you use the local EDICTOR version, EDICTOR will first search for files in the current working directory. If the files cannot be found there, EDICTOR will search in the data directory. Files are passed via the URL by adding the attribute file=filename to the URL.

If you run EDICTOR locally, you can also open SQLite databases. In order to do so, you must first create a valid SQLite database from an existing file (the wordlist command in EDICTOR 3 allows you to export data to the SQLite format required by EDICTOR 3) and place the database into the sqlite folder where your tool is installed. You must also make sure that access to the folder does not require root rights (which can be guaranteed by installing EDICTOR in a local virtual environment). SQLite files are opened from the URL by passing the arguments file and remote_dbase. The file keyword here refers to the name of the table inside the SQLite database and the remote_dbase keyword refers to the name of the table in the SQLite database that contains the data (which is stored in triples).
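As a rough illustration of passing a file via the URL, the following Python sketch assembles such URLs. The address in LOCAL_BASE is an assumption about where a local server might be reachable, the file names are hypothetical, and edictor_url is a helper written for this example only.

```python
from urllib.parse import urlencode

# Public instance mentioned in the text.
REMOTE_BASE = "https://edictor.org/edictor.html"
# Assumption: address of a locally running server; adjust host and port to your setup.
LOCAL_BASE = "http://localhost:9999/edictor.html"


def edictor_url(base, filename, **extra):
    """Build an EDICTOR URL that passes the file name via the `file` attribute."""
    params = {"file": filename, **extra}
    return f"{base}?{urlencode(params)}"


# Hypothetical file names, used only for illustration.
print(edictor_url(REMOTE_BASE, "germanic.tsv"))
print(edictor_url(LOCAL_BASE, "wordlist.tsv", morphology_mode="partial"))
```

Further URL parameters, such as morphology_mode, can be appended in the same way as keyword arguments.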

3.3 Editing Data in the Wordlist Panel

The first panel you see after successfully opening a file in EDICTOR is the Wordlist panel. This panel offers several ways to edit your data, similar to a simple spreadsheet editor. If you open a wordlist file, you can directly edit any field in the Wordlist panel, except for the field that shows the identifier (ID) of the row. Since EDICTOR computes internal data representations from the content of the TSV file, however, you should not edit the fields DOCULECT and CONCEPT. Editing is as simple as clicking into a field and then modifying the content. Pressing ENTER accepts the modification, while pressing ESCAPE restores the original value. With the arrow keys, you can navigate up and down (this is equivalent to pressing ENTER, so modified content will be accepted), and with CTRL in combination with the left and right arrow keys, you can switch between columns. If you want to delete a row or add a new row to your data, you must click on the ID field. A new window will then open and ask you for confirmation or provide further instructions.

EDICTOR comes with a rudimentary routine that allows you to segment the data (similar to the functionality in LingPy that segments phonetic transcriptions into their sounds). To trigger this functionality, you must insert an entry into the TOKENS field that is preceded by a space. Spaces to the left or to the right of entries in TOKENS are generally not accepted. Adding a space to the beginning of a phonetic transcription thus tells EDICTOR to segment the data.
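The idea of a leading space acting as a trigger can be sketched in Python as follows. This is not EDICTOR's actual routine, whose rules are not described here; it is a deliberately naive stand-in that assumes combining marks and modifier letters (such as ʰ or ː) belong to the preceding symbol. The function names segment and normalize_tokens are hypothetical.

```python
import unicodedata


def segment(transcription):
    """Naively split a transcription into space-separated sounds."""
    segments = []
    for char in transcription:
        if char.isspace():
            continue  # whitespace never becomes a segment
        category = unicodedata.category(char)
        # Assumption: combining marks (M*) and modifier letters (Lm)
        # attach to the preceding base symbol.
        if segments and (category.startswith("M") or category == "Lm"):
            segments[-1] += char
        else:
            segments.append(char)
    return " ".join(segments)


def normalize_tokens(entry):
    """Segment entries that start with a space; otherwise only trim the edges."""
    if entry.startswith(" "):
        return segment(entry)
    return entry.strip()


print(normalize_tokens(" tʰaːt"))
print(normalize_tokens("h a n t"))
```

Entries that are already segmented pass through unchanged, so only entries deliberately marked with a leading space are touched.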

3.4 Assigning Words to Cognate Sets

There are several ways to assign words to cognate sets. The first and most important decision that you need to make is whether you want to annotate cognates on the level of the words in your data or on the level of morphemes. The former mode is called full cognates in EDICTOR, and you can select it by opening the SETTINGS panel and checking the checkbox full for the cognate and colexification mode. The latter mode is called partial cognates in EDICTOR and can be selected in the same way. You can also specify the mode from the URL if you open the edictor.html file with the parameters morphology_mode=partial or morphology_mode=full.

In order to edit full or partial cognate sets, you must first make sure that your data contains a column that can store the cognate set identifiers. Traditionally, the column storing full cognate sets is called COGID and the column storing partial cognate sets is called COGIDS. If you have no such column in your data, you can create one by typing or pasting COGID or COGIDS into the "add column" text field in the middle of the application and pressing ENTER. EDICTOR will then create the column (initializing all cells without any values) and also store in the configuration that full or partial cognates can be edited. Clicking on EDIT → COGNATE SETS or EDIT → PARTIAL COGNATE SETS will then open a new panel in which you can start editing your cognates.
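If you prefer to prepare a file before opening it, the same effect (an empty COGID or COGIDS column) can be approximated outside the tool. The following Python sketch appends empty columns to TSV text; add_columns is a hypothetical helper and the sample row is toy data.

```python
import csv
import io


def add_columns(text, *names):
    """Append empty columns to TSV text unless they already exist."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    new = [name for name in names if name not in header]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header + new)
    for row in body:
        # Every existing row receives one empty cell per new column.
        writer.writerow(row + [""] * len(new))
    return buffer.getvalue()


# Hypothetical one-row wordlist used for illustration.
sample = "ID\tDOCULECT\tCONCEPT\tTOKENS\n1\tGerman\thand\th a n t\n"
print(add_columns(sample, "COGID", "COGIDS"))
```

Columns that are already present are left alone, so running the helper twice does not duplicate them.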

3.5 Editing Morpheme Glosses

Morpheme glosses were introduced in order to add semantic information to the individual morphemes of a word in a study by Hill and List (2017). The idea was to supplement the information on partial cognates by individual semantic glosses -- similar to the well-known glosses used in interlinear-glossed text in linguistic typology -- that would help to provide short cuts on the semantic motivation underlying individual words in individual languages. Later, the morpheme gloss panel was added to EDICTOR in order to allow for a convenient annotation and inspection of language-internal and language-external partial cognate sets in a given dataset.

In order to get started with morpheme glosses, you must make sure to have a column for partial cognate sets (default name COGIDS) and a column for morpheme glosses (default name MORPHEMES). You can easily add these columns to your data with the help of the "add column" text field mentioned in the section on cognate annotation. Once this has been done, you can open the panel by clicking EDIT → MORPHEME GLOSSES and then clicking on OK. EDICTOR 3 will now search for all partial cognate sets and all individual morphemes and group identical morphemes together, providing one column for the cognate set of each morpheme and one for the morpheme gloss. With the data assembled in this form, you can carry out bulk editing of morphemes and cognate sets by clicking on the ID field of each morpheme in order to group a given set of individual morphemes. When you now edit a single morpheme or cognate set instance, EDICTOR 3 will edit all data that was previously grouped together at once.

The file below contains training data on Tujia taken from the Global Lexicostatistical Database (Starostin and Krylov 2011). The dataset contains cognate sets and morpheme glosses which have already been annotated and can be used to test the annotation of morpheme glosses directly.
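The grouping of identical morphemes and the idea of bulk editing can be sketched in Python as follows. The data layout is an assumption made for this illustration: morphemes in TOKENS are separated by " + ", and COGIDS and MORPHEMES hold one space-separated value per morpheme. The rows are invented, and group_morphemes and set_gloss are hypothetical helpers, not EDICTOR functions.

```python
from collections import defaultdict

# Hypothetical rows; the layout is an assumption (see the note above).
ROWS = [
    {"ID": 1, "TOKENS": "s a + m i", "COGIDS": "1 2", "MORPHEMES": "water eye"},
    {"ID": 2, "TOKENS": "m i", "COGIDS": "2", "MORPHEMES": "eye"},
    {"ID": 3, "TOKENS": "s a + k u", "COGIDS": "1 3", "MORPHEMES": "water mouth"},
]


def group_morphemes(rows):
    """Map each morpheme form to its occurrences: (row ID, position, cognate ID, gloss)."""
    groups = defaultdict(list)
    for row in rows:
        forms = row["TOKENS"].split(" + ")
        cogids = row["COGIDS"].split()
        glosses = row["MORPHEMES"].split()
        for index, (form, cogid, gloss) in enumerate(zip(forms, cogids, glosses)):
            groups[form].append((row["ID"], index, cogid, gloss))
    return dict(groups)


def set_gloss(rows, groups, form, gloss):
    """Bulk edit: assign one gloss to every occurrence of a morpheme form."""
    by_id = {row["ID"]: row for row in rows}
    for row_id, index, _cogid, _old_gloss in groups[form]:
        glosses = by_id[row_id]["MORPHEMES"].split()
        glosses[index] = gloss
        by_id[row_id]["MORPHEMES"] = " ".join(glosses)


groups = group_morphemes(ROWS)
set_gloss(ROWS, groups, "m i", "EYE")  # updates rows 1 and 2 in place
print([row["MORPHEMES"] for row in ROWS])
```

Note that groups is a snapshot taken before the edit, so it would need to be recomputed to reflect the new glosses.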

Edit Morpheme Glosses

3.6 Editing Correspondence Patterns

In order to edit correspondence patterns, you must have created cognate sets and alignments for your data. The typical way to start editing correspondences is to use the automatic approach provided by EDICTOR 3 to compute correspondence patterns, and then to refine the resulting patterns manually. Editing patterns requires some testing, but it is essentially quite straightforward. You start by opening your data and then opening the correspondence pattern panel by clicking EDIT → CORRESPONDENCE PATTERNS. A new panel will open, displaying correspondence patterns for your data. In this panel, you can click on the identifier of a given correspondence pattern and manually modify it (it can only be an integer). This allows you to merge patterns not detected by the algorithm and to separate patterns that have been falsely grouped. Below is a data example (again based on the Tujia data shown in the previous section) where you can test this.

Edit Correspondence Patterns

3.7 Computing Cognates, Alignments, and Correspondences

EDICTOR 3 allows you to compute cognate sets, alignments, and correspondence patterns from your data, both in the web application that is based only on JavaScript and in the local server version running with Python, which integrates LingPy and LingRex. No matter which version you decide to use, if you have empty columns for cognate sets (COGID or COGIDS), alignments (ALIGNMENT), and correspondence patterns (PATTERNS), you can carry out full-fledged computer-assisted workflows by starting from automated cognate detection (COMPUTE → COGNATE SETS), then carrying out a phonetic alignment analysis (COMPUTE → ALIGNMENTS), and finally inferring correspondence patterns (COMPUTE → CORRESPONDENCE PATTERNS). If you want to work with partial cognate sets, you must indicate so in the settings by setting the MORPHOLOGY AND COLEXIFICATION MODE to partial and then pressing the REFRESH button before you carry out any of the analyses.

You can test the computer-assisted workflow outlined above by opening the file below. You must start by adding the respective columns to the data (COGID, ALIGNMENT, PATTERNS). Then, you can directly proceed by computing the values interactively. Depending on which version you use -- the JavaScript version or the Python version -- your results will differ slightly.

Test Computer-Assisted Workflow

3.8 Exporting Results to File

To get your data back after editing, you must click on the DOWNLOAD button at the top right of EDICTOR 3. When you click this button, the current file will be saved in the DOWNLOAD folder (or the default location for downloaded data) on your computer. Note that EDICTOR 3 does not actually download the data from a server; this is simply the only way that JavaScript can give you your file back. Your data is never uploaded to any server, but stays on your client system. If you use the version powered by Python, there is an additional button that allows you to save your file directly in the current working directory from which you started the EDICTOR app.
