Resource Grammars for South Africa 🇿🇦

A resource grammar is a computational grammar, engineered using Grammatical Framework (GF), that models the morphology and syntax of a language. The GF runtime allows for sophisticated interaction with such grammars, including parsing natural language text (from text to tree) and linearising abstract syntax trees (from tree to text).

This web interface is intended to give a taste of the kind of operations that can be performed by the GF runtime when enabled by resource grammars for the South African languages.

Select a grammar module

Below are options for choosing a language (for now only isiZulu) and a grammar module to use for parsing and linearising.

  • Lang modules deal with full sentences (e.g. "umfana ohambayo ungumfundi").
  • Chunk modules can be used for shorter phrases (e.g. "umfana ohambayo" or even "ohambayo"). The Chunk modules can also act as a fallback when parsing fails with a Lang module.

There are three lexica to choose from: DevLex, MultiLex and MonoLex. MultiLex contains commonly used words mapped to English function names (such as walk_V), while MonoLex contains around 20 000 words mapped to isiZulu function names (such as hamb_V). DevLex also uses isiZulu function names, but has only a handful of entries.
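
The fallback from a Lang module to a Chunk module mentioned above can be sketched as follows. This is a minimal illustration in Python, not the code behind this interface: parse_with_fallback and the two toy parsers are hypothetical stand-ins for whatever parsing calls the GF runtime provides, and the returned tree strings are placeholders.

```python
from typing import Callable, List, Tuple

# Hypothetical parser type: takes text and returns a list of trees
# (an empty list means that parsing failed).
Parser = Callable[[str], List[str]]


def parse_with_fallback(text: str, lang_parser: Parser,
                        chunk_parser: Parser) -> Tuple[str, List[str]]:
    """Try the Lang module first; use the Chunk module only if it fails."""
    trees = lang_parser(text)
    if trees:
        return "Lang", trees
    return "Chunk", chunk_parser(text)


# Toy stand-ins for real parsers (assumptions, for illustration only).
def toy_lang_parser(text: str) -> List[str]:
    # Pretend the Lang module only accepts this one full sentence.
    if text == "umfana ohambayo ungumfundi":
        return ["<sentence tree>"]
    return []


def toy_chunk_parser(text: str) -> List[str]:
    # Pretend the Chunk module accepts any non-empty phrase.
    return ["<chunk tree>"] if text.strip() else []


print(parse_with_fallback("umfana ohambayo ungumfundi",
                          toy_lang_parser, toy_chunk_parser))
print(parse_with_fallback("umfana ohambayo",
                          toy_lang_parser, toy_chunk_parser))
```

Trying the Lang module first means that a full-sentence analysis is preferred whenever one exists, and the Chunk module is consulted only when it does not.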

From text to tree

The GF runtime, enabled by a GF grammar, makes it possible to parse natural language text to obtain an abstract syntax tree. Enter some text in the box below to see it in action. The tree is displayed visually, along with a Lisp-style text representation below it.
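
To make the Lisp-style text representation concrete, here is a small Python sketch that renders a tree as nested function applications. The Tree class and show function are hypothetical helpers written for this illustration, and the function names in the example tree (apart from walk_V, which MultiLex is said to contain) are assumed for the sake of the example; the exact formatting shown by this interface may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A toy abstract syntax tree: a function name applied to subtrees."""
    fun: str
    args: List["Tree"] = field(default_factory=list)


def show(tree: Tree, nested: bool = False) -> str:
    """Render a tree as text, bracketing applications that occur as arguments."""
    if not tree.args:
        return tree.fun  # a leaf is just its function name
    text = " ".join([tree.fun] + [show(arg, nested=True) for arg in tree.args])
    return f"({text})" if nested else text


# Illustrative tree only; these are not necessarily functions of the isiZulu grammar.
example = Tree("PredVP", [
    Tree("UsePron", [Tree("he_Pron")]),
    Tree("UseV", [Tree("walk_V")]),
])

print(show(example))  # PredVP (UsePron he_Pron) (UseV walk_V)
```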

Because natural language is ambiguous and computational modelling has inherent limitations, a piece of natural language text usually has more than one parse tree. The maximum number of parses returned by the grammar can be set using the selector below. Parses are weighted and returned in decreasing order of weight; note that this tends to favour smaller trees. Click the Next and Previous buttons to view all the parses.
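
One way to see why weighting tends to favour smaller trees is sketched below in Python. The sketch assumes that a tree's weight is a probability obtained by multiplying one probability per function used in the tree; this is an assumption made for illustration, not a description of how this interface computes its weights. The function names A to E, the probabilities, and the helpers tree_probability and best_parses are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def tree_probability(functions: List[str], fun_prob: Dict[str, float]) -> float:
    """Multiply the probabilities of all functions used in a tree (assumed model)."""
    return math.prod(fun_prob[f] for f in functions)


def best_parses(candidates: List[List[str]], fun_prob: Dict[str, float],
                limit: int) -> List[Tuple[float, List[str]]]:
    """Score every candidate, sort by decreasing probability, keep at most `limit`."""
    scored = [(tree_probability(c, fun_prob), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Each candidate parse is represented only by the functions it uses.
fun_prob = {name: 0.5 for name in "ABCDE"}  # hypothetical probabilities
small_tree = ["A", "B", "C"]                # 3 functions -> 0.5 ** 3
large_tree = ["A", "B", "C", "D", "E"]      # 5 functions -> 0.5 ** 5

for prob, functions in best_parses([large_tree, small_tree], fun_prob, limit=2):
    print(prob, functions)
```

Since every factor is at most 1, each additional node can only lower the product, so under this assumption a tree with fewer nodes is ranked ahead of a larger one built from equally likely functions.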

The trees displayed here are abstract syntax trees, which represent a tecto-grammatical view of the sentence.

Edit the tree

Once you have a tree (perhaps from having parsed some text), this section allows you to edit its leaf nodes and see the resulting linearisations.

Note that the editing options appear in depth-first order with respect to the tree, not in the order in which the corresponding tokens appear in the linearised text.
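
The depth-first ordering of editable leaf nodes can be illustrated with a short Python sketch. The Tree class, the leaves function and the function names in the example tree are hypothetical and serve only to show the traversal; they are not taken from the isiZulu grammar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A toy abstract syntax tree: a function name applied to subtrees."""
    fun: str
    args: List["Tree"] = field(default_factory=list)


def leaves(tree: Tree) -> List[str]:
    """Collect leaf function names depth-first, left to right."""
    if not tree.args:
        return [tree.fun]
    found: List[str] = []
    for arg in tree.args:
        found.extend(leaves(arg))
    return found


# Illustrative tree: the adjective is the first argument, the noun the second.
example = Tree("PredVP", [
    Tree("AdjCN", [Tree("big_A"), Tree("boy_N")]),
    Tree("UseV", [Tree("walk_V")]),
])

print(leaves(example))  # ['big_A', 'boy_N', 'walk_V']
```

The order of the leaves is fixed by the tree alone. If a concrete syntax places the noun before the adjective, the word for boy_N comes first in the text even though big_A comes first in the list of editing options.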





Edit the text

Another way to edit trees is by changing the text itself. Note how changes to lexical items affect the morphology of the resulting linearisation. With large lexica, this interface can be somewhat slow.




About

This work, by Laurette Marais and Laurette Pretorius in collaboration with Lionel Posthumus, is being done through the Voice Computing research group at the CSIR. It has been funded by SADiLaR and is powered by Grammatical Framework. The latest code of the isiZulu Resource Grammar can be found at github.com/GrammaticalFramework/gf-rgl.