Regex to catch missing glossary translations

Hello Xbench team,

Thank you for such a great QA tool!

Could you please help me compile a regex query? It is intended to catch incorrectly used terminology in the English-Ukrainian translations I work on. I know about the Key Terms function, but it catches issues only in the exact word forms specified, while Ukrainian is an inflectional language: a Ukrainian noun can theoretically have up to 14 different word forms (usually fewer, as some of them coincide).

For example, I have an approved glossary term, say:
EN: table
UK: таблиця

The English word “table” has four word forms:
EN: table, tables, table’s, tables’

The Ukrainian “таблиця” can have the following word forms:
UK: таблиця, таблиці, таблицю, таблицею, таблиць, таблицям, таблицями, таблицях, таблице

To catch possible mistranslations, I would like to create a regex query that finds segments where the source contains any word form of “table” but the target contains none of the word forms of “таблиця”. Is that possible with the Xbench regex engine?
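
Outside Xbench, the check described here (the source contains any form of the term, the target contains none of the approved forms) can be sketched in a few lines of Python. This is a minimal sketch, not Xbench's engine: the segment list, the Ukrainian sample sentences and the function names are hypothetical, and case-insensitive matching is an assumption.

```python
import re

# Glossary entry from the question: all forms of "table" and of "таблиця".
SOURCE_FORMS = ["table", "tables", "table’s", "tables’"]
TARGET_FORMS = ["таблиця", "таблиці", "таблицю", "таблицею", "таблиць",
                "таблицям", "таблицями", "таблицях", "таблице"]


def whole_word_pattern(forms):
    """Build one case-insensitive, whole-word alternation from a list of forms.

    Lookarounds are used instead of \\b so that forms ending in an apostrophe
    (e.g. "tables’") still count as whole words.
    """
    alternation = "|".join(re.escape(form) for form in forms)
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def flag_missing_term(segments, source_forms, target_forms):
    """Return (source, target) pairs where the source uses the term but the
    target contains none of the approved translation's forms."""
    src_re = whole_word_pattern(source_forms)
    tgt_re = whole_word_pattern(target_forms)
    return [(src, tgt) for src, tgt in segments
            if src_re.search(src) and not tgt_re.search(tgt)]


# Hypothetical bilingual segments for illustration.
segments = [
    ("Open the table.", "Відкрийте таблицю."),
    ("Sort the tables by date.", "Відсортуйте списки за датою."),
    ("The table’s header is bold.", "Заголовок таблиці виділено жирним."),
    ("Close the window.", "Закрийте вікно."),
]

for src, tgt in flag_missing_term(segments, SOURCE_FORMS, TARGET_FORMS):
    print(f"{src} -> {tgt}")
# prints: Sort the tables by date. -> Відсортуйте списки за датою.
```

Only the second segment is flagged: its source contains "tables", and its target uses "списки" instead of any form of "таблиця".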

Thank you in advance!

The following search should work:
Source: tabl(es?)
Target: -таблиц(я|і|ю|ею|ь|ям|ями|ях|е)

Search mode: Regular expressions
PowerSearch and Match Whole Word: enabled.
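
The answer's patterns can be tried out in Python before pasting them into Xbench. This is only an approximation under stated assumptions: Python's re is not Xbench's regex engine, \b stands in for "Match Whole Word", and the leading "-" of the Target field (which the answer uses to exclude the term) is left out.

```python
import re

# Python approximations of the Source and Target patterns from the answer.
SOURCE_RE = re.compile(r"\btabl(es?)\b")
TARGET_RE = re.compile(r"\bтаблиц(я|і|ю|ею|ь|ям|ями|ях|е)\b")

for word in ["table", "tables", "table’s", "tablet"]:
    print(word, bool(SOURCE_RE.search(word)))
# prints, one per line: table True, tables True, table’s True, tablet False
# ("table’s" matches because the apostrophe ends the word "table")

# Shorter endings such as "я" and "ям" are tried first, but the trailing \b
# rejects them inside "таблицями", so the engine backtracks and takes "ями".
print(TARGET_RE.search("з таблицями").group(0))  # prints: таблицями

# A different word that only shares the first letters is not matched.
print(bool(TARGET_RE.search("табличка")))  # prints: False
```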

A simpler way would be to enter the Ukrainian stem of the word, followed by up to three non-space characters:
Source: tabl(es?)
Target: -таблиц[^[:space:]]{,3}

Search mode: Regular expressions
PowerSearch and Match Whole Word: enabled.
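
The stem variant trades precision for brevity. A rough Python check of that trade-off is sketched below; it assumes that \S is an acceptable stand-in for the POSIX class [^[:space:]] (Python's re has no [:space:] class), writes {,3} as {0,3}, approximates "Match Whole Word" with \b, and omits the leading "-".

```python
import re

# Python translation of the stem pattern таблиц[^[:space:]]{,3}.
STEM_RE = re.compile(r"\bтаблиц\S{0,3}\b")

TARGET_FORMS = ["таблиця", "таблиці", "таблицю", "таблицею", "таблиць",
                "таблицям", "таблицями", "таблицях", "таблице"]

# Every approved form is covered, since no ending is longer than 3 letters.
covered = [form for form in TARGET_FORMS if STEM_RE.fullmatch(form)]
print(len(covered), "of", len(TARGET_FORMS))  # prints: 9 of 9

# The stem is looser than the explicit list: it also accepts strings that are
# not forms of "таблиця" at all, such as the bare stem itself.
print(bool(STEM_RE.fullmatch("таблиц")))  # prints: True
```

For a missing-term check this looseness is usually harmless, since a false match in the target only hides a segment that already contains something very close to the approved term.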

Thank you for your suggestion. I tried similar queries, but it seems that PowerSearch does not work when a query contains round brackets. In both cases, Xbench returns the following error:

ApSIC Xbench
Source term error: Expected character: “)”.

If I change tabl(es?) to table|tables|table’s|tables’, the version with -таблиц[^[:space:]]{,3} does work, probably because it contains no round brackets, so in this case that helps. However, it is not always linguistically possible to find a single stem that covers all the word forms of a Ukrainian word, which is why I am looking for a way to list them all directly.
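
When no usable stem exists, the full list of forms can be generated rather than typed. The Python sketch below joins a list of word forms into a plain form1|form2|... string without round brackets, the shape that worked on the Source side. The noun "стіл" is a hypothetical glossary entry chosen because its stem vowel alternates (стіл, стола), so no stem longer than "ст" covers all its forms. Whether Xbench applies a leading "-" to such an unbracketed list in the Target field is not confirmed in this thread and would need testing.

```python
import re

# Hypothetical glossary term "стіл", listed case by case (nominative, genitive,
# dative (two variants), accusative, instrumental, locative, vocative), singular
# then plural, so coinciding forms appear more than once.
STIL_FORMS = ["стіл", "стола", "столу", "столові", "стіл", "столом", "столі",
              "столе", "столи", "столів", "столам", "столи", "столами",
              "столах", "столи"]


def bracket_free_alternation(forms):
    """Join word forms into a plain form1|form2|... list with no brackets.

    Repeated forms are dropped; the order of first occurrence is kept.
    """
    return "|".join(dict.fromkeys(forms))


def whole_word_regex(alternation):
    """Python-only check: wrap the list in a non-capturing group so that the
    word boundaries apply to every alternative, not just the first and last."""
    return re.compile(rf"\b(?:{alternation})\b")


query = bracket_free_alternation(STIL_FORMS)
print(query)
# prints: стіл|стола|столу|столові|столом|столі|столе|столи|столів|столам|столами|столах

check = whole_word_regex(query)
print(all(check.fullmatch(form) for form in STIL_FORMS))  # prints: True
```

Python needs the (?:...) group to apply \b to every alternative; in Xbench, the Match Whole Word option presumably plays that role, which is why the generated string itself contains no brackets.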