Regex to catch missing glossary translations

VL2D · September 28, 2023, 11:57am

Hello Xbench team,

Thank you for the great QA tool! It is really great.

Could you please help me write a regex query? It is intended to catch incorrectly used terminology in the English-Ukrainian translations I work on. I know about the Key Terms function, but it catches issues only for the exact word forms specified. Ukrainian is an inflected language, and a Ukrainian noun can theoretically have up to 14 different word forms (usually fewer, as some of them coincide).

For example, I have an approved glossary term, say:
EN: table
UK: таблиця

Specifically, the English word “table” can have 4 word forms:
EN: table, tables, table’s, tables’

The Ukrainian “таблиця” can have the following word forms:
UK: таблиця, таблиці, таблицю, таблицею, таблиць, таблицям, таблицями, таблицях, таблице

To catch possible mistranslations, I would like to create a regex query that catches segments where the source contains any of the “table” word forms but the target contains none of the “таблиця” word forms. Is that possible with the Xbench regex engine?

Thank you in advance!

omartin · October 4, 2023, 7:36am

The following search should work:
Source: tabl(es?)
Target: -таблиц(я|і|ю|ею|ь|ям|ями|ях|е)

Search mode: Regular expressions
Powersearch and Match Whole Word: enabled.

A simpler way to search would be to enter the stem of the Ukrainian word, followed by up to three non-space characters:
Source: tabl(es?)
Target: -таблиц[^[:space:]]{,3}
Search mode: Regular expressions
Powersearch and Match Whole Word: enabled.
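
For checking the logic outside Xbench, here is a minimal Python sketch of the same idea: flag a segment when the source matches the English pattern and the target does not match the Ukrainian stem. It is not Xbench syntax. The names SOURCE_RE, TARGET_RE and is_suspect are made up for illustration, \b stands in for Match Whole Word, and \S replaces [^[:space:]] because Python's re module has no POSIX bracket classes.

```python
import re

# "table" or "tables" as a whole word (hypothetical stand-in for Match Whole Word).
SOURCE_RE = re.compile(r"\btabl(es?)\b", re.IGNORECASE)

# The stem "таблиц" followed by up to three non-space characters.
TARGET_RE = re.compile(r"\bтаблиц\S{0,3}", re.IGNORECASE)


def is_suspect(source: str, target: str) -> bool:
    """Return True when the source has the term but the target lacks the stem."""
    return bool(SOURCE_RE.search(source)) and not TARGET_RE.search(target)


if __name__ == "__main__":
    segments = [
        ("Open the table.", "Відкрийте таблицю."),
        ("Open the tables.", "Відкрийте столи."),
    ]
    for src, tgt in segments:
        print(is_suspect(src, tgt), "|", src, "|", tgt)
```

The stem approach is deliberately loose: it accepts any ending of up to three characters, so it can also accept strings that are not real word forms.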

VL2D · October 4, 2023, 7:59am

Thank you for your suggestion. I tried similar queries, but it seems that Power Search does not work when a query contains round brackets. Xbench returns the following in both cases:

ApSIC Xbench

Source term error: Expected character: “)”.

OK

If I change tabl(es?) to table|tables|table’s|tables’, the version with -таблиц[^[:space:]]{,3} does work, probably because it does not contain round brackets, so in this case it helps. But it is not always linguistically possible to come up with a stem that covers all possible Ukrainian word forms, which is why I am looking for a way to type them all directly.
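
When no single stem covers every form, the explicit-list approach can at least be sketched outside Xbench. The Python example below builds a whole-word pattern from a plain list of word forms and flags segments where the source contains an English form but the target contains none of the Ukrainian ones. This is only an illustration of the intended check, it does not address the Xbench bracket error, and the names build_forms_pattern and missing_term are hypothetical.

```python
import re

EN_FORMS = ["table", "tables", "table’s", "tables’"]
UK_FORMS = ["таблиця", "таблиці", "таблицю", "таблицею", "таблиць",
            "таблицям", "таблицями", "таблицях", "таблице"]


def build_forms_pattern(forms):
    """Compile a pattern matching any listed form as a whole word."""
    # Longest forms first, so a longer form is tried before its own prefix.
    ordered = sorted(forms, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in ordered)
    # Lookarounds instead of \b so forms ending in an apostrophe still match.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


SRC_RE = build_forms_pattern(EN_FORMS)
TGT_RE = build_forms_pattern(UK_FORMS)


def missing_term(source: str, target: str) -> bool:
    """True when the source uses the term but no approved target form appears."""
    return bool(SRC_RE.search(source)) and not TGT_RE.search(target)


if __name__ == "__main__":
    print(missing_term("See the table’s header.", "Див. заголовок таблиці."))
    print(missing_term("See the tables.", "Див. столи."))
```

Listing the forms explicitly avoids accepting endings that are not real word forms, at the cost of maintaining one list per glossary term.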
