Find partial matches in source/target

Christopher_Phillips · October 11, 2019, 10:42am

Is there a regex that can warn us about partial matches? We have identified a number of users who are copying source to target and not completing the translation. Any ideas on the type of checks we can use to identify a partial source match in the target? 20 or more characters matching, or something similar?

e.g.

[source]
The JUKE NISMO RS increases the engine’s performance to 215 horsepower (at 6,000 rpm) and 210 lb-ft of torque (from 3,600 to 4,800 rpm) for the 6-speed manual transmission-equipped front-wheel drive model.

[target]
JUKE NISMO RS کار انجن را به 215 horsepower (به 6,000 rpm) و 210 lb-ft torque (از 3,600 الی 4,800 rpm) افزایش میدهد for the 6-speed manual transmission-equipped front-wheel drive model.

omartin · October 11, 2019, 11:28am

Hi Christopher,

I have tried several regexes but couldn’t find one that works. The only way would be to create a plugin that detects segments that may be partial matches.

Here is the Xbench documentation to create a plugin: https://docs.xbench.net/programmer-reference-qa-plugins/. However, it requires programming skills.

Regards,
Oscar.

pcondal · October 12, 2019, 3:57pm

Since it is relatively likely that these source leftovers will be at the end, probably the following search that captures the last 20 characters in source and checks if they are also found in target should do the trick:

Source: (.{20})=1$
Target: @1$
Search Mode: Regex

You may want to experiment with other lengths than 20 (or have several lengths checked in a checklist and only review the shortest one that does not have too many false positives).
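
For anyone who can export source/target pairs and script outside Xbench, here is a minimal Python sketch of the same idea. It is not Xbench syntax or an Xbench API; the function names and the default of 20 characters are only assumptions for illustration. `tail_copied` mirrors the search above (the target ends with the last n characters of the source), and `longest_shared_run` generalizes it to a copied run anywhere in the segment, using the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def tail_copied(source: str, target: str, n: int = 20) -> bool:
    """True if the target ends with the last n characters of the source."""
    if len(source) < n:
        return False  # like (.{20}), needs at least n characters to match
    return target.endswith(source[-n:])


def longest_shared_run(source: str, target: str) -> str:
    """Longest run of characters that appears verbatim in both strings."""
    # autojunk=False: otherwise difflib may ignore frequent characters
    # in strings of 200 characters or more.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(target))
    return source[match.a : match.a + match.size]


# Example segment taken from the first post in this thread.
SOURCE = (
    "The JUKE NISMO RS increases the engine’s performance to 215 horsepower "
    "(at 6,000 rpm) and 210 lb-ft of torque (from 3,600 to 4,800 rpm) for the "
    "6-speed manual transmission-equipped front-wheel drive model."
)
TARGET = (
    "JUKE NISMO RS کار انجن را به 215 horsepower (به 6,000 rpm) و 210 lb-ft "
    "torque (از 3,600 الی 4,800 rpm) افزایش میدهد for the 6-speed manual "
    "transmission-equipped front-wheel drive model."
)

if __name__ == "__main__":
    print(tail_copied(SOURCE, TARGET))
    run = longest_shared_run(SOURCE, TARGET)
    # Flag the segment when the shared run reaches the chosen threshold.
    print(len(run) >= 20, repr(run))
```

On the example from the first post, the longest shared run is the trailing English clause that was left untranslated. As with the regex, product names, numbers and units that legitimately stay the same in the target will produce false positives at short thresholds.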

RSchiaffino · October 14, 2019, 4:03pm

Thinking a little outside the box: you might not need a regex at all – all the partially translated segments will most likely contain a fairly high number of words flagged by the spelling checker (at least for languages where a spelling checker is available). By running a QA report that only checks spelling issues, and then sorting it by segment number, you should be able to identify most of the partially untranslated segments.

Raffaele_Pascale · October 28, 2019, 8:46am

Hi pcondal,
Many thanks for your suggestion.
Actually, it could work for discovering segments where the source is half translated and the rest of the source string is copied and pasted into the target, since this regex searches for exactly the same last 20 characters in source and target. That is, if the source ends with “and this is not translated”, it searches for the same “and this is not translated” in the target. But this alone doesn’t work for us, since we are supposed to run checks on translated sentences. If the sentence is half translated, the last 20 characters of the source are completely “empty” in the target, so it’s normal that this rule doesn’t return results. We would need a rule that checks only whether the last 20 characters of the source are translated, without searching for the same source text in the target.
Do you think it’s possible?

I hope everything is clear.

Please let me know if you have any questions.

Thanks!

pcondal · October 28, 2019, 9:24am

Xbench does not understand natural language, so it cannot detect if translation is missing.

So, unless a tag, a number, or some other element is also missing in that omitted part, Xbench will not flag that segment.

Typically, omitted text can be surfaced with length checks (when source and target vary greatly in length).

Xbench does not have a standard check for differences in length (it is a check with a huge number of false positives), but it can be optionally implemented by developing a QA plugin (requires programming skills in the team).

Here is a sample of a QA plugin that checks for length: https://github.com/xbench/plugin-samples-vs-cpp
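
To give an idea of the logic behind such a length check, here is a minimal standalone Python sketch. It is not the code from the plugin sample and does not use the Xbench plugin API; the function name `flag_length_outliers` and the thresholds 0.5 and 2.0 are assumptions that would need tuning per language pair.

```python
from typing import List, Tuple


def flag_length_outliers(
    segments: List[Tuple[str, str]],
    low: float = 0.5,   # assumed lower bound for target/source length ratio
    high: float = 2.0,  # assumed upper bound
) -> List[Tuple[int, float]]:
    """Return (index, ratio) for segments whose length ratio is out of range."""
    flagged = []
    for index, (source, target) in enumerate(segments):
        if not source:
            continue  # no ratio can be computed for an empty source
        ratio = len(target) / len(source)
        if ratio < low or ratio > high:
            flagged.append((index, ratio))
    return flagged


if __name__ == "__main__":
    pairs = [
        ("Hello world, this is a long sentence.", "Hola"),  # text omitted
        ("Good morning", "Buenos días"),                    # similar length
    ]
    for index, ratio in flag_length_outliers(pairs):
        print(f"segment {index}: target/source length ratio {ratio:.2f}")
```

Expect many false positives with a check like this, as noted above, so the thresholds are best tuned on a sample of known-good segments.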

Raffaele_Pascale · October 28, 2019, 9:54am

Hi pcondal,

Many thanks for this! It seems it could be useful as well.

Can you just let me know how to install this new plugin in Xbench?

Thanks!

pcondal · October 28, 2019, 10:10am

The QA plugin sample is intended for developers as a sample of what can be done, so you must engage your developer to tweak it to your requirements and build it.

It must be compiled with the Visual Studio C++ compiler and it will generate two DLLs, one for 32 bits and one for 64 bits. You must copy the DLL that matches your Xbench (32-bit or 64-bit edition) into your Xbench installation directory, and the plugin will appear among your QA checks.

The Xbench built-in spellchecker is a QA plugin, just like the one in the sample.

Raffaele_Pascale · October 28, 2019, 10:40am

Perfect!

Thanks for the suggestions and info!
