Is there a regex that can warn us about partial matches? We have identified a number of users who are copying source to target and not completing the translation. Any ideas on the type of checks we can use to identify a target that partially matches the source? 20 or more characters matching, or something like that?
e.g.
[source]
The JUKE NISMO RS increases the engine’s performance to 215 horsepower (at 6,000 rpm) and 210 lb-ft of torque (from 3,600 to 4,800 rpm) for the 6-speed manual transmission-equipped front-wheel drive model.
[target]
JUKE NISMO RS کار انجن را به 215 horsepower (به 6,000 rpm) و 210 lb-ft torque (از 3,600 الی 4,800 rpm) افزایش میدهد for the 6-speed manual transmission-equipped front-wheel drive model.
I have tried several regexes but couldn’t find any that work. The only way would be to create a plugin that would detect segments that may be partial matches.
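For anyone who wants to prototype the "20 or more characters matching" idea outside Xbench before committing to a plugin, here is a minimal Python sketch. It is not Xbench code and does not use the plugin API; the function names and the 20-character threshold are just assumptions for illustration. It finds the longest run of characters that appears unchanged in both source and target, wherever it sits in the segment.

```python
from difflib import SequenceMatcher


def longest_shared_run(source: str, target: str) -> str:
    """Return the longest contiguous run of characters found in both strings."""
    # autojunk=False stops difflib from ignoring very frequent characters
    # in long segments, which would otherwise shorten the match.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(target))
    return source[match.a:match.a + match.size]


def looks_partially_copied(source: str, target: str, min_chars: int = 20) -> bool:
    """Flag a segment when source and target share a run of min_chars or more.

    min_chars=20 is an assumed starting point, not a recommended value.
    """
    return len(longest_shared_run(source, target)) >= min_chars
```

Expect false positives on segments that legitimately keep long product names, part numbers, or URLs untranslated, so the threshold probably needs tuning per project.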
Since it is relatively likely that these source leftovers will be at the end, probably the following search that captures the last 20 characters in source and checks if they are also found in target should do the trick:
Source: (.{20})=1$
Target: @1$
Search Mode: Regex
You may want to experiment with other lengths than 20 (or have several lengths checked in a checklist and only review the shortest one that does not have too many false positives).
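If the segments can be exported, the same "copied ending" idea can be tried outside Xbench to see which length gives the fewest false positives. The following is only a sketch in Python under that assumption; the function names are made up for this example. Instead of testing one fixed length at a time, it measures how many trailing characters source and target have in common, so any threshold can be applied afterwards.

```python
def common_suffix_length(source: str, target: str) -> int:
    """Count how many trailing characters source and target share."""
    source, target = source.rstrip(), target.rstrip()
    count = 0
    # Walk both strings from the end until the characters differ.
    for s_char, t_char in zip(reversed(source), reversed(target)):
        if s_char != t_char:
            break
        count += 1
    return count


def has_copied_tail(source: str, target: str, min_chars: int = 20) -> bool:
    """True when the target ends with at least min_chars of the source ending.

    min_chars=20 mirrors the length used in the search above; it is a
    value to experiment with, not a fixed rule.
    """
    return common_suffix_length(source, target) >= min_chars
```

Sorting segments by `common_suffix_length` in descending order gives a quick view of where a sensible cut-off lies for a given project.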
Thinking a little outside the box: you might not need a regex at all – all the partially translated segments will most likely contain a fairly high number of words flagged by the spelling checker (at least for languages where a spelling checker is available). By running a QA report that only checks spelling issues, and then sorting it by segment number, you should be able to identify most of the partially untranslated segments.
Hi pcondal,
Many thanks for your suggestion.
Actually, it could work for discovering segments where the source is half translated and the rest of the source string is copied and pasted into the target, since this regex searches for exactly the same last 20 characters in source and target. This means that if the source ends with “and this is not translated”, it looks for the same “and this is not translated” in the target. But this alone doesn’t work for us, since we are supposed to run checks on translated sentences. If the sentence is only half translated, the part corresponding to the last 20 characters of the source is completely “empty” in the target, so it’s normal that this rule returns no results. We would need a rule that checks only whether the last 20 characters of the source have been translated, without searching for the same source text in the target.
Do you think it’s possible?
Xbench does not understand natural language, so it cannot detect if translation is missing.
So, unless a tag, a number, or some other element is also missing in that omitted part, Xbench will not flag that segment.
Typically, omitted text can be surfaced with length checks (when source and target vary greatly in length).
Xbench does not have a standard check for differences in length (it is a check with a huge number of false positives), but it can be optionally implemented by developing a QA plugin (requires programming skills in the team).
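To give an idea of what such a length check involves, here is a minimal Python sketch. It is not the Xbench plugin API (a real plugin would be written in C++), and the 0.6 and 1.8 thresholds are purely assumed values that would need tuning for each language pair.

```python
def length_ratio(source: str, target: str) -> float:
    """Return target length divided by source length, in characters."""
    source, target = source.strip(), target.strip()
    if not source:
        # An empty source with an empty target is treated as balanced.
        return 1.0 if not target else float("inf")
    return len(target) / len(source)


def is_length_outlier(source: str, target: str,
                      low: float = 0.6, high: float = 1.8) -> bool:
    """Flag segments whose target is much shorter or longer than the source.

    low and high are assumed thresholds for illustration only.
    """
    ratio = length_ratio(source, target)
    return ratio < low or ratio > high
```

Short segments distort the ratio heavily, which is one reason this kind of check produces so many false positives; skipping segments below a minimum length is a common way to reduce the noise.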
The QA plugin sample is intended for developers as a sample of what can be done, so you must engage your developer to tweak it to your requirements and build it.
It must be compiled with the Visual Studio C++ compiler, and it will generate two DLLs, one for 32-bit and one for 64-bit. You must copy the DLL that matches your Xbench (32-bit or 64-bit edition) to your Xbench installation directory, and the plugin will appear among your QA checks.
The Xbench built-in spellchecker is a QA plugin, just like the one in the sample.