Finding mismatched occurrences of a given word/expression between source and target


#1

Hello everyone

Is there any way to find all translation units in a project where a given word/expression occurs a different number of times in the source and in the target?

For instance, if we want to find all strings with a ™ symbol present in source but missing in target, the search would simply be:

Source: ™
Target: -™
PowerSearch: ON
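
For anyone post-processing exported bilingual files outside Xbench, the same presence/absence check can be sketched in Python. This is not Xbench functionality: it assumes the segments are already available as (source, target) string pairs, and the `segments` data and the `missing_in_target` function are hypothetical.

```python
def missing_in_target(segments, term):
    """Return the pairs whose source contains `term` but whose target does not."""
    return [(src, tgt) for src, tgt in segments if term in src and term not in tgt]


# Hypothetical sample data: (source, target) pairs.
segments = [
    ("Acme™ Cloud", "Acme™ Cloud"),
    ("Acme™ Cloud", "Acme Cloud"),
    ("No symbol here", "Kein Symbol hier"),
]

for src, tgt in missing_in_target(segments, "™"):
    print(f"{src!r} -> {tgt!r}")
```

Only the second pair is reported, mirroring the Source `™` / Target `-™` search.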

However, what I want now is to find all strings that have, for instance, two ™ symbols in the source but only one, or more than two, in the target.

Obviously, the previous search does not work for this purpose, since it “just” looks for any target with no ™ symbol at all. I have unsuccessfully tried several searches combining the [^™] expression with other expressions, such as the following:

Source: ™.+<[:alpha:]+™>
Target: ™.+<[:alpha:]+[^™]>*$
RegEx: ON

The idea was to find all source strings with two ™ symbols (a “™” followed by any set of characters and then a word ending in ™) and any target string with a ™ symbol that is not followed by another ™ before the end of the string. I know I am getting it wrong, but I thought that, with the [^set-expression]*$ expression, Xbench would only match a “™” not followed by any other “™” until the end of the string. Obviously, it doesn’t.

Moreover, the source expression is too greedy and also returns strings with more than two ™ symbols.
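
The over-matching can be reproduced with Python's `re` module. It is used here only as a stand-in, on the assumption that its `.` and greedy `+` behave like the ones in the searches above; Xbench's own syntax (for example `[:alpha:]`, `<`, `>`) is not modeled, and the sample segment is made up.

```python
import re

# Hypothetical segment containing three ™ symbols.
segment = "Alpha™ and Beta™ and Gamma™"

# "." also matches "™", and "+" is greedy, so the match runs from the
# first ™ to the last one instead of stopping at the second.
match = re.search(r"™.+™", segment)
print(match.group())             # ™ and Beta™ and Gamma™
print(match.group().count("™"))  # 3
```

Because nothing stops `.+` from consuming a ™, a pattern written with two ™ symbols in mind still accepts three or more.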

Any idea how to tell Xbench to match only a given number of occurrences of a word/expression within a string? It is probably easier than I think, but I am a bit stuck, and any help would be much appreciated.
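
Outside Xbench, the underlying question (does the term occur the same number of times on both sides?) does not strictly need a regular expression: counting occurrences and comparing them is enough. A minimal Python sketch, assuming (source, target) string pairs and a plain substring count via `str.count` (which counts non-overlapping occurrences); `count_mismatches` and the sample data are hypothetical.

```python
def count_mismatches(segments, term, source_count=None):
    """Return (source, target, n_src, n_tgt) for pairs whose counts of `term` differ.

    If `source_count` is given, only sources with exactly that many
    occurrences are considered (e.g. 2 for "two ™ in source").
    """
    results = []
    for src, tgt in segments:
        n_src, n_tgt = src.count(term), tgt.count(term)
        if source_count is not None and n_src != source_count:
            continue
        if n_src != n_tgt:
            results.append((src, tgt, n_src, n_tgt))
    return results


segments = [
    ("Foo™ and Bar™", "Foo™ und Bar™"),   # 2 vs 2: consistent
    ("Foo™ and Bar™", "Foo™ und Bar"),    # 2 vs 1: flagged
    ("Foo™ and Bar™", "Foo™ und Bar™™"),  # 2 vs 3: flagged
    ("Foo™", "Foo"),                      # 1 vs 0: skipped when source_count=2
]

for row in count_mismatches(segments, "™", source_count=2):
    print(row)
```

Because `term` is a plain string, the same check works for whole words or expressions, not just single symbols.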

Many thanks in advance


#2

Xbench regular expressions are greedy. They will find the longest possible string.

If you want to find strings that contain exactly two ™ symbols in the source but not in the target, try the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON
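
The same idea can be sketched outside Xbench with Python's `re` module: an anchored pattern whose filler excludes ™ is required in the source and negated in the target. This is a simplified translation, not Xbench syntax: it drops the `[:alpha:]` and `>` conditions, so a ™ does not have to follow a letter, and the sample pairs are hypothetical.

```python
import re

# Exactly two ™: the filler runs may not contain ™, and ^...$ anchor the whole segment.
EXACTLY_TWO = re.compile(r"^[^™]*™[^™]*™[^™]*$")

segments = [
    ("Foo™ and Bar™", "Foo™ und Bar™"),
    ("Foo™ and Bar™", "Foo™ und Bar"),
    ("Foo™ and Bar™ and Baz™", "Foo™ und Bar"),
]

# Source matches and target does not (the role of the "-" prefix in the search above).
hits = [
    (src, tgt)
    for src, tgt in segments
    if EXACTLY_TWO.search(src) and not EXACTLY_TWO.search(tgt)
]
print(hits)
```

The third pair is not reported because its source has three ™ symbols, so the anchored pattern fails on the source side.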

If you need to check for a different number of symbols, you will have to run a separate search.

For instance, to find all segments that contain exactly three ™ symbols in the source but not in the target, run the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON
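
In a general-purpose regex engine, writing a new pattern for each count can be avoided by repeating a group with an interval quantifier. The Python sketch below builds such a pattern for any count n; whether Xbench accepts `{n}` intervals is not confirmed here, so treat it as an illustration for scripts rather than an Xbench search string. `exactly_n_pattern` is a hypothetical helper, limited to a single-character symbol because the symbol is also used inside a character class.

```python
import re


def exactly_n_pattern(symbol, n):
    """Compile a pattern matching segments that contain `symbol` exactly n times."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    s = re.escape(symbol)
    # Each (?:[^s]*s) group consumes one occurrence plus the text before it;
    # the trailing [^s]* forbids any further occurrence before the end.
    return re.compile(rf"^(?:[^{s}]*{s}){{{n}}}[^{s}]*$")


three = exactly_n_pattern("™", 3)
print(bool(three.search("A™ B™ C™")))     # True
print(bool(three.search("A™ B™")))        # False
print(bool(three.search("A™ B™ C™ D™")))  # False
```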


#3

A million thanks for your help, Òscar!

I haven’t tried it yet, but I will let you know if I run into any issues.

Have a nice day!