Find mismatching occurrences between source and target of a given word/expression

Hello everyone

Is there any way to find all translation units in a project with mismatching occurrences of a given word/expression between source and target?

For instance, if we want to find all strings with a ™ symbol present in source but missing in target, the search would simply be:

Source: ™
Target: -™
PowerSearch: ON

This time, however, I want to find all strings with, for instance, two ™ symbols in source and either only one or more than two ™ symbols in target.

Obviously, the previous search does not work for this purpose, since it “just” looks for targets without any ™ symbol at all. I have unsuccessfully tried several searches that combine the [^™] expression with other expressions, such as the following:

Source: ™.+<[:alpha:]+™>
Target: ™.+<[:alpha:]+[^™]>*$
RegEx: ON

The idea was to find all source strings with two ™ symbols (a “™” followed by any set of characters and a word ending in ™) and any target string with a ™ symbol but no further ™ until the end of the string. I know I am getting it wrong, but I thought that, by using the [^set-expression]*$ expression, Xbench would exclude all instances of “™” not followed by another “™” before the end of the string. Obviously, it doesn’t.
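
A small Python sketch may show why the target expression cannot limit the count. It uses Python's `re` module as a stand-in (an assumption: Xbench's regex dialect differs, but the logic of `[^™]*$` is the same): "a ™ with no further ™ until the end of the string" simply describes the last ™, and every string with at least one ™ has a last ™.

```python
import re

# "A ™ followed by no other ™ up to the end of the string" = the LAST ™.
last_tm = re.compile(r"™[^™]*$")

targets = ["Alfa™", "Alfa™ y Beta™", "Alfa™ y Beta™ y Gamma™", "Alfa y Beta"]
for target in targets:
    # search() tries every ™ position; the final ™ always satisfies [^™]*$
    print(bool(last_tm.search(target)), target)
# True for the first three strings, False only for the one without any ™
```

So the expression only distinguishes "some ™" from "no ™", which is what the original `-™` search already did.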

Moreover, the source expression is too greedy and also returns strings with more than two occurrences of ™.

Any idea how to tell Xbench to search for an exact number of occurrences of a word/expression within a string? It is probably easier than I think, but I am a bit stuck, and any help would be much appreciated.

Many thanks in advance

Xbench regular expressions are greedy. They will find the longest possible string.
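
To illustrate greediness, here is a minimal sketch with Python's `re` module, which is also greedy by default (assumption: it behaves like Xbench for this simple case). An unanchored `™.+™` matches a string with two ™ and one with three alike, and `.+` stretches to the last ™ available:

```python
import re

two = "Alpha™ and Beta™"
three = "Alpha™ and Beta™ and Gamma™"

pattern = re.compile(r"™.+™")

# Unanchored, the pattern matches both strings: it cannot tell two ™ from three.
print(bool(pattern.search(two)), bool(pattern.search(three)))  # True True

# Greedy ".+" consumes as much as possible, so the match ends at the LAST ™.
print(pattern.search(three).group())  # ™ and Beta™ and Gamma™
```

This is why an exact count needs the whole string to be described from `^` to `$`, with `[^™]` filling the gaps between symbols.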

If you want to find strings that contain exactly two ™ symbols in source but not exactly two in target, try the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON
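
The same anchoring idea can be sketched in Python for anyone checking exported segment pairs outside Xbench. This is a simplified analogue, not the Xbench search itself: it keeps `^`, `$` and the `[^™]` runs, but drops the `[:alpha:]` and `>` (end-of-word) conditions and allows a ™ at the very start. The function name `source_two_target_not` is hypothetical.

```python
import re

# Simplified analogue: exactly two ™, anything except ™ around and between them.
EXACTLY_TWO = re.compile(r"^[^™]*™[^™]*™[^™]*$")

def source_two_target_not(source: str, target: str) -> bool:
    """True when source has exactly two ™ and target does not (0, 1, or 3+)."""
    # "not ... match(target)" plays the role of the "-" prefix in PowerSearch.
    return bool(EXACTLY_TWO.match(source)) and not EXACTLY_TWO.match(target)

pairs = [
    ("Alpha™ and Beta™", "Alfa™ y Beta™"),           # 2 vs 2: not flagged
    ("Alpha™ and Beta™", "Alfa™ y Beta"),            # 2 vs 1: flagged
    ("Alpha™ and Beta™", "Alfa™ y Beta™ y Gamma™"),  # 2 vs 3: flagged
    ("Alpha™ and Beta™ and Gamma™", "Alfa y Beta"),  # source has 3: not flagged
]
for src, tgt in pairs:
    print(source_two_target_not(src, tgt), "|", src, "->", tgt)
```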

If you need to find a different number of symbols, you will have to enter a new search.

For instance, if you want to find all segments that contain exactly three ™ symbols in source but not in target, you should run the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON
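
Since each count needs its own search in Xbench, a script working on exported pairs could instead build the pattern for any count, or just compare counts directly, which covers the original "mismatching occurrences" question in one pass. A minimal Python sketch; `exactly_n` and `count_mismatch` are hypothetical helpers, and `str.count` counts non-overlapping substrings, not whole words:

```python
import re

def exactly_n(symbol: str, n: int) -> re.Pattern:
    """Anchored pattern for strings with exactly n occurrences of a one-character symbol."""
    if len(symbol) != 1:
        raise ValueError("a negated character class needs a single character")
    s = re.escape(symbol)
    other = f"[^{s}]*"  # any run of characters other than the symbol
    return re.compile(f"^{other}(?:{s}{other}){{{n}}}$")  # {n} repetitions

def count_mismatch(source: str, target: str, symbol: str = "™") -> bool:
    """Flag the pair whenever source and target counts differ, whatever they are."""
    return source.count(symbol) != target.count(symbol)

three = exactly_n("™", 3)
print(bool(three.match("A™ B™ C™")), bool(three.match("A™ B™")))  # True False
print(count_mismatch("A™ B™", "A™ B"))   # True
print(count_mismatch("A™ B™", "A™ B™"))  # False
```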


A million thanks for your help, Òscar!

I haven’t tried it yet, but I will let you know if I run into any issues.

Have a nice day!