Find mismatching occurrences between source and target of a given word/expression

Manuel · December 19, 2016, 11:34pm

Hello everyone

Is there any way to find all translation units in a project with mismatching occurrences of a given word/expression between source and target?

For instance, if we want to find all strings with a ™ symbol present in source but missing in target, the search would simply be:

Source: ™
Target: -™
PowerSearch: ON

What I want now, however, is to find all strings with, for instance, two ™ symbols in source but only one, or more than two, ™ symbols in target.

The previous search does not work for this, since it only finds targets with no ™ symbol at all. I have tried several searches that combine the [^™] expression with other expressions, without success, such as the following:

Source: ™.+<[:alpha:]+™>
Target: ™.+<[:alpha:]+[^™]>*$
RegEx: ON

The idea was to match source strings with two ™ symbols (a “™” followed by any set of characters and then a word ending in ™) and target strings with a ™ symbol but no further ™ until the end of the string. I know I am getting it wrong, but I thought that with the [^set-expression]*$ expression, Xbench would only match a “™” that is not followed by any other “™” until the end of the string. Obviously, it doesn’t.

Moreover, the source expression is too greedy and also returns strings with more than two occurrences of ™.

Any idea how to tell Xbench to search for only a given number of occurrences of a word/expression within a string? It is probably easier than I think, but I am stuck, and any help would be much appreciated.

Many thanks in advance

omartin · December 20, 2016, 8:39am

Xbench regular expressions are greedy. They will find the longest possible string.
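To illustrate the point about greediness, here is a minimal sketch in Python. It uses Python's re module rather than the Xbench engine, so treat it only as an analogy, and the sample segment is made up. It shows why a pattern built on ".+" also matches strings with more than two ™ symbols, while "[^™]+" cannot run past the next ™.

```python
import re

# Hypothetical sample segment with three ™ symbols.
segment = "Alpha™ and Beta™ and Gamma™"

# Greedy ".+" extends to the last ™ it can reach, so it crosses the middle ™.
greedy = re.search(r"™.+™", segment)
print(greedy.group())

# "[^™]+" cannot cross a ™, so the match stops at the very next one.
bounded = re.search(r"™[^™]+™", segment)
print(bounded.group())
```

The first match spans all three symbols, while the second one covers only the first two.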

If you want to find strings that contain exactly two ™ symbols in source but not exactly two in target, try the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON

If you need to find a different number of symbols, you will have to enter a new search.

For instance, if you want to find all segments that contain exactly three ™ symbols in source but not exactly three in target, you should run the following search:

Source: "^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Target: -"^[^™]+[:alpha:]™[^™]+[:alpha:]™>[^™]+[:alpha:]™>[^™]*$"
Search Mode: Regular Expressions
PowerSearch: ON
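Since each count needs its own search, the search strings can be generated instead of typed by hand. The sketch below is only a convenience outside Xbench: the function names exact_count_pattern and search_fields are hypothetical, and the pattern shape is simply copied from the two searches above. Only the results for 2 and 3 symbols correspond to searches posted in this thread; other counts are an untested extrapolation.

```python
def exact_count_pattern(n: int, symbol: str = "™") -> str:
    """Build a search string shaped like the two examples above,
    for exactly n occurrences of a single literal symbol."""
    if n < 1:
        raise ValueError("n must be at least 1")
    not_symbol = f"[^{symbol}]"
    # First occurrence, written as in the posted searches (no trailing ">").
    first = f"{not_symbol}+[:alpha:]{symbol}"
    # Each further occurrence carries the trailing ">" used in the posted searches.
    rest = f"{not_symbol}+[:alpha:]{symbol}>" * (n - 1)
    return f'"^{first}{rest}{not_symbol}*$"'


def search_fields(n: int, symbol: str = "™") -> tuple[str, str]:
    """Return the (Source, Target) field contents; Target is the negated pattern."""
    pattern = exact_count_pattern(n, symbol)
    return pattern, "-" + pattern


source, target = search_fields(4)
print("Source:", source)
print("Target:", target)
```

The printed strings would then be pasted into the Source and Target fields, with Search Mode set to Regular Expressions and PowerSearch ON as above.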

Manuel · December 22, 2016, 7:05am

A million thanks for your help, Òscar!

I haven’t tried it yet, but I will let you know if I run into any issues.

Have a nice day!
