Regex for detecting content in brackets not identical in target

LuisHermogenes · September 18, 2020, 9:05am

Hi!

This is a bit related to a topic posted in 2018: Regex for target equals source in brackets

In there, the question was how to create a check that detects when the content in brackets in the source is not identical in the target. For instance:

Source: Bill by Default [Label][Consultants].
Target: Cobrar por padrão [NotLabel][NotConsultants]

Xbench would detect this automatically as an error.

However, I am running checks with the solution posted there and I’m not getting exactly what I’m looking for. For instance, if there is a case in which only one of the brackets is correct, but the rest is incorrect, then Xbench considers it to be all correct.

I wanted to ask: Is there a way to create a check with Regex that detects when at least one of the target brackets is different from the ones in the source?

E.g.,
Source: Dear [First Name] [Last Name], you’ve won [PRIZE].
Target: Estimado/a [First Name] [Last Name], has ganado [UN PREMIO].

I would like Xbench to detect [UN PREMIO], even though the other brackets are correct.

Thanks in advance!

omartin · September 18, 2020, 9:47am

Hi Luis,

I think the 2 following entries would suit your needs:

Source: -@1
Target: "^.*\[[^\]]+\].*(\[[^\]]+\])=1.*"

Search mode: Regular Expressions
PowerSearch: On.

Source: -@1
Target: ".*(\[[^\]]+\])=1"

Search mode: Regular Expressions
PowerSearch: On.

Best regards,
Oscar.

LuisHermogenes · September 18, 2020, 10:30am

Hi Oscar,

Thanks for the quick reply! However, there are still some small issues. Can you help?

For the first expression, it doesn’t detect the error when the source and target have only one text in brackets:
E.g.:

Dear [First Name]
Estimado(a) [Nombre]

And when there is at least one correct bracket:
E.g.:

Dear [First Name] [Last Name], you’ve won [PRIZE]
Target: Estimado(a) [First Name] [Last Name], ganaste [PREMIO]

For the second, it doesn’t detect when there is at least one correct bracket:

E.g.:

Dear [First Name] [Last Name], you’ve won [PRIZE]
Estimado(a) [First Name] [Last Name], ganaste [PREMIO]

I’m sorry to keep bugging you, but can you help? Thanks!

omartin · September 18, 2020, 11:46am

In this case, the best approach would be to create one checklist entry for each number of texts in brackets in the source, each detecting texts in brackets that are missing in the target:

For instance, to check only segments with one text in brackets, use the following search:

Source: "(\[[^\]]+\])=1"
Target: -@1
Search mode: Regular Expressions
PowerSearch: On.

To check only segments with two texts in brackets, use the following search:
Source: "(\[[^\]]+\])=1.+(\[[^\]]+\])=2"
Target: -@1 OR -@2
Search mode: Regular Expressions
PowerSearch: On.

To check only segments with three texts in brackets, use the following search:
Source: "(\[[^\]]+\])=1.+(\[[^\]]+\])=2.+(\[[^\]]+\])=3"
Target: -@1 OR -@2 OR -@3
Search mode: Regular Expressions
PowerSearch: On.

In the segments detected, the source text will be highlighted whereas the target text won’t. This means that at least one of the texts in brackets is missing in the target.

You can define up to 9 variables. You can find more information on regex and variables at https://docs.xbench.net/user-guide/regular-expressions/.
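
As a rough illustration of what these per-count entries check, here is a minimal Python sketch. It assumes that `=1` stores the captured text and that `-@1` means "the target does not contain that text"; the function name `missing_in_target` is hypothetical and not part of Xbench.

```python
import re

# Same bracket pattern used in the checklist entries above.
BRACKET = re.compile(r"\[[^\]]+\]")

def missing_in_target(source: str, target: str) -> list[str]:
    """Return the bracketed texts from source that do not appear verbatim in target."""
    return [tag for tag in BRACKET.findall(source) if tag not in target]

source = "Dear [First Name] [Last Name], you've won [PRIZE]."
target = "Estimado/a [First Name] [Last Name], has ganado [UN PREMIO]."
print(missing_in_target(source, target))  # ['[PRIZE]']
```

The sketch loops over every bracketed text, so it does not need one entry per count. Like the entries above, it only tests source against target, so an extra bracketed text that appears only in the target would go unnoticed.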

LuisHermogenes · September 18, 2020, 12:05pm

Hi Oscar,

Mmm, I don’t know if it’s related to the number of brackets for this expression:

Source: -@1
Target: ".*(\[[^\]]+\])=1"

See the below table.
It detects 1, 2, 3, 4, 6, and 8, but not 5 and 7.

Source	Target	Case
Dear [First Name] [Last Name]	Estimado(a) [Nombre] [Apellido]	Incorrect 1
Dear [First Name]	Estimado(a) [Nombre]	Incorrect 2
Dear [First Name] [Last Name], you’ve won [PRIZE]	Estimado(a) [Nombre] [Apellido], ganaste [PREMIO]	Incorrect 3
Dear [First Name] [Last Name]	Estimado(a) [First Name] [Apellido]	Incorrect 4
Dear [First Name] [Last Name], you’ve won [PRIZE]	Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]	Incorrect 5
Dear [First Name] [Last Name], you’ve won [PRIZE]	Estimado(a) [First Name] [Last Name], ganaste [PREMIO]	Incorrect 6
Dear [First Name] [Last Name], you’ve won [PRIZE], collect [HERE]	Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [HERE]	Incorrect 7
Dear [First Name] [Last Name], you’ve won [PRIZE], collect [HERE]	Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [AQUí]	Incorrect 8

So it’s not about the number of brackets, it seems:

1 has 2 tags, 2 of which are wrong.
2 has 1, 1 wrong.
3 has 3, 3 wrong.
4 has 2, 1 wrong.
5 has 3, 2 wrong.
6 has 3, 1 wrong.
7 has 4, 2 wrong.
8 has 4, 3 wrong.

Any ideas?
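
One reading that fits these results, offered as an assumption about how Xbench evaluates the entry rather than a confirmed explanation: the greedy `.*` makes the group capture only the last bracketed text in the target, so the entry fires only when that last one is absent from the source. The minimal Python sketch below emulates this (the `flagged` helper is hypothetical) using the eight rows from the table above.

```python
import re

# Target pattern from the entry above. The greedy .* pushes the group
# to the LAST bracketed text in the target.
LAST_BRACKET = re.compile(r".*(\[[^\]]+\])")

def flagged(source: str, target: str) -> bool:
    """Emulate 'Source: -@1': the captured target text is absent from source."""
    match = LAST_BRACKET.search(target)
    return match is not None and match.group(1) not in source

SRC2 = "Dear [First Name] [Last Name]"
SRC3 = "Dear [First Name] [Last Name], you've won [PRIZE]"
SRC4 = "Dear [First Name] [Last Name], you've won [PRIZE], collect [HERE]"

rows = [
    (SRC2, "Estimado(a) [Nombre] [Apellido]"),
    ("Dear [First Name]", "Estimado(a) [Nombre]"),
    (SRC3, "Estimado(a) [Nombre] [Apellido], ganaste [PREMIO]"),
    (SRC2, "Estimado(a) [First Name] [Apellido]"),
    (SRC3, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]"),
    (SRC3, "Estimado(a) [First Name] [Last Name], ganaste [PREMIO]"),
    (SRC4, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [HERE]"),
    (SRC4, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [AQUí]"),
]

detected = [n for n, (src, tgt) in enumerate(rows, start=1) if flagged(src, tgt)]
print(detected)  # [1, 2, 3, 4, 6, 8]
```

Cases 5 and 7 are the only ones whose last target bracket is correct, which would explain why they are skipped regardless of how many earlier brackets are wrong.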

pcondal · September 19, 2020, 9:09am

If this is a one-off need, one quick and dirty trick could be the following:

In Xbench, export the project items to a tab-delimited file with Tools->Export Items.
With a text editor, do a global replace of [ with < and a global replace of ] with >.
Load the modified text file in Xbench as ongoing translation.
Run a QA of Tag Mismatches.

Otherwise you would need several checklist items, which is more elaborate and might miss some edge cases: one checklist item for the case of segments with one placeholder, one checklist item for the case of segments with two placeholders, etc.

LuisHermogenes · September 21, 2020, 7:54am

Sadly, it’s not a one-off thing. I am creating a checklist to automate checks for a large amount of content and that additional step would be too time-consuming.

Also, creating a checklist based on the number of brackets in source has some issues, as the one for 1 bracket catches some, the one for 2 catches those and others, and so on, so there would be a lot of repetitions.

I was wondering why the Tag Mismatch check works so well with angle brackets, while square brackets cause so many issues. I assume the Tag Mismatch check (the regular one in the list of checks) does not use Regex, correct?

If it does use Regex, maybe we could adapt the expression for that one check to use square brackets?

omartin · September 21, 2020, 3:54pm

Replacing [] with <> would let you use the Tag Mismatch feature to detect all the items that have been modified or are missing in the target.

With this workaround, it should not be necessary to create checklist entries.

LuisHermogenes · September 21, 2020, 4:15pm

Hi Oscar,

I see, but that is not an option, as it would cause much more work than it saves.

Thanks anyway for your help!

omartin · September 22, 2020, 3:18pm

You can replace all instances of text between [] with <>.

Open files in Notepad++.
Press Ctrl+H to find and replace.
In the Replace window, select the Regular expression search mode.
Find what: \[([^\]]+)\]
Replace with: <\1>
Click Replace All.

If you have more than one file, open all the files and click Replace All in All Opened Documents.

Check tag mismatches in Xbench. All mismatches will be reported.
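
If the replacement has to be repeated often, the same find-and-replace could be scripted. The following is a minimal Python sketch that reuses the pattern and replacement from the steps above; the helper names `brackets_to_tags` and `convert_file` are hypothetical, and the UTF-8 encoding is an assumption about the files.

```python
import re
from pathlib import Path

# Same pattern and replacement as in the Notepad++ steps above.
BRACKETED = re.compile(r"\[([^\]]+)\]")

def brackets_to_tags(text: str) -> str:
    """Rewrite every [content] as <content>."""
    return BRACKETED.sub(r"<\1>", text)

def convert_file(path: Path, encoding: str = "utf-8") -> None:
    """Convert one file in place (the encoding is an assumption)."""
    converted = brackets_to_tags(path.read_text(encoding=encoding))
    path.write_text(converted, encoding=encoding)

print(brackets_to_tags("Dear [First Name] [Last Name], you've won [PRIZE]."))
# Dear <First Name> <Last Name>, you've won <PRIZE>.
```

Running `convert_file` on copies of the files keeps the originals untouched, since the conversion overwrites the file it is given.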

LuisHermogenes · September 22, 2020, 3:30pm

Hi Oscar,

Great! Many thanks for your great help. We will try this approach. Thanks!
