ApSIC Xbench Forum

Regex for detecting content in brackets not identical in target

Hi!

This is related to a topic posted in 2018: Regex for target equals source in brackets

That topic asked how to create a check that detects when the content in brackets in the source is not identical in the target. For instance:

Source: Bill by Default [Label][Consultants].
Target: Cobrar por padrão [NotLabel][NotConsultants]

Xbench would detect this automatically as an error.

However, I am running checks with the solution posted there and I’m not getting exactly what I’m looking for. For instance, if only one of the brackets is correct and the rest are incorrect, Xbench treats the whole segment as correct.

I wanted to ask: Is there a way to create a check with Regex that detects when at least one of the target brackets is different from the ones in the source?

E.g.,
Source: Dear [First Name] [Last Name], you’ve won [PRIZE].
Target: Estimado/a [First Name] [Last Name], has ganado [UN PREMIO].

I would like Xbench to detect [UN PREMIO], even though the other brackets are correct.

Thanks in advance!

Hi Luis,

I think the following two entries would suit your needs:

Source: -@1
Target: "^.*\[[^\]]+\].*(\[[^\]]+\])=1.*"

Search mode: Regular Expressions
PowerSearch: On.

Source: -@1
Target: ".*(\[[^\]]+\])=1"

Search mode: Regular Expressions
PowerSearch: On.

Best regards,
Oscar.

Hi Oscar,

Thanks for the quick reply! However, there are still some small issues. Can you help?

  • For the first expression, it doesn’t detect the error when the source and target each have only one text in brackets:
    E.g.:

Dear [First Name]
Estimado(a) [Nombre]

And when there is at least one correct bracket:
E.g.:

Dear [First Name] [Last Name], you’ve won [PRIZE]
Estimado(a) [First Name] [Last Name], ganaste [PREMIO]

  • For the second, it doesn’t detect when there is at least one correct bracket:

E.g.:

Dear [First Name] [Last Name], you’ve won [PRIZE]
Estimado(a) [First Name] [Last Name], ganaste [PREMIO]

I’m sorry to keep bugging you, but can you help? Thanks!

In this case, the best approach would be to create a checklist with one item for each number of bracketed texts that a segment can contain:

For instance, to check only segments with one text in brackets, use the following search:

Source: "(\[[^\]]+\])=1"
Target: -@1
Search mode: Regular Expressions
PowerSearch: On.

To check only segments with two texts in brackets, use the following search:
Source: "(\[[^\]]+\])=1.+(\[[^\]]+\])=2"
Target: -@1 OR -@2
Search mode: Regular Expressions
PowerSearch: On.

To check only segments with three texts in brackets, use the following search:
Source: "(\[[^\]]+\])=1.+(\[[^\]]+\])=2.+(\[[^\]]+\])=3"
Target: -@1 OR -@2 OR -@3
Search mode: Regular Expressions
PowerSearch: On.

In the detected segments, the source text will be highlighted whereas the target text won’t. This means that at least one of the texts in brackets is missing in the target.

You can define up to 9 variables. You can find more information on regex and variables at https://docs.xbench.net/user-guide/regular-expressions/.
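
The entries above all follow one pattern, so a minimal Python sketch of the logic may help when reasoning about which segments each entry flags. It assumes that `=1`, `=2`, ... bind the captured text to variables and that `-@n` means "the target does not contain variable n". The helper names `source_pattern` and `flagged` are hypothetical, and Python's `re` module is only a stand-in for the Xbench engine.

```python
import re

BRACKET = r"(\[[^\]]+\])"  # one bracketed text, captured as a group


def source_pattern(n: int) -> re.Pattern:
    """Build the source regex for n bracketed texts, joined by '.+' as above."""
    if not 1 <= n <= 9:  # the thread states a limit of 9 variables
        raise ValueError("n must be between 1 and 9")
    return re.compile(".+".join([BRACKET] * n))


def flagged(source: str, target: str, n: int) -> bool:
    """True if a bracketed text captured in the source is absent from the target."""
    match = source_pattern(n).search(source)
    if match is None:
        return False  # the source pattern does not match this segment
    return any(text not in target for text in match.groups())


source = "Dear [First Name] [Last Name], you've won [PRIZE]."
target = "Estimado/a [First Name] [Last Name], has ganado [UN PREMIO]."
print(flagged(source, target, 3))  # True: [PRIZE] is not in the target
print(flagged(source, target, 1))  # False: only [First Name] is compared
```

In this emulation, the one-bracket entry compares only the first bracketed text, which is why each entry is meant for segments with exactly that number of texts. Also note that `.+` needs at least one character between brackets, so adjacent brackets such as [Label][Consultants] would not match the two-bracket pattern here.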

Hi Oscar,

Mmm, I don’t know if it’s related to the number of brackets for this expression:

Source: -@1
Target: ".*(\[[^\]]+\])=1"

See the table below.
It detects 1, 2, 3, 4, 6, and 8, but not 5 and 7.

  1. Source: Dear [First Name] [Last Name]
     Target: Estimado(a) [Nombre] [Apellido]
     Status: Incorrect
  2. Source: Dear [First Name]
     Target: Estimado(a) [Nombre]
     Status: Incorrect
  3. Source: Dear [First Name] [Last Name], you’ve won [PRIZE]
     Target: Estimado(a) [Nombre] [Apellido], ganaste [PREMIO]
     Status: Incorrect
  4. Source: Dear [First Name] [Last Name]
     Target: Estimado(a) [First Name] [Apellido]
     Status: Incorrect
  5. Source: Dear [First Name] [Last Name], you’ve won [PRIZE]
     Target: Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]
     Status: Incorrect
  6. Source: Dear [First Name] [Last Name], you’ve won [PRIZE]
     Target: Estimado(a) [First Name] [Last Name], ganaste [PREMIO]
     Status: Incorrect
  7. Source: Dear [First Name] [Last Name], you’ve won [PRIZE], collect [HERE]
     Target: Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [HERE]
     Status: Incorrect
  8. Source: Dear [First Name] [Last Name], you’ve won [PRIZE], collect [HERE]
     Target: Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [AQUí]
     Status: Incorrect

So it’s not about the number of brackets, it seems:

1 has 2 tags, 2 of which are wrong.
2 has 1, 1 wrong.
3 has 3, 3 wrong.
4 has 2, 1 wrong.
5 has 3, 2 wrong.
6 has 3, 1 wrong.
7 has 4, 2 wrong.
8 has 4, 3 wrong.

Any ideas?
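
One possible explanation for the gaps, assuming the `.*` in the target pattern is greedy and that `-@1` means "the source does not contain variable 1": the pattern captures only the last bracketed text in the target, so a segment is flagged only when that last text is absent from the source. The Python sketch below emulates that reading with a hypothetical `flagged_by_last_bracket` helper. It is not Xbench's actual implementation.

```python
import re

# Emulates target ".*(\[[^\]]+\])=1": greedy ".*" pushes the capture
# to the LAST bracketed text in the target.
LAST_BRACKET = re.compile(r".*(\[[^\]]+\])")


def flagged_by_last_bracket(source: str, target: str) -> bool:
    """True if the last bracketed text of the target is absent from the source."""
    match = LAST_BRACKET.search(target)
    if match is None:
        return False  # no bracketed text in the target
    return match.group(1) not in source  # emulates source "-@1"


S2 = "Dear [First Name] [Last Name]"
S3 = "Dear [First Name] [Last Name], you've won [PRIZE]"
S4 = "Dear [First Name] [Last Name], you've won [PRIZE], collect [HERE]"

ROWS = [
    (S2, "Estimado(a) [Nombre] [Apellido]"),
    ("Dear [First Name]", "Estimado(a) [Nombre]"),
    (S3, "Estimado(a) [Nombre] [Apellido], ganaste [PREMIO]"),
    (S2, "Estimado(a) [First Name] [Apellido]"),
    (S3, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]"),
    (S3, "Estimado(a) [First Name] [Last Name], ganaste [PREMIO]"),
    (S4, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [HERE]"),
    (S4, "Estimado(a) [Nombre] [Apellido], ganaste [PRIZE]. Recógelo [AQUí]"),
]

detected = [
    number
    for number, (source, target) in enumerate(ROWS, start=1)
    if flagged_by_last_bracket(source, target)
]
print(detected)  # [1, 2, 3, 4, 6, 8]
```

Rows 5 and 7 end with a bracketed text that also appears in the source ([PRIZE] and [HERE]), which would be consistent with the results reported above.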

If this is a one-off need, one quick and dirty trick could be the following:

  1. In Xbench, export the project items to a tab-delimited file with Tools->Export Items.
  2. With a text editor, do a global replace of [ with < and a global replace of ] with >.
  3. Load the modified text file in Xbench as ongoing translation.
  4. Run a QA of Tag Mismatches.

Otherwise you would need several checklist items, which is more elaborate and might miss some edge cases: one checklist item for the case of segments with one placeholder, one checklist item for the case of segments with two placeholders, etc.
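
For reference, the comparison that the per-count checklist items approximate can be stated compactly outside Xbench. The sketch below is plain Python, not an Xbench feature. `bracket_mismatches` is a hypothetical helper that compares the bracketed texts of source and target as multisets, so it does not depend on how many placeholders a segment has.

```python
import re
from collections import Counter

BRACKETED = re.compile(r"\[[^\]]+\]")


def bracket_mismatches(source: str, target: str):
    """Return (missing_in_target, unexpected_in_target) as lists of bracketed texts."""
    src = Counter(BRACKETED.findall(source))
    tgt = Counter(BRACKETED.findall(target))
    missing = list((src - tgt).elements())     # in source, not in target
    unexpected = list((tgt - src).elements())  # in target, not in source
    return missing, unexpected


missing, unexpected = bracket_mismatches(
    "Dear [First Name] [Last Name], you've won [PRIZE].",
    "Estimado/a [First Name] [Last Name], has ganado [UN PREMIO].",
)
print(missing)     # ['[PRIZE]']
print(unexpected)  # ['[UN PREMIO]']
```

A segment would count as an error whenever either list is non-empty.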

Sadly, it’s not a one-off thing. I am creating a checklist to automate checks for a large amount of content and that additional step would be too time-consuming.

Also, creating a checklist based on the number of brackets in source has some issues, as the one for 1 bracket catches some, the one for 2 catches those and others, and so on, so there would be a lot of repetitions.

I was wondering why the tag mismatch check works so well with angle brackets, while square brackets cause so many issues. I assume the Tag Mismatch check (the regular one in the list of checks) does not use Regex, correct?

If it does use Regex, maybe we could adapt the expression for that one check to use square brackets?

Replacing [] with <> would allow you to use the Tag Mismatch check to detect all the items that have been modified or are missing in the target.

With this workaround, it should not be necessary to create checklist entries.

Hi Oscar,

I see, but that is not an option, as it would cause much more work than it saves.

Thanks anyway for your help!

You can replace all instances of text between [] with <>.

  1. Open files in Notepad++.
  2. Press Ctrl+H to find and replace.
  3. In the Replace dialog, select the Regular expression search mode.
  4. Find what: \[([^\]]+)\]
  5. Replace with: <\1>
  6. Click Replace All.

If you have more than one file, open all the files and click Replace All in All Opened Documents.

Check tag mismatches in Xbench. All mismatches will be reported.
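
For larger batches, the same find-and-replace could be scripted. The sketch below applies the regex from steps 4 and 5 with Python's `re.sub`. The function name `brackets_to_tags` is hypothetical, and reading and writing the actual files is left out because it depends on the file format and encoding.

```python
import re

# Same expression as "Find what" above; the group keeps the text inside the brackets.
BRACKETED = re.compile(r"\[([^\]]+)\]")


def brackets_to_tags(text: str) -> str:
    """Rewrite every [text] as <text>, mirroring the Notepad++ replacement."""
    return BRACKETED.sub(r"<\1>", text)


converted = brackets_to_tags("Dear [First Name] [Last Name], you've won [PRIZE].")
print(converted)  # Dear <First Name> <Last Name>, you've won <PRIZE>.
```

Empty brackets ([]) are left untouched, since the expression requires at least one character between them.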

Hi Oscar,

Great! Many thanks for your great help. We will try this approach. Thanks!