Check number formatting

pcs · February 4, 2018, 2:49pm

Good day, I am new on this forum but I have been using XBench for two years now for QA. I need help with RegEx to fix my TM.

Here’s the story…

In 2016, a long-standing client introduced a style guide for numbers/measures used in their manuals. Before then, they used the AmE number format, i.e. 34,000 for thirty-four thousand and 12.15 for 12 units and 15 hundredths. In 2016 they decided to follow a technical standard under which the thousands separator is a non-breaking space (i.e. 34 000) and the decimal separator remains the point (12.15). They also decided that the translated manuals should follow this rule, regardless of local custom; for example, in my country we use the comma as the decimal separator, but I should stick to the point (no pun intended) for this client.

After 5 years, my TM has a mix of bad sources and bad targets due to these style changes. I would like to fix my TM so that anything I pre-translate is pre-translated according to the new style guide.

I am working with SDL Trados Studio and thus I have exported my TM from *.sdltm to *.tmx.
I have loaded the *.tmx in XBench.
I need to check and edit those TUs that do not match the current style guide, i.e. I need to run a series of checks on the target, according to these rules:

If the number has 5 or more digits, the thousands separator has to be a non-breaking space (i.e. 34 000; however, if it has just 4 digits, no space should be used, i.e. 4000)
If the number has decimals, the decimal separator should be the point and not the comma (i.e. 12.15 is okay, 12,15 is not)
If the source contains in. the target should contain in. (with the point) as well.
If the source contains cu.ft. the target should contain cu.ft. (with the points) as well.
If the source contains L the target should contain L (single letter, capitalized) as well.
Ranges should be indicated using an en dash, so 10–20 is okay, while 10-20 is not okay. (in this post they may look the same, but the en dash should look slightly longer than the regular hyphen)
If the source contains red rose the target should contain rosa rossa.
If the target contains km, it should be preceded by a non-breaking space.
If the target contains ASTM, it should be followed by a non-breaking space.

Can you please help me translate this into RegEx? Thank you in advance!

pcondal · February 5, 2018, 6:27pm

The Xbench RegEx grammar is documented here.

Xbench 3.0 also supports the .sdltm format as an input format.

Here is how you could approach them (or at least an approximation that could be a good starting point, in case I did not understand the exact requirements well).

You can check whether any entries still use the old thousands separator in the target (I assume it was a comma):

Target: [0-9]{2,3},[0-9]{3}
Mode: Regular Expressions

For 4-digit numbers, you can check whether any still use the old thousands separator:

Target: <[0-9]{1},[0-9]{3}>
Mode: Regular Expressions
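
For anyone who wants to try these two patterns outside Xbench, here is a minimal sketch using Python's `re` module. It assumes that Xbench's `<` and `>` word anchors can be approximated by `\b`; the function name and sample strings are made up for illustration.

```python
import re

# Approximate Python equivalents of the two Xbench patterns above.
# Assumption: Xbench's < and > word anchors are written as \b here.
OLD_THOUSANDS = re.compile(r"[0-9]{2,3},[0-9]{3}")
OLD_FOUR_DIGIT = re.compile(r"\b[0-9],[0-9]{3}\b")


def has_old_thousands_separator(target: str) -> bool:
    """Return True if the target still contains a comma-grouped number."""
    return bool(OLD_THOUSANDS.search(target) or OLD_FOUR_DIGIT.search(target))


# Hypothetical target segments, not taken from a real TM.
samples = [
    "Peso: 34,000 kg",        # old style, should be flagged
    "Peso: 34\u00a0000 kg",   # new style with non-breaking space
    "Lunghezza: 4,000 mm",    # old style 4-digit number
    "Lunghezza: 4000 mm",     # new style 4-digit number
]
for segment in samples:
    print(repr(segment), has_old_thousands_separator(segment))
```

Note that a decimal comma followed by exactly three digits (for example 12,150) would also be flagged, so hits still need a manual look.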

This one needs to rely on the source and use PowerSearch. I’m assuming that source uses decimal point and that target also should use decimal point:

Source: "([0-9]+\.[0-9]+)=1"
Target: -@1
Mode: Regular Expressions
PowerSearch: On
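
The idea behind this PowerSearch (capture a decimal number in the source, then report segments whose target does not contain the same text) can be sketched in Python as follows. This is only an illustration of the logic, not of how Xbench implements it; the function name and sample segments are hypothetical.

```python
import re

# Decimal numbers written with a point, as in the source pattern above.
DECIMAL = re.compile(r"[0-9]+\.[0-9]+")


def missing_decimals(source: str, target: str) -> list[str]:
    """Return the decimal numbers found in source that do not appear verbatim in target."""
    # Plain substring test: "12.15" is considered present if the target
    # contains that exact character sequence anywhere.
    return [number for number in DECIMAL.findall(source) if number not in target]


# Hypothetical segment pairs for illustration.
print(missing_decimals("Torque: 12.15 Nm", "Coppia: 12,15 Nm"))
print(missing_decimals("Torque: 12.15 Nm", "Coppia: 12.15 Nm"))
```

The first pair is reported because the target uses a decimal comma; the second pair yields an empty list.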

Source: "<in\."
Target: -"<in\."
Mode: Regular Expressions
PowerSearch: On

Source: "<cu\.ft\."
Target: -"<cu\.ft\."
Mode: Regular Expressions
PowerSearch: On

Source: "<L>"
Target: -"<L>"
Mode: Regular Expressions
PowerSearch: On
Case-sensitive: On
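
The three "present in source, absent in target" checks above follow the same shape, which can be sketched in Python like this. It assumes `\b` is a reasonable stand-in for Xbench's `<` and `>` anchors; the dictionary, function name and sample segments are made up for illustration.

```python
import re

# One pattern per abbreviation. Python's re is case-sensitive by default,
# which matters for the single-letter "L" check.
UNIT_PATTERNS = {
    "in.": re.compile(r"\bin\."),
    "cu.ft.": re.compile(r"\bcu\.ft\."),
    "L": re.compile(r"\bL\b"),
}


def missing_units(source: str, target: str) -> list[str]:
    """Return the abbreviations that occur in source but not in target."""
    return [
        name
        for name, pattern in UNIT_PATTERNS.items()
        if pattern.search(source) and not pattern.search(target)
    ]


# Hypothetical segment pairs for illustration.
print(missing_units("Volume: 3 cu.ft.", "Volume: 3 cu ft"))
print(missing_units("Capacity: 5 L", "Capacità: 5 l"))
print(missing_units("Width: 10 in.", "Larghezza: 10 in."))
```

A source sentence that merely ends with the word "in." would also trigger the first pattern, so results are candidates to review rather than confirmed errors.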

Target: <[0-9]+-[0-9]+> (use the bad dash here)
Mode: Regular Expressions

Source: "red rose"
Target: -"rosa rossa"
Mode: Simple
PowerSearch: On

Target: [^\x00a0]km>
Mode: Regular Expressions

Target: ASTM[^\x00a0]
Mode: Regular Expressions
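
The two non-breaking-space checks can be tried in Python as sketched below. This uses Python's own `\u00a0` escape rather than Xbench's syntax, and `\b` in place of the `>` anchor (an assumption); function names and sample strings are illustrative only.

```python
import re

# "km" preceded by any character other than a non-breaking space (U+00A0).
KM_WITHOUT_NBSP = re.compile(r"[^\u00a0]km\b")
# "ASTM" followed by any character other than a non-breaking space.
ASTM_WITHOUT_NBSP = re.compile(r"ASTM[^\u00a0]")


def km_needs_fix(target: str) -> bool:
    """Return True if some 'km' is not preceded by a non-breaking space."""
    return bool(KM_WITHOUT_NBSP.search(target))


def astm_needs_fix(target: str) -> bool:
    """Return True if some 'ASTM' is not followed by a non-breaking space."""
    return bool(ASTM_WITHOUT_NBSP.search(target))


# Hypothetical target segments.
print(km_needs_fix("Distanza: 100 km"))         # regular space before km
print(km_needs_fix("Distanza: 100\u00a0km"))    # non-breaking space before km
print(astm_needs_fix("Norma ASTM D638"))        # regular space after ASTM
print(astm_needs_fix("Norma ASTM\u00a0D638"))   # non-breaking space after ASTM
```

Because each pattern requires one neighbouring character, a "km" at the very start of a segment or an "ASTM" at the very end is not reported.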

pcs · February 6, 2018, 3:19am

Hello @pcondal,

thank you so much for your help. You are a life saver!
I will refer to the Xbench RegEx grammar in the future. RegEx is not so straightforward and it looks a little intimidating, though well worth learning!

As per your suggestion, I have been searching the *.sdltm file directly, though SDL Studio has to be used to make edits. Correct? I thought it was possible to batch/bulk edit in Xbench directly, but maybe that is only for other file formats.

Anyway, most of the strings worked; however, these did not:

This one is not working (the search fields turn red), however the previous one searching for the abbreviation of inches, works.

This one is not working (the search fields turn red).

Thank you again for your time and efforts!

pcondal · February 6, 2018, 8:19am

Xbench is a browser, not really an editor. We always try to find ways to call the home application of the format to ensure data integrity (even slight changes, for example in an update of the home application, can easily corrupt its proprietary data).

However, I agree that it would be great if it were possible to open the TM in Studio right at the segment from Xbench.

I created this idea in the SDL Community site. If you vote for it (and get other interested users to vote for it), and SDL eventually decides to implement it, segment positioning will eventually be available in Xbench.

For the cu.ft. I notice you are missing the backslashes in front of the dots (a dot has a special meaning in Regex). In any case, could you provide specific source/target examples where it does not work?

pcs · February 7, 2018, 12:28am

I am still getting the same error. I believe it is a syntax error, but I cannot tell where the error lies.

pcondal · February 7, 2018, 7:05am

Red background in Xbench means “nothing found” in search.

Are you pressing Ctrl+P to search in PowerSearch mode?

Is there any segment that has “something cu.ft.” in source and does not have “something cu.ft.” in target?

Claudia_Cappelletti · February 7, 2018, 6:15pm

In this case:

Source: "<cu.ft."

Target: -"<cu.ft."

Mode: Regular Expressions

PowerSearch: On

I believe you need to “escape” the dots to tell Xbench that each dot is a literal character and not a regular expression operator. So you should insert a backslash right before each dot.
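
The effect of escaping can be seen in a small Python sketch. Python's `re` is used here only as a stand-in, since an unescaped dot matches any character in most regex dialects; the sample text is made up.

```python
import re

unescaped = re.compile(r"cu.ft.")    # each dot matches ANY single character
escaped = re.compile(r"cu\.ft\.")    # each dot matches only a literal "."

text = "cuXftY"  # contains no dots at all

print(bool(unescaped.search(text)))  # True: the dots matched "X" and "Y"
print(bool(escaped.search(text)))    # False: no literal "cu.ft." present

# re.escape adds the backslashes automatically for a literal string.
print(re.escape("cu.ft.") == r"cu\.ft\.")  # True
```

So an unescaped pattern tends to find too much rather than too little, which is worth keeping in mind when a search comes back empty.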

RSchiaffino · February 7, 2018, 7:11pm

"However, I agree that it would be great that it was possible from Xbench to open the TM in Studio right at the segment.

I created this idea in the SDL Community site. If you vote it (and manage that other interested users vote it), and SDL eventually decides to implement it, the functionality of segment positioning will be eventually available in Xbench."

As an alternative, while this is still not possible with sdltm memories, it should be possible to access the “offending segment” of a TM for editing by first exporting the TM to TMX format, then loading it in Xbench as “ongoing translation”.
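
Once the TM is exported to TMX, the source/target pairs can also be inspected with a short script. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny hand-written TMX sample; the language codes, sample content and function name are assumptions for illustration, and a real export from Studio will carry more attributes and inline tags.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Minimal, hand-written TMX sample (hypothetical content).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Weight: 34,000 kg</seg></tuv>
      <tuv xml:lang="it-IT"><seg>Peso: 34,000 kg</seg></tuv>
    </tu>
  </body>
</tmx>"""


def iter_pairs(tmx_text: str, src_lang: str, tgt_lang: str):
    """Yield (source, target) text pairs for each translation unit."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]


for source, target in iter_pairs(SAMPLE_TMX, "en-US", "it-IT"):
    print(source, "|", target)
```

This only reads the file; any corrections would still be made in the home application or in a copy of the TMX.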

pcs · February 9, 2018, 6:50pm

You are right; I have probably solved all the issues with this specific string by now. I will be running more QA checks later next week and I might post again! Thank you!
