ApSIC Xbench Forum

Checklist rules roundup

Hello everyone,
I was looking through the forum for rules to add to my Xbench checklist to make my work with PT-BR easier.

I am still missing some important rules and wondered if anyone could help me.
I need to check that there are only straight quotes in the target, that a slash has no spaces around it, and that there are spaces around an en-dash only when it is between text, not numbers.

For this last one, I already tested a rule that checks that an en-dash has no spaces between numbers and it works, but I cannot get my head around making it work the other way.

Thanks a lot for your attention and patience, and I am sorry for starting off with a huge request.
All the best to everyone.

Hi,

To get segments that contain spaces around an en-dash only when it is between text, run the following search:

Source or Target: [:letter:][:space:]–[:space:][:letter:]
Search mode: Regular Expressions
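
For anyone who wants to try this rule outside Xbench first, here is a rough Python sketch of the same idea. It is only an approximation: Python's re module does not have the [:letter:] and [:space:] sets, so [^\W\d_] (word characters minus digits and underscore) and \s are used as stand-ins, and the function name is made up for illustration.

```python
import re

# Stand-ins for the Xbench sets (assumption, not an exact equivalent):
#   [^\W\d_]  ~ [:letter:]  (word characters minus digits and underscore)
#   \s        ~ [:space:]
SPACED_EN_DASH_BETWEEN_LETTERS = re.compile(r"[^\W\d_]\s–\s[^\W\d_]")


def has_spaced_en_dash_between_letters(segment: str) -> bool:
    """Return True if an en-dash with a space on each side sits between two letters."""
    return SPACED_EN_DASH_BETWEEN_LETTERS.search(segment) is not None
```

A segment such as "São Paulo – Rio" should be reported, while "10 – 20" and "Paulo–Rio" should not.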

To get all segments that have a slash with no spaces around it, run the following search:

Source or Target: / ([^[:space:]]/ OR /[^[:space:]])
Search mode: Regular Expressions
PowerSearch: On (press Ctrl+P)
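
The logic of this search (a slash with a non-space character directly before it OR directly after it) can be sketched in Python like this. It is an illustration only: \S stands in for [^[:space:]], | plays the role of the PowerSearch OR, and the function name is hypothetical.

```python
import re

# \S approximates [^[:space:]]; the | alternation mirrors the PowerSearch OR.
SLASH_TOUCHING_TEXT = re.compile(r"\S/|/\S")


def slash_touches_text(segment: str) -> bool:
    """Return True if any slash has a non-space character right before or after it."""
    return SLASH_TOUCHING_TEXT.search(segment) is not None
```

With this sketch, "km/h" and "km/ h" are reported, while "km / h" is not.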

To check that there are only straight quotes in the target, the easiest way is to search for the specific quotation marks you do not want, that is, everything except straight quotes.

This article on Wikipedia specifies which quotation marks are used in each language.

For instance, if you want to find all segments that contain quotes that are not straight in Catalan, you would run the following search:

Target: "[«»“”“”‘’]"
Search mode: Regular Expressions
PowerSearch: On (press Ctrl+P)
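
As a quick way to see which characters such a character class catches, here is a small Python sketch. The set of quotation marks is just the one from the search above, and the function name is invented for the example; extend the class with whatever styles your language pair needs.

```python
import re

# Character class listing the non-straight quotation marks to flag.
NON_STRAIGHT_QUOTES = re.compile("[«»“”‘’]")


def find_non_straight_quotes(segment: str) -> list:
    """Return every non-straight quotation mark found in the segment, in order."""
    return NON_STRAIGHT_QUOTES.findall(segment)
```

A segment that only uses straight quotes returns an empty list, so anything non-empty is a candidate for the checklist.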

After running a search, click Add Last Search to Checklist to add that search to a checklist.

I hope that helps.

Oscar.

Hi there Oscar,

Thank you so much for your help. I have been doing some experimenting in my spare time but your regex is way more efficient. I just tested the one for the quotes and it has been working great. I added some more styles just to be safe and I’m going to check the other rules next.

At first I was using [A-Za-z] instead of [:letter:] and | as OR, but from your syntax I suppose that was not the best way to go.

[:letter:] stands for any of the characters considered alphabetic in any language (Thai, Greek, Spanish, Russian, etc.) while [A-Za-z] does not include accented characters or language-specific characters.
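
The difference is easy to see with a small Python experiment. This is only an analogy: Python has no [:letter:] set, so [^\W\d_] is used here as an approximate "any letter" class, and both helper names are hypothetical.

```python
import re

ASCII_LETTERS = re.compile(r"[A-Za-z]+")   # unaccented Latin letters only
ANY_LETTERS = re.compile(r"[^\W\d_]+")     # rough stand-in for [:letter:]


def is_ascii_word(word: str) -> bool:
    """True if the whole word consists of unaccented A-Z / a-z letters."""
    return ASCII_LETTERS.fullmatch(word) is not None


def is_letter_word(word: str) -> bool:
    """True if the whole word consists of letters from any script."""
    return ANY_LETTERS.fullmatch(word) is not None
```

A Portuguese word like "ação" fails the first check but passes the second, which is the kind of miss an [A-Za-z] rule would cause in a PT-BR checklist.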

That is very good to know, thanks a lot!
In case anyone is looking for similar rules, I tweaked the one for the slashes so it gets the segments WITH spaces around a slash: / ([[:space:]]/ OR /[[:space:]])
This one is also working very well so far.

Now, looking at the special sets in the documentation, I came up with a couple of questions.
I am not quite certain which set is most comprehensive, [:letterdigit:] or [:alphanum:]?

And how would I go about using [:letter:] but only for capitalized characters?
I was using this [A-Z], but as you taught me this is not the best choice.

Thank you so much for your patience, Oscar.

[:letterdigit:] matches any character considered as alphabetic or numeric in any language.
[:alphanum:] matches any character considered as alphabetic or numeric by the operating system under the current ANSI code page. For instance, if your operating system is in English, [:alphanum:] will not match Greek or Russian characters.

Regarding case-sensitive searches, it is necessary to use [A-Z] for uppercase letters and [a-z] for lowercase, and enable the Case Sensitive checkbox.
If the source or target language is not English but uses Latin characters, it is necessary to add the language-specific characters to the character range.

For instance, if you want to run a case-sensitive search in Spanish, your regular expression would be the following:
Uppercase: [A-ZÁÉÍÓÚÑÏÜ]
Lowercase: [a-záéíóúñïü]

The following regular expression <[A-ZÁÉÍÓÚÑÏÜ][a-záéíóúñïü]+>, with the Case Sensitive checkbox enabled, will match words such as Caña and Camión but not caña or camión.
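
To check the behaviour on sample text, the same pattern can be approximated in Python. Two assumptions here: \b is used in place of the < and > word-boundary markers, and Python's re is case-sensitive by default, which plays the role of the Case Sensitive checkbox. The function name is made up for the example.

```python
import re

# \b stands in for the < and > word-boundary markers (assumption).
# No IGNORECASE flag is set, so the match is case-sensitive.
CAPITALIZED_WORD = re.compile(r"\b[A-ZÁÉÍÓÚÑÏÜ][a-záéíóúñïü]+\b")


def capitalized_words(text: str) -> list:
    """Return the words that start with an uppercase letter followed by lowercase ones."""
    return CAPITALIZED_WORD.findall(text)
```

Running it on a sentence that mixes Caña, Camión, caña and camión should return only the capitalized forms.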

Thank you so much for taking the time to explain this, Oscar.
This is really useful!