Hello everyone,
I was looking through the forum for some rules to add to my Xbench checklist to make my work with PT-BR easier.
I am still missing some important rules and wondered if anyone could help me.
I need to check that there are only straight quotes on the target, that a slash has no spaces around it and that there are spaces around an en-dash only when it is between text, not numbers.
For this last one, I already tested a rule that checks that an en-dash has no spaces between numbers and it works, but I cannot get my head around making it work the other way.
Thanks a lot for your attention and patience, and I am sorry for starting off with such a huge request.
All the best to everyone.
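For anyone who wants to prototype these checks outside Xbench first, here is a minimal sketch in Python. It is not Xbench syntax: the function and pattern names are made up for illustration, and it assumes "text" means Unicode letters directly next to the dash or its spaces.

```python
import re

# Hypothetical patterns for illustration; Python regex syntax, not Xbench's.
# Curly single and double quotes that should be straight on the target.
CURLY_QUOTES = re.compile(r"[\u2018\u2019\u201C\u201D]")
# En-dash (U+2013) between digits with a space on at least one side.
DASH_SPACED_BETWEEN_NUMBERS = re.compile(r"\d(?: +\u2013 *| *\u2013 +)\d")
# En-dash between letters with a space missing on at least one side.
# [^\W\d_] means "any Unicode letter" in Python.
DASH_UNSPACED_BETWEEN_TEXT = re.compile(r"[^\W\d_](?:\u2013 ?| \u2013)[^\W\d_]")


def check_segment(target: str) -> list[str]:
    """Return the names of the rules that the target segment violates."""
    issues = []
    if CURLY_QUOTES.search(target):
        issues.append("curly quotes")
    if DASH_SPACED_BETWEEN_NUMBERS.search(target):
        issues.append("en-dash spaced between numbers")
    if DASH_UNSPACED_BETWEEN_TEXT.search(target):
        issues.append("en-dash unspaced between text")
    return issues


print(check_segment("páginas 10 – 20"))
print(check_segment("São Paulo–Rio"))
```

The "other way around" rule is simply the mirror image of the number rule: instead of flagging a dash that has a space, it flags a dash where a space is missing on either side.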
Thank you so much for your help. I have been doing some experimenting in my spare time but your regex is way more efficient. I just tested the one for the quotes and it has been working great. I added some more styles just to be safe and I’m going to check the other rules next.
At first I was using [A-Za-z] instead of [:letter:] and | as OR, but from your syntax I suppose that was not the best way to go.
[:letter:] stands for any character considered alphabetic in any language (Thai, Greek, Spanish, Russian, etc.), while [A-Za-z] does not include accented or language-specific characters.
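The difference is easy to see outside Xbench too. The sketch below uses Python, where [^\W\d_] is a common way to say "any Unicode letter"; it is only an analogy for [:letter:], not the same engine, and the helper names are invented for this example.

```python
import re

ASCII_LETTERS = re.compile(r"[A-Za-z]+")
# Python stand-in for "any letter in any language": a word character
# that is neither a digit nor an underscore.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")


def is_ascii_word(word: str) -> bool:
    """True only if every character is an unaccented Latin letter."""
    return ASCII_LETTERS.fullmatch(word) is not None


def is_unicode_word(word: str) -> bool:
    """True if every character is a letter in any script."""
    return UNICODE_LETTERS.fullmatch(word) is not None


for word in ["coracao", "coração", "Ελλάδα"]:
    print(word, is_ascii_word(word), is_unicode_word(word))
```

With [A-Za-z], a word like "coração" is split at the accented characters, so rules built on it can miss or misreport PT-BR segments.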
That is very good to know, thanks a lot!
In case anyone is looking for similar rules, I tweaked the one for the slashes so it gets the segments WITH spaces around a slash: / ([[:space:]]/ OR /[[:space:]])
This one is also working very well so far.
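If it helps anyone test the idea on sample segments before adding it to a checklist, the same logic can be sketched in Python. The name has_spaced_slash is hypothetical and \s is Python's whitespace class, used here as a stand-in for [[:space:]].

```python
import re

# Whitespace immediately before OR immediately after a slash.
SPACED_SLASH = re.compile(r"\s/|/\s")


def has_spaced_slash(target: str) -> bool:
    """True if the segment contains a slash with a space on either side."""
    return SPACED_SLASH.search(target) is not None


for segment in ["e/ou", "e / ou", "e/ ou", "e /ou"]:
    print(repr(segment), has_spaced_slash(segment))
```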
Now, looking at the special sets in the documentation, I came up with a couple of questions.
I am not quite certain which set is most comprehensive, [:letterdigit:] or [:alphanum:]?
And how would I go about using [:letter:] but only for capitalized characters?
I was using this [A-Z], but as you taught me this is not the best choice.
[:letterdigit:] matches any character considered alphabetic or numeric in any language. [:alphanum:] matches any character considered alphabetic or numeric by the operating system under the current ANSI code page. For instance, if your operating system is in English, [:alphanum:] will not match Greek or Russian characters.
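As a rough illustration of what "under the current ANSI code page" means, the Python sketch below treats a character as covered by a code page if it can be encoded in it. This is only an analogy and an assumption on my part, not a description of how Xbench implements [:alphanum:]; the function name and the choice of cp1252 (Western European) and cp1251 (Cyrillic) are for demonstration only.

```python
def alnum_in_code_page(ch: str, code_page: str = "cp1252") -> bool:
    """Letter or digit that also exists in the given Windows code page."""
    if not ch.isalnum():
        return False
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        # The code page has no slot for this character.
        return False


for ch in ["a", "é", "α", "я"]:
    print(ch, ch.isalnum(), alnum_in_code_page(ch))
```

All four characters are letters in the Unicode sense, but the Greek and Russian ones drop out once the check is limited to a Western European code page.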
Regarding case-sensitive searches, it is necessary to use [A-Z] for uppercase letters and [a-z] for lowercase, and enable the Case Sensitive checkbox.
If the source or target language is not English but uses Latin characters, it is necessary to add the language-specific characters to the character range.
For instance, if you want to run a case-sensitive search in Spanish, your regular expression would be the following:
Uppercase: [A-ZÁÉÍÓÚÑÏÜ]
Lowercase: [a-záéíóúñïü]
The following regular expression <[A-ZÁÉÍÓÚÑÏÜ][a-záéíóúñïü]+>, with the Case Sensitive checkbox enabled, will match words such as Caña and Camión but not caña or camión.
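The same expression can be tried on sample text in Python, which is case-sensitive by default. This is a sketch under the assumption that < and > act as start-of-word and end-of-word anchors, so \b is used in their place; the variable name is invented for the example.

```python
import re

# Capitalized Spanish word: one uppercase letter, then lowercase letters.
# \b stands in for the < and > word anchors of the original expression.
CAPITALIZED_WORD = re.compile(r"\b[A-ZÁÉÍÓÚÑÏÜ][a-záéíóúñïü]+\b")

sample = "Caña y Camión, pero no caña ni camión"
print(CAPITALIZED_WORD.findall(sample))
```

For other languages the two character ranges would need their own accented letters, for example ÃÕÇ and ãõç for Portuguese.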