RegEx Testing Platform

I have a question regarding RegEx. We know Xbench uses a different RegEx syntax, the POSIX syntax (instead of ASCII), and we know this syntax is not generally used by similar translation tools, such as Trados Studio.

Whenever I have to create a RegEx in Trados Studio, I test it on an online platform before implementing the rules in the software.

However, I wasn’t able to find an online platform to test RegEx rules for Xbench. I’d like to make sure my rules in Checklist Manager are going to work fine, but to do so I have to create a dummy file (in .doc format), convert it to XLIFF format and then open this file in Xbench just to see whether my rules are working.

Can you guys help me out there? Do you guys have a way to test if rules are working before implementing them on the Checklist Manager in Xbench?

We basically use these three workflows (or slight variants of them) when creating new rules.

To compose a rule

If we want to compose a new rule, we create a .TXT document on the Desktop and put two or three sample instances in it, each as source and target separated by a tab. We then save the .TXT file, right-click on it and choose Run QA in Xbench to launch an Xbench project with just that file.

Then we use the Search tab to compose the search. Once the search does what we want, we click Add last search to checklist.
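If you build such test files often, the first step can be scripted. The following is only a minimal sketch, assuming the file is plain text with one source/target pair per line separated by a tab, saved as UTF-8. The function names, the sample pairs and the file name are hypothetical and are not part of Xbench.

```python
from pathlib import Path

# Hypothetical sample instances: (source, target) pairs the rule should be tried on.
SAMPLE_PAIRS = [
    ("Press the button 3 times", "Pulse el botón 3 veces"),
    ("Version 12abc", "Versión 12abc"),
]


def build_test_text(pairs):
    """Return one 'source<TAB>target' line per pair, each ending with a newline."""
    lines = []
    for source, target in pairs:
        for field in (source, target):
            # A tab or line break inside a field would corrupt the two-column layout.
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or line break: {field!r}")
        lines.append(f"{source}\t{target}")
    return "".join(line + "\n" for line in lines)


def write_test_file(path, pairs):
    """Write the pairs to 'path' (assumed encoding: UTF-8) and return the Path."""
    target_path = Path(path)
    target_path.write_text(build_test_text(pairs), encoding="utf-8")
    return target_path


# Preview the content; write_test_file("regex_test.txt", SAMPLE_PAIRS) would save it.
print(build_test_text(SAMPLE_PAIRS), end="")
```

The saved file can then be opened with Run QA in Xbench as described above.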

To test the effectiveness of a rule

We have a large existing Xbench project with many items (in the range of hundreds of thousands of segments). Once we have added the checklist item to the checklist, we go to the Checklist Manager, select the entry, right-click and then choose Test. It allows us to see how the rule behaves with a larger data set.

To test the performance of a rule

With the large existing Xbench project, we run a QA with the full checklist where the new entry is found. Then we go back to Checklist Manager and look at the Run time (ms) column to see how the new entry compares to existing entries.

I just have to thank you a lot.

The method you provided has already saved me a lot of time when creating new regular expressions.

The only thing I didn’t get was the “Run time (ms)” thing. What does it mean?

The Run Time (ms) column in Checklist Manager shows the execution time in milliseconds for each checklist item (larger is worse). To populate this column for a checklist, you just have to run a QA pass with Check Ongoing Translation that uses that checklist. For more accurate results, it is better to use a large corpus.

When regular expressions are involved, it lets you see where your performance bottlenecks are and whether something can be done about them.

For example, let’s consider a search for segments that contain the text arbitrary_text and at least one digit.

With the Run Time (ms) column you could see that

  • Source: "arbitrary_text" "<[0-9]+[a-z]+>"
  • Search Mode: Regular Expressions
  • PowerSearch: ON

is faster than

  • Source: "<[0-9]+[a-z]+>" "arbitrary_text"
  • Search Mode: Regular Expressions
  • PowerSearch: ON

because in the first case non-matching segments are discarded at a much faster rate than in the second.
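The effect of term order can be illustrated outside Xbench with a small Python sketch. It is not how Xbench is implemented. It assumes that the terms are combined with AND and evaluated left to right, stopping at the first term that fails, and it uses Python's \b in place of the < and > word delimiters. The function name and the sample segments are made up for the illustration.

```python
import re


def count_checks(segments, terms):
    """AND the terms left to right, stopping at the first failing term.

    Returns (matching_segments, number_of_term_evaluations).
    """
    compiled = [re.compile(term) for term in terms]
    matches, checks = [], 0
    for segment in segments:
        for pattern in compiled:
            checks += 1
            if not pattern.search(segment):
                break  # segment discarded, remaining terms are skipped
        else:
            matches.append(segment)  # every term matched
    return matches, checks


LITERAL = r"arbitrary_text"
DIGIT_WORD = r"\b[0-9]+[a-z]+\b"  # stand-in for <[0-9]+[a-z]+>

# Made-up corpus: the literal is rare, the digit pattern is common.
segments = ["item 12abc ready"] * 98 + ["arbitrary_text 12abc", "arbitrary_text only"]

literal_first = count_checks(segments, [LITERAL, DIGIT_WORD])
regex_first = count_checks(segments, [DIGIT_WORD, LITERAL])

print(literal_first[1], "term evaluations with the literal first")  # 102
print(regex_first[1], "term evaluations with the regex first")      # 199
```

Both orders return the same single matching segment, but with the rare literal first most segments are rejected after one check. In a real checklist the actual numbers depend on your data, so the Run Time (ms) column remains the reference.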