ApSIC Xbench Forum

Regex to detect domain name mismatches

Hi all,

I am trying to create a regular expression that would detect an exact mismatch of a domain name between source and target.

For instance, I have “sitename.com”, “sitename.de”, “sitename.com.ca”, and “sitename.com.mx”.
I can easily find mismatches that do not involve “sitename.com”.
For example, this detects if “sitename.de” is used in the source and not in the target:

Source: “(sitename.[a-z]{1,3})=1”
Target: -@1

However, because the last two (“sitename.com.ca” and “sitename.com.mx”) have “.com” in the name, Xbench considers them a match (even if I select “Match whole word” or I use the “End of word” regex (e.g., “sitename.[a-z]{1,3}>”).)

I’ve even tried creating a Key term list with the domain names, but it seems Xbench treats the second period (“sitename.com.XX”) as the end of the word, so it thinks “sitename.com” is the full match.
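To illustrate what I think is happening, here is a minimal Python sketch. It uses Python's `re` module and a plain substring test as a stand-in, so it is only an assumption about how the comparison behaves, not Xbench's actual engine; `missing_in_target` is a made-up helper name.

```python
import re

# Stand-in for the source pattern (Python's re, not Xbench's regex engine).
DOMAIN = re.compile(r"sitename\.[a-z]{1,3}")

def missing_in_target(source: str, target: str) -> list[str]:
    """Return source matches that do not occur anywhere in the target.

    Assumption: the target check behaves like a plain substring test.
    """
    return [m for m in DOMAIN.findall(source) if m not in target]

# A .de / .com mismatch is reported as expected.
print(missing_in_target("Visit sitename.de", "Besuchen Sie sitename.com"))
# -> ['sitename.de']

# "sitename.com" is a substring of "sitename.com.mx", so nothing is reported.
print(missing_in_target("Visit sitename.com", "Visite sitename.com.mx"))
# -> []
```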

Any help here? Thanks!

Hi Luis,

I would use the following regex to find all those domain names:

Source: "(sitename((\.[a-z]+))+)=1"
Target: -@1

Search mode: regular expressions
Match Whole Word and PowerSearch: on.

By the way, replace sitename with the domain you want to find.

An alternative that catches all site names would be to change the source term to "([a-z0-9\-]{1,63}((\.[a-z]+))+)=1"

However, you may get too many false positives.
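As a rough illustration of those false positives, the sketch below runs the broad pattern through Python's `re` module (an assumption: Xbench's engine may differ in details, and the `=1` suffix is Xbench syntax, so it is left out). The sample sentence and the helper name `find_candidates` are invented for the example.

```python
import re

# The broad "any site name" pattern from above, without the Xbench-specific "=1".
ANY_DOMAIN = re.compile(r"[a-z0-9\-]{1,63}(?:\.[a-z]+)+")

def find_candidates(text: str) -> list[str]:
    """Return every substring that the broad pattern treats as a domain name."""
    return [m.group(0) for m in ANY_DOMAIN.finditer(text)]

sample = "See readme.txt or visit sitename.com.ca, e.g. today."
print(find_candidates(sample))
# -> ['readme.txt', 'sitename.com.ca', 'e.g']
```

File names and abbreviations such as "e.g" match as well, which is why the narrower pattern with a fixed site name is usually the safer choice.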

I hope this helps.

Best regards,
Oscar.

Hi Óscar,

Thanks for your quick reply! However, the issue is still happening.
It does not detect an issue when the source is “sitename.com” and the target is “sitename.com.ca”, for instance. I think this is because Xbench does not read “sitename.com.ca” as a single word but as three words (sitename, com, and ca), since it treats the period as a word separator.

What can we do?

The following search works fine:

Source: "(sitename(\.[a-z]+)+)=1"
Target: -@1

Search mode: regular expressions
Match Whole Word and PowerSearch: on.

Hi Oscar,

Sure, but I mean the other way around. It detects when “sitename.com.ca” is in the source and does not appear in the target, but not the reverse.

I tried reversing the expression, and it works:

Source: -@1
Target: “(sitename(\.[a-z]+)+)=1”

Can you help me confirm this would detect the same issues as in the one you showed me above? I.e.:

Source: “(sitename(\.[a-z]+)+)=1”
Target: -@1

EDIT: Meaning, would the two searches together detect all domain mismatches between source and target?

Thanks!

This search will detect all segments that contain a domain name in the source that is missing from the target.

You should create a checklist entry for each search.
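If it helps to see the two directions side by side, here is a small Python sketch of the same idea outside Xbench. It is only an approximation: it uses Python's `re` module, compares whole matched domain names as sets, and the names `domains` and `check_segment` are hypothetical.

```python
import re

# Same idea as the Xbench pattern: "sitename" followed by one or more ".xx" parts.
DOMAIN = re.compile(r"sitename(?:\.[a-z]+)+")

def domains(text: str) -> set[str]:
    """Collect every full domain name found in the text."""
    return {m.group(0) for m in DOMAIN.finditer(text)}

def check_segment(source: str, target: str) -> dict[str, list[str]]:
    """Run both checks: in source but not target, and in target but not source."""
    src, tgt = domains(source), domains(target)
    return {
        "missing_in_target": sorted(src - tgt),
        "missing_in_source": sorted(tgt - src),
    }

print(check_segment("Go to sitename.com", "Vaya a sitename.com.mx"))
# -> {'missing_in_target': ['sitename.com'], 'missing_in_source': ['sitename.com.mx']}
```

Each key corresponds to one checklist entry: the first to the search with the pattern in the source, the second to the reversed search.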

Hi Oscar.
Got it, I’ll create two searches, one for missing in source and one for missing in target.

Thanks!