ApSIC Xbench Forum

A rule to identify spaces between Chinese characters

Hi there!
Could you please provide a rule or regex to identify extra spaces between Chinese words?
Thanks in advance!

Hi,

The following rule will help you identify two or more consecutive spaces between Chinese characters:

Source or target term: [\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F][:space:]{2,}[\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F]
Search mode: regular expressions

To find any single space between Chinese characters, use this rule instead:

Source or target term: [\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F][:space:][\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F]
Search mode: regular expressions
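
Outside Xbench, the same two checks can be sketched in Python. This is only an illustration, not Xbench syntax: it assumes Python's `re` module, writes the code points with `\u`/`\U` escapes, uses `\s` in place of `[:space:]`, and collapses the supplementary-plane ranges into a single span (U+20000 to U+3134F), which also covers some unassigned code points. The names `CJK`, `any_space` and `multi_space` are made up for this example.

```python
import re

# Assumption: simplified CJK ranges (main block, Extension A,
# compatibility ideographs, and one span for the supplementary planes).
CJK = r"\u4E00-\u9FFF\u3400-\u4DBF\uF900-\uFAFF\U00020000-\U0003134F"

# One whitespace character between two Chinese characters.
any_space = re.compile(rf"[{CJK}]\s[{CJK}]")

# Two or more whitespace characters between two Chinese characters.
multi_space = re.compile(rf"[{CJK}]\s{{2,}}[{CJK}]")

print(bool(any_space.search("最低 资源要求")))     # True
print(bool(multi_space.search("最低 资源要求")))   # False
print(bool(multi_space.search("最低  资源要求")))  # True
```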

Best regards,
Oscar.

Dear Oscar,
Thanks for your quick response!
Here is my test result.
Lowest resource
最低 资源要求
The expression really works!
But I would like to exclude English sentences and sentences like “可以查看 PatchDeployment 资源”, and focus only on Chinese text without any English. Could you please help again?
BR
Wang Ximin

Hi Wang,

The regular expression does not detect spaces between Chinese and Latin characters.

The following regular expression should detect segments that contain spaces between Chinese characters but do not contain Latin words of two or more letters.

"[\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F][:space:][\x4E00-\x9FFF\x3400-\x4DBF\x20000-\x2A6DF\x2A700-\x2B73F\x2B740-\x2B81F\x2B820-\x2CEAF\x2CEB0-\x2EBEF\x30000-\x3134F\xF900-\xFAFF\x2F800-\x2FA1F]" - "<[a-z]{2,}>"
Search mode: Regular Expressions
PowerSearch: On
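
For anyone checking the same logic outside Xbench, here is a rough Python sketch of "matches the first pattern and not the second". It assumes that `<` and `>` in the Xbench pattern act as word-boundary anchors and approximates them with a plain run of two or more Latin letters. The CJK ranges are simplified, and `flag_segment` is a hypothetical helper name, not anything provided by Xbench.

```python
import re

# Assumption: simplified CJK ranges, as an approximation of the full list.
CJK = r"\u4E00-\u9FFF\u3400-\u4DBF\uF900-\uFAFF\U00020000-\U0003134F"

space_between_cjk = re.compile(rf"[{CJK}]\s[{CJK}]")
# Approximation of the "<[a-z]{2,}>" part: any run of 2+ Latin letters.
latin_word = re.compile(r"[A-Za-z]{2,}")

def flag_segment(segment: str) -> bool:
    """Flag segments with a space between Chinese characters and no Latin word."""
    has_space = space_between_cjk.search(segment) is not None
    has_latin = latin_word.search(segment) is not None
    return has_space and not has_latin

print(flag_segment("最低 资源要求"))                   # True
print(flag_segment("最低 资源要求 PatchDeployment"))   # False
print(flag_segment("可以查看 PatchDeployment 资源"))   # False
```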

Best regards,
Òscar.

Hi Òscar,
Thanks a million.

Best Regards
Wang Ximin

Hi Òscar,
You are awesome!
Could you also please provide a regular expression to help detect missing spaces?
For example, 谢谢您帮我解决第1个问题。 should be 谢谢您帮我解决第 1 个问题。 That is to say, there should be spaces around non-Chinese characters. Many thanks!

Best Regards
Wang Ximin

This regex should help you find any missing spaces before or after numbers or Latin characters.

[^a-z0-9[:space:]][a-z0-9]+ OR [a-z0-9][^[:space:]a-z0-9]
Search mode: Regular Expressions.
PowerSearch: On.
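
As a quick way to try the idea outside Xbench, the two alternatives can be combined into one Python pattern. This is a sketch under assumptions: `\s` stands in for `[:space:]`, the `OR` becomes `|`, and `re.IGNORECASE` is used so that uppercase Latin letters are treated like lowercase ones. The name `missing_space` is made up for this example. Note that, as written, the pattern also flags a letter or digit directly followed by punctuation.

```python
import re

# First alternative: a non-space, non-alphanumeric character directly
# before a run of Latin letters or digits.
# Second alternative: a Latin letter or digit directly followed by a
# non-space, non-alphanumeric character.
missing_space = re.compile(
    r"[^a-z0-9\s][a-z0-9]+|[a-z0-9][^\sa-z0-9]",
    re.IGNORECASE,
)

print(missing_space.findall("谢谢您帮我解决第1个问题。"))    # ['第1']
print(missing_space.findall("谢谢您帮我解决第 1 个问题。"))  # []
```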