ApSIC

Word distance in source with exclusion in target

Hi,

I am running a search on a set of files to make sure that a certain rule is respected. Basically, if in the Spanish source the words “presidente” or “vice presidente” are followed by “(e)”, the target must use the translation “acting president” or “acting vice-president”. The problem is that there may be several words between “presidente” and “(e)”, for instance in a case like “Manuel Miranda, presidente de [long name of a Company department] (e)”.

My first thought was to modify the Word Distance search template for the source, specify -“acting” as the target (to exclude segments whose target already contains “acting”), and run a PowerSearch.

Since I am here, you can tell that I haven’t been successful. My results:

  • “presidente>.{,60}<(e)>” seems to find all occurrences of “…presidente… …(e)” as expected when used with an empty target field.

  • The target expression -"acting", used with an empty source field and run as a PowerSearch, only finds one string; without a constraint in the source, it should find practically every string in the project.

  • The two used together find nothing.

The files I am searching are TMXs. I created a dummy one that contains only one hit for the logic I am applying:

<?xml version="1.0" encoding="utf-8"?>
<tmx version = "1.4">
<header creationtool="Linguist TMX tool" creationtoolversion="1.0" segtype="paragraph" adminlang ="EN-US">
</header>
<body>
<tu>
<tuv xml:lang="es" creationdate="20190722T193451" creationid="!MT">
<seg>El presidente Joan Maria Vasquez Martinez (e)</seg>
</tuv>
<tuv xml:lang="en">
<seg>President Joan Maria Vasquez Martinez</seg>
</tuv>
</tu>
</body>
</tmx>

Any help is welcome!
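The rule being checked (source matches the “presidente … (e)” pattern AND target does not contain “acting”) can be sketched outside Xbench as a short Python script over the same dummy TMX. This is only an illustration, not how Xbench works internally: it assumes Python's `re` module as a stand-in for Xbench's regular-expression mode (`\b` in place of Xbench's `<`/`>` word markers), treats the exclusion -"acting" as a case-insensitive substring test, and the helper names `segments` and `find_violations` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The dummy TMX from the first post, embedded so the sketch is self-contained.
TMX = """<?xml version="1.0" encoding="utf-8"?>
<tmx version = "1.4">
<header creationtool="Linguist TMX tool" creationtoolversion="1.0" segtype="paragraph" adminlang ="EN-US">
</header>
<body>
<tu>
<tuv xml:lang="es" creationdate="20190722T193451" creationid="!MT">
<seg>El presidente Joan Maria Vasquez Martinez (e)</seg>
</tuv>
<tuv xml:lang="en">
<seg>President Joan Maria Vasquez Martinez</seg>
</tuv>
</tu>
</body>
</tmx>"""

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# "presidente", then up to 60 characters, then a literal "(e)" (parentheses escaped).
# Case-insensitive matching is an assumption of this sketch.
SOURCE_RE = re.compile(r"presidente\b.{0,60} \(e\)", re.IGNORECASE)


def segments(tu):
    """Hypothetical helper: map language code -> segment text for one <tu>."""
    out = {}
    for tuv in tu.findall("tuv"):
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        out[tuv.get(XML_LANG, "").lower()] = text
    return out


def find_violations(tmx_bytes, src="es", tgt="en"):
    """Hypothetical helper: return (source, target) pairs that break the rule."""
    root = ET.fromstring(tmx_bytes)
    hits = []
    for tu in root.iter("tu"):
        segs = segments(tu)
        source, target = segs.get(src, ""), segs.get(tgt, "")
        # Both conditions must hold, like a PowerSearch with a source and a target term.
        if SOURCE_RE.search(source) and "acting" not in target.lower():
            hits.append((source, target))
    return hits


for source, target in find_violations(TMX.encode("utf-8")):
    print(source, "->", target)
```

On the dummy file this reports the single translation unit, because its target says “President” rather than “Acting President”.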

Hi,

It’s curious that the target expression only gets one hit. Did you try testing the source and target expressions with a dummy source/target .txt file containing the same segment as the TMX you attached?

Hi,

When using Regular Expressions as the search mode, parentheses must be escaped if you want to search for them literally; unescaped, they act as a grouping operator.

Try the following search to find those strings:

Source: "presidente> (<[^[:space:]]+>[:space:])+ \(e\)"
Target: -"acting"
PowerSearch: On.
Search mode: Regular Expressions

Regards,
Oscar.
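The effect of escaping can be seen in any standard regex engine. A minimal sketch in Python's `re`, assuming it treats parentheses the way Xbench's Regular Expressions mode does (which is what the advice to escape them implies): an unescaped `(e)` is a group that matches only the letter “e”, while `\(e\)` matches the literal text “(e)”.

```python
import re

text = "El presidente Joan Maria Vasquez Martinez (e)"

# Unescaped: "(e)" is a capturing group, so it matches only the letter "e"
# (here the first lowercase "e", inside "presidente").
unescaped = re.search(r"(e)", text)

# Escaped: "\(e\)" matches the literal three characters "(", "e", ")".
escaped = re.search(r"\(e\)", text)

# re.escape() adds the backslashes automatically when the search text is literal.
literal = re.escape("(e)")

print(repr(unescaped.group()), unescaped.start())
print(repr(escaped.group()), escaped.start())
print(literal)
```

This is also why an unescaped pattern can still return hits: it simply matches a different, shorter string.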

Hi Òscar,

You were definitely onto something, but your expression for the source didn’t work. I modified it as follows:

"presidente>.{0,60} \(e\)"

After adding the appropriate target exclusion expression and launching a PowerSearch, I get the one valid hit I expected. Thank you for identifying the issue; it’s odd that I got any hits at all with unescaped parentheses. I had noticed that the closing parenthesis was not part of the found string, but I put it down to a minor display issue, whereas it is probably what prevented the whole search from working.

Thank you!
Stefano
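The symptom described above, a found string that stops before the closing parenthesis, is what a standard regex engine produces for the original expression. The sketch below assumes that Xbench's `<` and `>` word markers behave like Python's `\b`; that mapping is an approximation for illustration, not Xbench's documented behavior. It compares the original attempt with the corrected expression "presidente>.{0,60} \(e\)".

```python
import re

source = "El presidente Joan Maria Vasquez Martinez (e)"

# Original attempt, translated: presidente>.{,60}<(e)>
# "(e)" is a group here, so only the letter "e" between the parentheses is matched.
original = re.search(r"presidente\b.{0,60}\b(e)\b", source)

# Corrected expression: presidente>.{0,60} \(e\)
corrected = re.search(r"presidente\b.{0,60} \(e\)", source)

print(original.group())   # ends with "(e": the ")" is never matched
print(corrected.group())  # ends with "(e)"
```

In the original pattern the opening parenthesis is swallowed by `.{0,60}` and the closing one is never part of the match, which lines up with what was seen in the results.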

Hi Robert,

I had only tried it on the TMX dummy. In any case, as you can read in my reply to Òscar, the issue was solved by escaping the parentheses in the source and moving (e) out of the angle brackets <>.
Another odd fact: this morning, if I use only the target expression
-"acting"
and run a PowerSearch, I do find all strings, as expected. I wonder what I was doing wrong yesterday…

Thank you
Stefano