XPATH to include tags that follow a certain character

Hi! I am currently trying to optimize our XML parser rules for WorldServer V11.3.5.4758. Since this is an XPATH question, I thought I would post it here as I'm having no joy figuring this out by myself.

Basically, I have the following situation. Embedded in our XML files are some HTML tags, which we have handled up until now using the parser rules for our XML file type in WorldServer. I would like to keep it this way, if possible. So we are seeing tags like <B>, <U>, etc. What I want to optimize are the <BR/> tags.

I want them to break the segment but still allow a possible merge. This is no problem and is easily handled by setting //BR to "Inline" and "Exclude". However, if a <BR/> tag immediately follows a comma, I would like the two units to be merged automatically, as the comma is a strong indicator that they belong together.

Example:

<field attribute="f1463782524627-art">
<value>
<U>
<B>Sicherer digitaler Eingang: </B>
<space/>
</U>
<BR/>
Typ B, Sink Beschaltung, einstellbarer SW-Eingangsfilter<BR/>
<U>
<B>Sicherer analoger Eingang: </B>
<space/>
</U>
<BR/>
Typ B, Messbereich 0 bis 10 V / 0 bis 32 V / 0 bis 20 mA<BR/>
<U>
<B>Digitaler Eingang (ohne Diagnose): </B>
<space/>
</U>
<BR/>
Digitale Eingänge, Sink/Source Beschaltung pro Kanal konfigurierbar, einstellbarer SW-Eingangsfilter,<BR/>fix oder ratiometrisch einstellbare Schaltschwelle, Drahbruch und Kurzschlusserkennung<BR/>
<U>
<B>Digitaler Eingang (mit Diagnose):</B>
<space/>
</U>
<BR/>Digitale Eingänge, einstellbarer SW-Eingangsfilter, Drahtbruch und Kurzschlusserkennung<BR/>
<U>
<B>Analoger Eingang:</B>
<space/>
</U>
<BR/>Analoge Eingänge, Messbereich 0 bis 10 V / 0 bis 32 V / 0 bis 20 mA / 4 bis 20 mA / 1 bis 50 kΩ / Temperatureingänge, einstellbarer Analogfilter, <BR/>einstellbare Rampenbegrenzung, einstellbare Schwellenwerte, integrierter Eingangsschutz</value>
</field>

Sorry, I know it's not well-indented and probably not well-formed, but I hope it gets my point across (I've included all of this to show the full context). I would like this unit to be automatically merged:

<BR/>
Digitale Eingänge, Sink/Source Beschaltung pro Kanal konfigurierbar, einstellbarer SW-Eingangsfilter,<BR/>fix oder ratiometrisch einstellbare Schaltschwelle, Drahbruch und Kurzschlusserkennung<BR/>

so that it looks like this:

Digitale Eingänge, Sink/Source Beschaltung pro Kanal konfigurierbar, einstellbarer SW-Eingangsfilter,<BR-TAG>fix oder ratiometrisch einstellbare Schaltschwelle, Drahbruch und Kurzschlusserkennung

I've tried this, but it's not working:

//BR[ends-with(preceding::text()[1],',')]

(I've also tried preceding-sibling with no joy.)
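
For reference, here is a minimal sketch of the selection I'm after, written in Python with only the standard library so it can be checked outside WorldServer. It rests on a few assumptions of mine: ends-with() is an XPath 2.0 function, so a parser that only implements XPath 1.0 would reject the expression above, and I don't know which version WorldServer uses. The sample is also a shortened, well-formed cut of my data, and the function name br_follows_comma is made up for illustration. Under those assumptions, an XPath 1.0 candidate (not tested in WorldServer or Trados Studio) would be //BR[substring(normalize-space(preceding-sibling::node()[1][self::text()]), string-length(normalize-space(preceding-sibling::node()[1][self::text()]))) = ','], and the sketch mirrors that logic:

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed cut of the example data (an assumption for this sketch).
SAMPLE = """<value><U><B>Digitaler Eingang (ohne Diagnose): </B><space/></U><BR/>
Digitale Eingänge, einstellbarer SW-Eingangsfilter,<BR/>fix oder ratiometrisch einstellbare Schaltschwelle<BR/>
<U><B>Analoger Eingang:</B><space/></U><BR/>Analoge Eingänge, einstellbarer Analogfilter, <BR/>einstellbare Rampenbegrenzung</value>"""


def br_follows_comma(root):
    """Return one boolean per <BR/>, grouped by parent element.

    True means the text directly in front of the tag ends with a comma,
    ignoring trailing whitespace. A <BR/> that directly follows another
    element (no text in between) yields False.
    """
    results = []
    for parent in root.iter():
        prev_text = parent.text  # text in front of the first child
        for child in parent:
            if child.tag == "BR":
                results.append((prev_text or "").rstrip().endswith(","))
            prev_text = child.tail  # text between this child and the next one
    return results


if __name__ == "__main__":
    print(br_follows_comma(ET.fromstring(SAMPLE)))
    # prints [False, True, False, False, True]
```

Trailing whitespace is stripped on purpose, because in my data the comma is sometimes followed by a space before the <BR/> (see the last block of the example), and a plain ends-with check on the raw text would miss that case.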

I hope someone can help show me the way!

Top Replies

  • Thanks for that, Rudi! According to the online XPATH testing site I use, this XPATH is indeed the correct one. Unfortunately, WorldServer doesn't seem to evaluate it, so the rule has no effect. I did a quick test in Trados Studio 2019 and it didn't work there either. So now I'm wondering whether this is a known issue or a deliberate limitation. Does anyone happen to have any information about this, for example whether this functionality will be added in a future version of Trados Studio / WorldServer?

    Thanks again for your fantastic help! If nothing else, I learned a bit more about XPATH!
