Does XPP 9.x support U+2028?

We have only just discovered that XPP 8.4 does not seem to natively support U+2028 LINE SEPARATOR (and presumably U+2029 PARAGRAPH SEPARATOR), the Unicode line and paragraph separator characters. U+2028 gives us the reversed question mark "missing character" symbol.

Our Belgium office is using more and more independent translators across the EU, and they do seem to get creative with their use of who-knows-what XML or text editors when working outside their translator workbench applications.

We are in the process of upgrading to XPP 9.4. Was anything added in XPP 9.x to handle these characters, other than showing the reversed question mark "missing character" symbol?

If not, what's the best manual intervention, or fix in supporting data specs?

  • AFAIK XPP (including XPP 9.4) has no "knowledge" of the U+2028 or U+2029 Unicode characters.

    I would imagine that XPP is just treating them as "characters" and not finding them in your fonts and/or font specs.

    What do you want XPP to do with these characters?

    Jonathan Dagresta
    SDL XPP Engineering

  • My guess would be that for XPP you would want to do a "translate" of the incoming XML data and either drop these characters or translate them into CR/NL or space characters, depending on how they are used within the XML data stream.

    Jonathan Dagresta
    SDL XPP Engineering

  • Because XPP does some helpful things with whitespace (such as the recently discussed toxsf -mlesp), I guess an old-fashioned newline is the best replacement.

    The DITA XML vocabulary does not have an element equivalent to HTML <br>, but other vocabularies do often have <br> or <break>. What DITA XML has are elements like <pre> and <lines> and <codeblock>, where hard returns are rendered as <br> in HTML-based outputs, and anything you want in PDF-based outputs.

    Anyway, in our multichannel output world, it's inappropriate to use a "hard line return" as some writer's or editor's formatting fantasy and expect it to be respected.

    So if U+2028 isn't handled like U+000D or U+000A (or both in combination) in XPP, then I guess the expectation is that we'll have to transform it prior to toxsf. I wouldn't want to just strip it, since there does seem to be the intent of a line break.
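    A minimal sketch of such a pre-toxsf transformation, in Python. It is not an XPP feature: the function names and the choice of "\n" as the replacement are assumptions for illustration, and it only catches literal characters, not numeric character references such as &#x2028; in the XML source.

```python
# Sketch only: map the Unicode separators to plain newlines before the
# file is handed to toxsf. All names here are hypothetical.
LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def normalize_separators(text, line_repl="\n", para_repl="\n"):
    """Replace U+2028 and U+2029 with the given replacement strings."""
    table = str.maketrans({
        LINE_SEPARATOR: line_repl,
        PARAGRAPH_SEPARATOR: para_repl,
    })
    return text.translate(table)


def normalize_file(src_path, dst_path, encoding="utf-8"):
    """Rewrite a file with the separators replaced.

    newline="" keeps the file's existing CR/LF line endings untouched.
    """
    with open(src_path, encoding=encoding, newline="") as src:
        data = src.read()
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(normalize_separators(data))
```

    Passing line_repl=" " or para_repl="" instead would give the "space" or "drop" variants, depending on how the translators used the characters.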

    U+2029 would be a much worse can of worms if no other XML tagging existed that described a paragraph end followed by a paragraph start, whatever that might be. My guess is that a well-written transformation would have to split the element in which it is contained.
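    To make the "split the element" idea concrete, here is a rough Python sketch using the standard library's ElementTree. It makes several simplifying assumptions: the paragraph element is called "p" (a hypothetical tag name), only text-only elements are split (anything with child elements is left alone), and the root element itself is never split.

```python
import xml.etree.ElementTree as ET

PARAGRAPH_SEPARATOR = "\u2029"


def split_on_paragraph_separator(root, tag="p"):
    """Split text-only <tag> elements at each U+2029 into sibling elements."""
    # Snapshot the tree first, because children are replaced while walking.
    for parent in list(root.iter()):
        new_children = []
        for child in list(parent):
            text = child.text or ""
            if child.tag == tag and len(child) == 0 and PARAGRAPH_SEPARATOR in text:
                pieces = text.split(PARAGRAPH_SEPARATOR)
                for i, piece in enumerate(pieces):
                    # Attributes are copied to every piece; unique ones
                    # such as id would need separate handling.
                    clone = ET.Element(child.tag, dict(child.attrib))
                    clone.text = piece
                    # Only the last piece keeps the original trailing text.
                    clone.tail = child.tail if i == len(pieces) - 1 else None
                    new_children.append(clone)
            else:
                new_children.append(child)
        parent[:] = new_children
    return root
```

    Mixed content (inline elements inside the paragraph) is the real can of worms and is deliberately not handled here.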