
Entities settings for AEM XML

 Hi!

We received feedback from a customer with a TMS and AEM connector setup regarding issues with "&amp;nbsp;" and "&nbsp;". Their XML files contain both of these escape sequences, and the related issues are the following:

  • their system produces errors when importing translated XML files if our translation fails to preserve "&amp;nbsp;", i.e. "&nbsp;" causes import errors.
  • our translation is displayed with "&nbsp;" on their website when we deliver translated XML files with "&amp;nbsp;", i.e. sometimes the translation needs to contain "&nbsp;", NOT "&amp;nbsp;", to render correctly on their website.
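
To make the two symptoms concrete, here is a minimal Python sketch. It assumes the importing system uses a standard XML parser with no DTD that declares nbsp; the `<p>` element and the sample text are made up for illustration and are not taken from the customer's files.

```python
import html
import xml.etree.ElementTree as ET


def parse_text(xml_string):
    """Return the root element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_string).text
    except ET.ParseError:
        # A bare &nbsp; is an undefined entity in plain XML, so parsing fails.
        return None


single = "<p>10&nbsp;km</p>"       # single-escaped: not valid XML without a DTD
double = "<p>10&amp;nbsp;km</p>"   # double-escaped: valid XML

print(parse_text(single))
print(parse_text(double))

# After XML parsing, the double-escaped value is the literal text "&nbsp;".
# A field rendered as HTML decodes it to a non-breaking space; a field
# rendered as plain text shows the six characters "&nbsp;" to the visitor.
as_html = html.unescape(parse_text(double))
print(repr(as_html))
```

The first case corresponds to the import error, and the second to the literal entity showing up on the website when the field is not treated as HTML.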

 

Have you come across issues such as these with the AEM connector? Do you have any suggestions for the customer? I appreciate your feedback!

Thank you,

Naoko

  • Hi ,

    How was this problem solved in the end, if it was solved?

    Thanks,

    Tudor

  • Hi ,

    I was working on AEM XML files for a custom parser just this week, for a completely different client. They had no escaped entities in their XML apart from embedded HTML, so my entities settings were completely turned off/unchecked for both the XML file type and the HTML5 ECP.

    Come to think of it, after all these years, I think most AEM clients do not have the issues I was asking about, so there must be some options you can adjust on the AEM side. I wish I knew what they were.

    Thanks,

    Naoko 

  • Oddly enough, I just ran into this issue after finally switching away from our old ITD-based file type, which used a custom preprocessing tool that would wrap everything in CDATA and unescape it, and then undo that on the way back out.

    Note that we use the Clay Tablet connector to move content from AEM to SDL TMS and it is generating the XML, so I am not sure this matches what you are doing. However, they do expect double escaping as you describe (i.e. &). The way to achieve this is through both the HTML5 embedded processor and the XML processor.

    In XML > Entities, I set the Advanced XML Entity Setting to write out lt, gt, quot, apos and amp as entities.

    Similarly, I set them for my Embedded Content Processors > HTML5, with the exception of quot and apos, since those are used frequently in the actual text.

    Note that you can export these two parsers (*.sdlecsettings and *.sdlftsettings) and combine them by simply keeping the <SettingsGroup> element from each. This is described in detail here:

    gateway.sdl.com/.../communityknowledge
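
As a rough model of what those two settings produce together, the sketch below escapes text once at the embedded HTML level (lt, gt, amp only) and once more at the XML level (lt, gt, amp, quot, apos). This is only an illustration using Python's standard library, not how Studio or TMS implement it; the function names and the sample string are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# Extra characters written as entities at the XML level only.
XML_EXTRA = {'"': "&quot;", "'": "&apos;"}
XML_EXTRA_REVERSE = {"&quot;": '"', "&apos;": "'"}


def escape_embedded_html(text):
    """Inner (HTML5 embedded content) level: escape &, < and > only."""
    return escape(text)


def escape_xml(text):
    """Outer (XML) level: escape &, <, > plus both quote characters."""
    return escape(text, XML_EXTRA)


translated = 'R&D says "5 < 6"'
inner = escape_embedded_html(translated)
outer = escape_xml(inner)

print(inner)  # R&amp;D says "5 &lt; 6"
print(outer)  # R&amp;amp;D says &quot;5 &amp;lt; 6&quot;

# Reading the file back undoes only the outer level.
assert unescape(outer, XML_EXTRA_REVERSE) == inner
```

The quotes stay literal at the inner level and are escaped only once at the outer level, while the ampersand and angle brackets end up double-escaped.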

  • Hi

    I can't remember if the problem I was having was also with the Clay Tablet connector. I am curious to see if you would come across the second issue in my original post with the settings you have shared. I think French and Czech used a lot of non-breaking spaces, and that caused a lot of issues where the English source didn't have any non-breaking spaces.

    • our translation is displayed with "&nbsp;" on their website when we deliver translated XML files with "&amp;nbsp;", i.e. sometimes the translation needs to contain "&nbsp;", NOT "&amp;nbsp;", to render correctly on their website.
  • Naoko, I am not sure about your case or how the files are being sent to TMS from AEM, but what I have discovered is that AEM has some rich text/HTML-compatible fields and some that are just plain text, like the browser title. In our case, the XML sent to us from AEM/Clay Tablet does not indicate the field type, so we simply apply the HTML Embedded Processor to all of them to be safe.

    Because we have a mix of content that sometimes includes HTML entities like &reg;, we have to turn on Entity Conversion to normalize them into the character for translation (otherwise they appear as a tag and we lose TM leverage). However, with this setting the translated target file always writes those characters out as the entity as well. This works fine for the rich text/HTML fields, but in the plain text fields they will either display as the entity (&reg;) or, worse, be rejected by the system. We saw both cases.

    I ended up having to go back to ITD with a custom pre-processor because the current SDL Trados Studio filters do not allow us to normalize these entities to the character and then write them out as the character as well. I have written an idea to add this feature for HTML and Embedded HTML, since it is already available for XML2 (note, however, that XML2 is not available in TMS yet). If possible, go there and add your vote so we can get this into the Studio parsers and then into TMS.

    community.sdl.com/.../html-entity-conversion-similar-to-xml2-allowing-read-as-character-and-write-as-character-for-any-all-characters

    In short, I am not sure how others are handling this for AEM but it seems like a major shortcoming.
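
The difference between the two write-out behaviours can be sketched in a few lines of Python. This is only an illustration of the idea, not the Studio filter logic; the function names, the entity table and the sample string are hypothetical.

```python
import html


def read_as_character(source):
    """Normalize HTML entities to characters so the segment matches the TM."""
    return html.unescape(source)


def write_as_entity(text, entities):
    """Write selected characters back out as named entities."""
    for char, name in entities.items():
        text = text.replace(char, f"&{name};")
    return text


source = "Acme&reg; Widgets"
segment = read_as_character(source)

# Rich text/HTML field: writing the entity back is fine.
rich_field = write_as_entity(segment, {"\u00ae": "reg"})

# Plain text field: the character itself is needed, otherwise the
# visitor sees the literal "&reg;" or the system rejects the value.
plain_field = segment

print(segment)
print(rich_field)
print(plain_field)
```

The requested feature amounts to being able to choose the `plain_field` behaviour on write-out while still normalizing on read.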

  • Hi

    Thanks for the updates and I have voted for your idea. We cannot use XML2 because TMS doesn't support it and XML2 was still beta-ish in terms of segmentation hint handling in Studio 2019. As far as I know it was not really ready for production, but I will need to check it in Studio 2021 and see if it's any better now.

  • Hi 

    I learned today from my coworker, who was working with a client with Clay Tablet connector, that the following setting on the connector side is a possible solution. I don't remember the details of how my client from my original post resolved the issue, but I would guess that this might be it. I hope this is helpful.

    Changing the Use_CData setting from its default value of false to true. This setting instructs the Connector to wrap content in CData tags, which prevents the Connector from escaping special characters, and avoids the scenario of double-escaped characters. However, this setting does not prevent Adobe Experience Manager from single-escaping special characters in rich text.

    Note: The Connector adds and removes the CData tags, so they are not displayed within Adobe Experience Manager's CRXDE Lite.

    Important: If you change this setting, your translators must return the translated content in CData tags, just as they received the source content in CData tags. They should not run any post-translation scripts to escape the special characters before returning the content.

    Warning: If you change this setting in the middle of a translation job, it can interfere with the integrity of the translation memory.
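
To illustrate why wrapping content in CData avoids the double escaping, here is a small Python sketch using the standard library. It is not the Connector's code; the `field` element name and the sample rich text are assumptions made for the example.

```python
from xml.dom.minidom import Document, parseString


def serialize(content, use_cdata):
    """Write content into a <field> element, as plain text or as CDATA."""
    doc = Document()
    root = doc.createElement("field")
    doc.appendChild(root)
    if use_cdata:
        node = doc.createCDATASection(content)
    else:
        node = doc.createTextNode(content)
    root.appendChild(node)
    return root.toxml()


def read_back(xml_string):
    """Return the content of the <field> element after parsing."""
    return parseString(xml_string).documentElement.firstChild.data


# Rich text that AEM has already single-escaped.
rich_text = "<p>Fish &amp; chips</p>"

escaped = serialize(rich_text, use_cdata=False)
wrapped = serialize(rich_text, use_cdata=True)

print(escaped)  # the serializer escapes again: &amp; becomes &amp;amp;
print(wrapped)  # content is left untouched inside <![CDATA[ ... ]]>
```

Both forms parse back to the same content, but in the CDATA form the translator sees the single-escaped text as is, and the file must be returned in the same form.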

  • Thank you very much for the tip. I found the following documentation on this but will have to test it out. 

    connectors.lionbridge.com/.../Configuring_CDATA_short.htm

  • Hi

    I am not sure about the Lionbridge connector. The information I provided above is for the Clay Tablet Connector settings.

  • It is the same thing... L10n Bridge bought Clay Tablet.
