How to exclude certain tags from CDATA?

Hi,

I've been trying to create XML file type settings that convert all HTML markup to tags, but I cannot manage to do so for tags such as:

<a href="https://www.google.com" target="_blank"><img src="www.google.com/.../googlelogo_color_92x30dp.png" /></a>

and

<a href="https://www.google.com" target="_blank"><img src="www.google.com/.../googlelogo_color_92x30dp.png" /></a>

found in CDATA.

In my current settings, I checked 'Process embedded content using the following processor', set the processor to Html Embedded Content 5.2.0.0.0, and selected 'Process embedded content found in CDATA sections'.

Could you please advise how I can solve this? I tried the Cleanup Tasks app and it works, but I would prefer to solve this in the file type if possible. Ideally, I would add a regex somewhere to convert all text between < and > to tags.

Thanks in advance for your help!

Best regards,

Digna

  • Did you set some structure to the parser rule containing the CDATA so that Studio knows to apply your embedded content rule when it handles that element?

  • Hi Paul,

    Thank you for your reply. I don't think I did. Where can I do so? I now went to the 'Parser' menu and changed some settings for the 'InvitationBody' element (which is where the CDATA is found), but it did not change anything. 

    Could you please advise?

    Many thanks,

    Digna

  • I think this would be easier if you provided a small sample of the file, can you do that?

  • Sample file.xml
    <?xml version = '1.0' encoding = 'utf-8'?>
    <ExportInvitation>
      <RightToLeft>No</RightToLeft>
      <From>fromemail@shouldnotchange.com</From>
      <Name>Test name - should not change</Name>
      <ReplyTo>replyto@shouldnotchange.com</ReplyTo>
      <ReturnTo>returnto@shouldnotchange.com</ReturnTo>
      <Subject><![CDATA[Subject should be translated with exception of attributes like <Name>]]></Subject>
      <Preview>This is preview text, which should be translated entirely. (Always max 250 characters)</Preview>
      <InvitationBody><![CDATA[<div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">This text blok contains the email body text, which should be translated, with exception of elements between smaller than and greater than characters.</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Example: Dear &lt;Name&gt;, should result in Dutch translation 'Beste &lt;Name&gt;,'</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="1" color ="Black"><br></font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Formatting like</font><font face ="Microsoft Sans Serif" size ="1" color ="Black"> </font><font face ="Microsoft Sans Serif" size ="4" color ="Black">font size</font><font face ="Microsoft Sans Serif" size ="2" color ="Black"> or </font><font face ="Verdana" size ="2" color ="Black">font changes</font><font face ="Microsoft Sans Serif" size ="2" color ="Black"> or <b>bold</b> or <i>italic</i> or <u>underline</u> should not be changed and remain ' as-is' </font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">We also </font><font face ="Microsoft Sans Serif" size ="2" color ="Red">use</font><font face ="Microsoft Sans Serif" size ="2" color ="Black"> </font><font face ="Microsoft Sans Serif" size ="2" color ="Blue">color</font><font face ="Microsoft Sans Serif" size ="2" color ="Black"> </font><font face ="Microsoft Sans Serif" size ="2" color ="Green">references</font><font face ="Microsoft Sans Serif" size ="2" color ="Black">.</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="1" color ="Black"><br></font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Images can also be inserted like below:</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="1" color 
="Black">&lt;a href="https://www.google.com" target="_blank"&gt;&lt;img src="https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png" /&gt;&lt;/a&gt;</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Or as embeded image:</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="1" color ="Black">&lt;a href="https://www.google.com"&gt;&lt;img src="CID:https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png" /&gt;&lt;/a&gt;</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black"><br></font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Survey URL can be referenced with several options, but in all cases the smaller than and greater than characters will be used to identify a system text, which should not be translated.</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">&lt;URL&gt;</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">&lt;EmailQuestionURL&gt;</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black"><br></font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Best regards,</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">Elon</font></div><div style ="text-align:left"><font face ="Microsoft Sans Serif" size ="2" color ="Black">&lt;any value between these characters ('&lt;' and '&gt;') should remain as-is and not be translated or changed&gt;' </font></div>]]></InvitationBody>
    </ExportInvitation>

    Hi Paul,

    Yes, sure, please find it here attached.

    Thanks for your help!

    Digna

  • That's elementary...
    You should create a custom XML file type that extracts (as translatable) only the "Subject" and "InvitationBody" elements and nothing else, i.e. set "//*" as not translatable.
    Then, in the "Embedded Content" settings, configure the HTML 5 embedded content processor and set it to process "CDATA sections".
    If you also want "attributes like <Name>" converted to tags, you will need to add these to the list of tags in the HTML 5 embedded content processor settings.

    That's it.
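
Adding placeholders such as <Name> to the tag list means knowing which ones occur in the file. As a rough aid, the Python sketch below collects the distinct entity-escaped placeholders (for example &lt;Name&gt;) from a piece of text. It is not part of Trados Studio; the helper name `placeholder_names` and the sample string are assumptions made for illustration, and the pattern only matches single-word names written with &lt; and &gt;.

```python
import re

# Hypothetical helper, not a Trados Studio feature: finds entity-escaped
# placeholders such as &lt;Name&gt; so they can be added to a tag list by hand.
PLACEHOLDER = re.compile(r"&lt;(\w+)&gt;")


def placeholder_names(text: str) -> list[str]:
    """Return the distinct placeholder names in order of first appearance."""
    seen: dict[str, None] = {}
    for name in PLACEHOLDER.findall(text):
        seen.setdefault(name, None)
    return list(seen)


# Illustrative text modelled on the sample file's InvitationBody.
sample = (
    "Dear &lt;Name&gt;, open &lt;URL&gt; or &lt;EmailQuestionURL&gt;. "
    "Thanks, &lt;Name&gt;."
)
print(placeholder_names(sample))
```

Multi-word values or escaped HTML such as &lt;a href="..."&gt; are deliberately not matched by this pattern.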

  • Hi Evzen,

    Thank you for your reply. That's actually what I did (please see my first post), but it skips the links to images <a href etc>. As these links are different all the time, I hope there is a solution that covers all of those, instead of having to add each one of them separately to the tags list.

    Many thanks,

    Digna

    Your first post apparently does NOT use the custom XML file type. If it did, the sentence would not be broken into separate segments containing only one or two words. It looks more like it's using the XML:Any file type.
    Check the details of the project; you should see the file type used in one of the columns in the Files view.

  • This is actually quite interesting. If you extract the HTML and render it in a browser, the link and image markup is displayed as plain text, which is exactly what Studio extracts: the markup as text.

    If you replace the entities with < and > symbols, the browser renders the link and image, which is the behaviour you expected. Studio will also handle the markup correctly in that form.
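
The difference between escaped and literal markup can be reproduced with any HTML parser. The sketch below uses Python's standard `html.parser` as a stand-in; it is not the processor Studio uses, and the class name and the two sample fragments are assumptions made for illustration. It records which start tags the parser sees and which characters it treats as text.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records start tags and text as an HTML parser sees them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(fragment):
    """Return (start tags, concatenated text) for an HTML fragment."""
    collector = TagAndTextCollector()
    collector.feed(fragment)
    collector.close()
    return collector.tags, "".join(collector.text)


# Escaped markup, as in the sample file: the <a> element is only text.
escaped = '<font size="1">&lt;a href="https://www.google.com"&gt;link&lt;/a&gt;</font>'
# Literal markup, after replacing the entities: <a> is a real tag.
literal = '<font size="1"><a href="https://www.google.com">link</a></font>'

print(parse(escaped))
print(parse(literal))
```

For the escaped fragment the only tag reported is `font`, and the whole `<a ...>link</a>` string comes back as text. For the literal fragment both `font` and `a` are reported as tags and the text is just `link`.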

  • Hi Paul,

    Thank you for your help! It did cross my mind that I should convert the entities but I assumed that Trados would do so during import. I now converted the entities and it works. 

    I don't think there is a setting for this, right? I can only find the one for generating the target.

    Many thanks,

    Digna
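
The entity conversion described above can also be scripted before the file is imported. What follows is a minimal Python sketch, assuming the conversion should apply only inside CDATA sections and only to &lt; and &gt;. The function name is hypothetical, and nothing here is a Trados Studio setting or API.

```python
import re

# Matches one CDATA section and captures its body (non-greedy, across lines).
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def unescape_angle_brackets_in_cdata(xml_text: str) -> str:
    """Replace &lt; and &gt; with < and > inside CDATA sections only."""

    def fix(match: re.Match) -> str:
        body = match.group(1).replace("&lt;", "<").replace("&gt;", ">")
        # Caveat: a body containing "]]&gt;" would become "]]>" and end the
        # CDATA section early; this sketch does not guard against that.
        return "<![CDATA[" + body + "]]>"

    return CDATA.sub(fix, xml_text)


# Illustrative input: entities outside CDATA must stay untouched.
doc = (
    "<Preview>a &lt; b</Preview>"
    '<InvitationBody><![CDATA[<font>&lt;a href="x"&gt;&lt;/a&gt;</font>]]>'
    "</InvitationBody>"
)
print(unescape_angle_brackets_in_cdata(doc))
```

Note that this also turns placeholders such as &lt;Name&gt; into <Name>, so they would then need to be listed as tags in the embedded content processor, as suggested earlier in the thread.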
