XML with a different structure - translation in Studio

Hello Community,

I need to translate an XML file that looks like this (text replaced with Lorem ipsum):

"Only fragment"

snippet.xml
<?xml version="1.0" encoding="UTF-8"?><offer file_format="IOF" generated="2019-09-11 23:12:52" iaiext:expires="2019-09-12 23:12:52" xmlns:iof="http:" version="2.5" extensions="yes" xmlns:iaiext="http:"><products language="pol" xmlns:iaiext="http:">
<product id="12" currency="PLN" code_producer="POD125" iaiext:code_on_card="POD125" iaiext:producer_code_standard="GTIN13" iaiext:vat="23.0" iaiext:product_free="n" iaiext:save_serial_numbers="na" iaiext:site="2"><producer id="1308137276" name="PAESE"/>
<category id="1214553895" xml:lang="pol" name="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<iaiext:category_translation xml:lang="pol" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<iaiext:category_translation xml:lang="eng" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<unit id="0" xml:lang="pol" name="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<iaiext:unit_translation xml:lang="pol" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<iaiext:unit_translation xml:lang="eng" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>
<card url="http:"/>
<iaiext:card_translation xml:lang="pol" value="http:"/>
<iaiext:card_translation xml:lang="eng" value="http:"/>
<description><name xml:lang="eng"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></name>
<name xml:lang="pol"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></name>
<version name="Lorem ipsum dolor sit amet, consectetur adipiscing elit." ><name xml:lang="eng"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></name>
<name xml:lang="pol"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></name>
</version>
<long_desc xml:lang="eng"><![CDATA[<div class="col-md-8 col-xs-12">Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<br />
<ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
</ul>
<div class="hidden-text"><br />Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
</ul>
</div>
<br /> <span class="show-hidden-text" data-alt-text="Lorem ipsum dolor sit amet, consectetur adipiscing elit.">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</span></div>
<div class="col-md-4 col-xs-12">
<div class="list-title"><span>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</span></div>
<ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
</ul>
</div>
<div class="long-description-bottom">
<div class="description_slider">
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit..jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
</div>
</div>]]></long_desc>
<long_desc xml:lang="pol"><![CDATA[<div class="col-md-8 col-xs-12">Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<br />
<ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit. <strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong></li>
</ul>
<div class="hidden-text"><br />Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li><strong>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</strong> Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
</ul>
</div>
<br /> <span class="show-hidden-text" data-alt-text="Lorem ipsum dolor sit amet, consectetur adipiscing elit.">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</span></div>
<div class="col-md-4 col-xs-12">
<div class="list-title"><span>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</span></div>
<ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
</ul>
</div>
<div class="long-description-bottom">
<div class="description_slider">
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
<div class="banner-wrapper">
<div class="visible-desktop visible-tablet hidden-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
<div class="hidden-desktop hidden-tablet visible-phone"><img style="width: 100%;" src="/data/include/cms/Lorem ipsum dolor sit amet, consectetur adipiscing elit.jpg" border="0" alt="Lorem ipsum dolor sit amet, consectetur adipiscing elit." /></div>
</div>
</div>
</div>]]></long_desc>
<short_desc xml:lang="eng"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></short_desc>
<short_desc xml:lang="pol"><![CDATA[Lorem ipsum dolor sit amet, consectetur adipiscing elit.]]></short_desc>

As you can see, it is bilingual, and I only need the text in elements marked with xml:lang="eng". Studio cannot do this with its built-in file types. I tried creating a new file type (XML (embedded content)) with parser rules based on that XML. However, it extracted everything for translation, including content marked with xml:lang="pol". Is there any way to create parser rules that ignore every entry marked with xml:lang="pol"?

The next problem is that some of the text to translate sits inside the tags themselves, as attribute values. Here is an example:

<iaiext:category_translation xml:lang="eng" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>

Studio only picks up content structured like this: <tag>[content]</tag>. I was wondering whether there is an option to extract translatable text from tags constructed like the example above.

Please let me know if you see any solutions that allow me to deal with this xml.

Thanks a lot,

Adrian

  • Hi

    I only have time for a partial answer:

    <iaiext:category_translation xml:lang="eng" value="Lorem ipsum dolor sit amet, consectetur adipiscing elit."/>

    Yes, you can extract text from XML attributes, although it is bad practice to store translatable content in attributes.

    Do I understand correctly that you only want to extract the English text? That is totally standard, no problem at all. You just set up the parser rules accordingly. (Bear in mind that you will always have to create a dedicated file type for XML files; this goes for all CAT tools.)

    Do you want to extract both languages? That is a bit more tricky as Studio does not support bilingual XMLs out of the box. (I feel it should, but in a case like yours you'd end up with very big unwieldy segments.)

    If you want to use both languages, the easiest way is IMHO to extract them separately using two different file types (one for EN only, one for PL only), align the results using Studio's Aligner, and then do a PerfectMatch of the target language to the source language. That way you have reasonable segmentation.

    Hope this helps.

    Daniel
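
To make the selection logic above concrete, here is a minimal Python sketch of what "English only, including attribute values" means for a file like this one. It is not Studio configuration, just a way to check outside the tool which strings such a rule should pick up. In XPath terms the rule amounts to roughly //*[@xml:lang='eng'] for element content and //*[@xml:lang='eng']/@value for the attribute. Whether a given Studio version accepts those exact expressions in its parser rules is an assumption to verify. The sample data, the namespace URI http://example.com/iaiext (the post truncates the real one to "http:") and the function name extract_translatable are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes
# namespaced attributes in Clark notation: {namespace-uri}localname.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical stand-in for the feed. The iaiext namespace URI is invented,
# because the original post shortens it to "http:".
SAMPLE = """<product xmlns:iaiext="http://example.com/iaiext">
  <iaiext:category_translation xml:lang="pol" value="Kategoria"/>
  <iaiext:category_translation xml:lang="eng" value="Category"/>
  <description>
    <name xml:lang="eng"><![CDATA[English name]]></name>
    <name xml:lang="pol"><![CDATA[Polska nazwa]]></name>
  </description>
</product>"""


def extract_translatable(xml_text, lang="eng", attr="value"):
    """Collect strings from elements whose xml:lang equals `lang`.

    Picks up both the `attr` attribute value and the element text
    (CDATA sections arrive as ordinary text), in document order.
    """
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        if el.get(XML_LANG) != lang:
            continue  # skip the other language and untagged elements
        if el.get(attr):
            found.append(el.get(attr))
        if el.text and el.text.strip():
            found.append(el.text.strip())
    return found


print(extract_translatable(SAMPLE))              # English strings only
print(extract_translatable(SAMPLE, lang="pol"))  # Polish strings only
```

Running the function once per language also gives the two monolingual extractions that the alignment approach above starts from.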
