XML issue: why does Studio 2021 SR1 - 16.1.3.4096 (the OS, sadly, is Win 7 Home professional) replace the ' with a "?

A customer has been sending me XML files for translation for many years. The problem is that I receive the error messages shown in the attached screenshots. I (not at all an XML programmer) looked into old XML files and found a file called uptime_trans.dtd. This file was not sent along with the latest XML files. In Studio 2013 (as in all legacy versions), Project Settings lists all file types, with various XML types among them (see screenshot).

I am not an XML programmer, but I reckon that there is a way to enter this dtd file into the row with the file types so that further XML files can be treated adequately. Is this assumption correct? If so, how could I include the dtd file in the row with all the file types?

A further issue with these XML files was that a ' was replaced with a ", with the effect that the customer could not open the target file, although he could view the bilingual translation in Studio's editor. Why did Studio replace the ' with a double quotation mark ", causing this error? I have attached the relevant files here, hoping that they yield sufficient information. I really would appreciate your support, since this issue is with a longtime customer and it is hard to keep customers during this pandemic.

  

Theme_Stromversorgung SMPS_00003.xml
Theme_Stromversorgung SMPS_00003_E.xml

  • I am not an XML programmer, but I reckon that there is a way to enter this dtd file into the row with the file types so that further XML files can be treated adequately. Is this assumption correct?

    Sort of.  The dtd is only used to validate the correctness of the XML.  Ideally you would create a custom XML filetype for your file and then include your schema if it's important for you to have the xml validated against it:

    https://docs.sdl.com/813470/534137/sdl-trados-studio-2021-sr1/xml-validation-page

    If you don't have the dtd then just turn off the validation as you don't have anything to validate against:

    I created a new XML filetype using your XML file to create the parser rules automatically... so I haven't made any changes to exclude stuff or handle anything in attributes but when I use this and the validation off I see this:

    I also don't see where you get the ' replaced with a ".  Can you point me to exactly where I need to look?

  • The question probably refers to the attribute values being enclosed in double quotes in the translated document. They were enclosed in single quotes AND sometimes in double quotes in the original. Both are perfectly legal for XML and HTML as far as I know:

    Original:

    Translation:

    After having a second look: The file with "E" at the end, which I thought was Studio's output, is NOT the translation of the other file, but has been transformed somehow. Incorrectly.

    Original:

    "E"-File:

    There are a lot of unescaped characters in the element values of the source file which should have been escaped: <, >, ", '. That is certainly a source of problems.

    I wish I understood what's going on with these files, but unfortunately I don't have the time to look further.

    Daniel
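To make the escaping point concrete, here is a minimal Python sketch using the standard-library `xml.sax.saxutils` helpers. It is only an illustration, not what Studio does internally, and the sample strings are made up. As far as I know, in element content only `<` and `&` strictly have to be escaped, while quote characters only matter inside attribute values, where the delimiter in use must not appear unescaped.

```python
from xml.sax.saxutils import escape, quoteattr

# Made-up sample text, not taken from the customer's file.
text = 'Use <b> & "quotes" or \'apostrophes\''

# escape() replaces &, < and >; quote characters are left untouched,
# because they are harmless in element content.
escaped_text = escape(text)
print(escaped_text)

# quoteattr() picks the delimiter based on the value:
# single quotes if the value contains a double quote, otherwise double quotes.
attr_with_double = quoteattr('say "hi"')
attr_with_single = quoteattr("it's")
print(attr_with_double)
print(attr_with_single)
```

The second half shows why a serializer that always forces double-quote delimiters has to escape any double quote inside the value, or the result is no longer well-formed.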

  • Thank you, Paul. XML obviously is a book with seven seals to me. Should I ask the customer to provide a specific XML type to have it included in the row of files (with a number of XMLs) as shown in the "Project settings"?

  • Thank you, Daniel, for answering me despite your limited time. If you have a time slot later on, I would very much like to furnish you with everything needed to find an answer, as this is essential to maintaining good relations with one of my major clients in these pandemic times.

  • Should I ask the customer to provide a specific XML type to have it included in the row of files

    No.  You just create your own.  The ones out of the box are generic based around some sort of standard.  The reality is that an XML file can take thousands of forms for translation, so the software has you covered.  Perhaps review this article:

    https://multifarious.filkin.com/2014/06/01/custom-xml/

  • Do I assume correctly that the first file (Theme_Stromversorgung SMPS_00003.xml) is the source file you get from your customer?

    Daniel

  • Thank you very much indeed, Paul. Your expertise is a credit to the community.

  • I noticed that all text seems to exist twice in the source file, once as unescaped HTML, and once escaped in the dataset nodes. I assume you are trying to translate the unescaped version? Why are there two versions of the same text? Why is one escaped and the other not?

    Daniel

  • Hi @Daniel Hug,

    I've just encountered a similar if not the same issue.

    I have a Visio file in *.vdx format. I created a custom XML file type using the "old" XML type (new method for embedded content). Everything works fine, except that the resulting SDLXLIFF is too big to be handled effectively by Studio (especially if handled as part of a batch), with a size of over 100MB.

    As an alternative I tried out the "new" XML type "XML 2". Here the resulting SDLXLIFF has an optimal size of under 1MB. However, when exporting the target file, the single quotation marks that delimit attribute values are replaced by double quotation marks. This breaks the XML syntax, since many attribute values contain text that is itself surrounded by double quotation marks.

    I used a RegEx to replace all double quotation marks with single ones where this caused problems with the XML syntax. After doing this, the XML syntax was valid and I was able to open the *.vdx file without a problem.

    Is there a way to avoid this issue? Why do the XML types differ on this functionality?
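For anyone who needs the same stopgap, the RegEx repair described above could look roughly like the following Python sketch. Everything specific in it is an assumption: the broken attribute shape `name=""value""`, the sample element, and the helper name `requote` are hypothetical and not taken from a real *.vdx export, so the pattern would need adjusting to the actual file.

```python
import re
import xml.etree.ElementTree as ET

# Assumed (hypothetical) shape of a broken attribute: name=""value""
# i.e. the value began and ended with a double quote and was then wrapped
# in double-quote delimiters without escaping.
BROKEN_ATTR = re.compile(r'([\w:.-]+)=""([^"<>=]+)""')


def requote(xml_text):
    """Re-delimit name=""value"" as name='"value"' (single-quote delimiters)."""
    def fix(match):
        name, value = match.group(1), match.group(2)
        return f"{name}='\"{value}\"'"
    return BROKEN_ATTR.sub(fix, xml_text)


# Hypothetical sample, not from a real export.
broken = '<Cell N="Label" F=""Text""/>'
fixed = requote(broken)
print(fixed)

# Parsing raises xml.etree.ElementTree.ParseError if the text is still
# not well-formed, so this doubles as a quick check after the repair.
ET.fromstring(fixed)
```

The pattern deliberately refuses values containing `=` so that two adjacent empty attributes such as `A="" B=""` are not mistaken for one broken attribute. It is a text-level patch, so checking the result with a parser afterwards, as in the last line, seems prudent.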

  • Is there a way to avoid this issue? Why do the XML types differ on this functionality?

    Excellent question and I'm afraid I don't have a good answer!  I ran a quick test on a file like this:

    quotes.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <rootelement>
      <xml><apos type='single quotes'>I like to be different!</apos></xml>
      <html>
    <![CDATA[
    <apos type='single quotes'>I like to be different!</apos>
    ]]>
      </html>
     </rootelement>

    The XML2 is the only one of all the XML filetypes we still have that is inconsistent, returning this:

    xml2_quotes.xml
    <?xml version="1.0" encoding="utf-8"?>
    <rootelement>
      <xml><apos type="single quotes">I like to be different!</apos></xml>
      <html>
    <![CDATA[
    <apos type='single quotes'>I like to be different!</apos>
    ]]>
      </html>
     </rootelement>

    I'll check with support to see whether this is something already noted.  In theory it shouldn't make a difference as both single quotes and double quotes are actually allowed in XML, but I have no idea why we try to "correct" it as opposed to just kicking out exactly what was put in.

    There is no workaround or button to check in the software that influences this.  It's all under the hood I'm afraid.  Interesting that the HTML filetype doesn't do this when handling the embedded content, but the XML filetype itself does.
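The point that both quote styles are equivalent to an XML parser can be checked with a short Python sketch using the standard-library `xml.etree.ElementTree` module. This only illustrates general XML behaviour and says nothing about how the Studio filetypes are implemented; the last element, with a double quote inside the value, is a made-up case.

```python
import xml.etree.ElementTree as ET

# The same element from quotes.xml, once with each delimiter style.
single = "<apos type='single quotes'>I like to be different!</apos>"
double = '<apos type="single quotes">I like to be different!</apos>'

a = ET.fromstring(single)
b = ET.fromstring(double)

# After parsing, the delimiter style is gone: tag, attributes and text match.
same = (a.tag, a.attrib, a.text) == (b.tag, b.attrib, b.text)
print(same)

# Made-up case: the attribute value itself contains double quotes.
tricky = ET.fromstring("<apos type='say \"hi\"'/>")
out = ET.tostring(tricky, encoding="unicode")
print(out)

# ElementTree writes double-quote delimiters and escapes the inner quotes,
# so the output parses back to the same value.
round_trip = ET.fromstring(out).attrib["type"]
```

So switching delimiters is harmless as long as the serializer escapes any inner double quotes. The problem reported above arises only when the value contains the new delimiter and is written out unescaped.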

  • Just an update to let you know this is now logged as a bug with our reference CRQ-25191.  You can quote this number to help with follow-ups when we release updated versions of the product.

  • Hi,

    Thanks a lot for helping solve this issue.