Prevent hidden text from being imported as internal tags in Word 2007-2016 filter

Under Community Review

Marking text as "Hidden" in Microsoft Word prior to adding a file into a Trados Studio project used to be a convenient way to mark specific sections of the text as "not for translation" in a manner that was easy to revert later. Using the Word 2007-2013 filter, hidden text would be ignored altogether and would not be imported into Trados Studio.

However, since the Word 2007-2016 filter (WordprocessingML V. 2) was introduced, the way Studio handles hidden text has changed. Even if Studio is configured not to import hidden text, it is now treated as a placeholder and displayed in the Studio Editor. This produces unwanted results, such as:

- Hidden paragraphs appearing in the Studio Editor as a series of internal tags

- Hyperlink URLs in hidden paragraphs appearing as editable segments in the Studio Editor

- Hidden text surrounding non-hidden sentences (for example, string IDs in a .TXT-based string list) being imported as internal tags, such as the text marked in purple in this example: id_buttonCancel := "Cancel";
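As a rough illustration of the third case, the sketch below checks which runs of a Word document are marked as hidden. It assumes the text was hidden through direct formatting, which WordprocessingML stores as a `w:vanish` element in the run properties; text hidden through a style would not be detected. It also assumes that the ID and the surrounding punctuation are the hidden parts of the example above. The names `split_hidden`, `is_hidden` and `SAMPLE` are hypothetical and not part of Studio or Word.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def is_hidden(run):
    """True if the run's own properties switch on w:vanish (direct formatting only)."""
    vanish = run.find("w:rPr/w:vanish", NS)
    if vanish is None:
        return False
    # A missing w:val attribute means "on"; 0/false/off switch it off.
    return vanish.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def split_hidden(document_xml):
    """Return (visible_text, hidden_text) collected from all runs."""
    visible, hidden = [], []
    root = ET.fromstring(document_xml)
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        (hidden if is_hidden(run) else visible).append(text)
    return "".join(visible), "".join(hidden)


# Hand-written sample mimicking the string-list example (an assumption,
# not taken from a real file).
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:rPr><w:vanish/></w:rPr>'
    '<w:t xml:space="preserve">id_buttonCancel := "</w:t></w:r>'
    '<w:r><w:t>Cancel</w:t></w:r>'
    '<w:r><w:rPr><w:vanish/></w:rPr><w:t>";</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

visible_text, hidden_text = split_hidden(SAMPLE)
print("visible:", visible_text)
print("hidden: ", hidden_text)
```

For a real file, the XML can typically be read with `zipfile.ZipFile(path).read("word/document.xml").decode("utf-8")` and passed to `split_hidden`.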

Previously, it was possible to prevent such text from being imported altogether, which was a clean and safe way to work with such files in Studio. The extra internal tags created by the new Word filter negatively impact TM leverage and create confusion by including unwanted content in the bilingual files. While workarounds for this issue exist, this behaviour renders common workflows more complicated and has no obvious benefit.

We suggest that the behaviour of the Word 2007-2013 filter re: hidden text be restored in the Word 2007-2016 filter, or at least that this behaviour be made available again as a selectable option (for example, under File Types > Microsoft Word 2007-2016 > Common > Extract hidden text for translation: Yes / As internal tags / No).

Best regards,

Francisco Paredes

  • Hi,

    That is not a correct assumption: hidden text has always been handled as placeholders. Are you sure you tested this correctly?

    Can you perhaps send more details on how you used this, with sample files?

    Regards

    Patrik

  • Hi Patrik,

    I beg to differ. This is not an assumption — I've tested this behaviour extensively and it's easy to reproduce.

    Consider the sample files in this Dropbox folder: www.dropbox.com/.../AACE-JzyXn-qNU9lt5ojIu0ia

    Tester A.docx and Tester B.docx are identical Word files. The former was imported using the Word 2007-2013 filter, the latter with the Word 2007-2016 filter. The difference in the resulting SDLXLIFF files is fairly obvious, as you can see in the following screenshots.

    www.dropbox.com/.../Tester_A.png

    www.dropbox.com/.../Tester_B.png

    Regards,

    Fran

  • Actually, if you show all content, you will see that the old filter also extracted everything as inline tags. The one difference is that we now need to improve the handling of hyperlinks in hidden text, and probably stop including them in segments.

    I will log this and see when we can change this.

  • Hi Francisco, hi Patrik,

    We have also been experiencing problems with hidden text and the new Word file type. One of them (the error message "An item with the same key has already been added" when trying to save the source/target) has been logged as LTB-1855. According to SDL Support, a fix is ready to be rolled out, but according to my tests it is not yet available in the CU 5 beta; see community.sdl.com/.../11128

    Kind regards

    Christine