Under Community Review

HTML Entity Conversion similar to XML 2: allow "Read as character" and "Write as character" for any/all characters

Currently, for the HTML and Embedded HTML parsers to normalize entities to the characters themselves (e.g. &reg; converted to ® instead of an inline tag), you must enable entity conversion. However, this means that when the file is written out, every ® is written as &reg; regardless of what the source contained. With the recent XML 2 file type, it is possible to define "Read as character" and "Write as character" for each value (this is currently only possible for <, >, ", ', and & via the Advanced HTML Entity Settings). In most cases I want to write out the actual character and not the entity, since we are using Unicode, but because we have a lot of legacy content that uses HTML entities heavily, I need to normalize them on read for TM reuse, ease of translation, and consistency across the TMs.
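To illustrate the behaviour I am asking for, here is a minimal Python sketch. It is not how the Studio filters work internally; the function name `normalize_entities` and the `KEEP_ESCAPED` set are hypothetical names chosen for this example. It reads every named or numeric entity as its Unicode character and leaves only the five markup-significant characters escaped, so the output stays well-formed.

```python
import html
import re
from html.entities import html5

# Assumption: these five characters stay escaped so the markup remains valid.
KEEP_ESCAPED = {"<", ">", "&", '"', "'"}

# Matches named (&reg;), decimal (&#174;) and hexadecimal (&#xAE;) references.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def normalize_entities(text: str) -> str:
    """Replace entity references with their Unicode characters."""

    def replace(match: re.Match) -> str:
        entity = match.group(0)
        body = entity[1:]  # e.g. "reg;" or "#174;"
        if body.startswith("#"):
            # Numeric reference: let the standard library decode it.
            char = html.unescape(entity)
        else:
            # Named reference: look it up in the HTML5 entity table.
            char = html5.get(body)
            if char is None:
                return entity  # unknown name, leave untouched
        if not char or char in KEEP_ESCAPED:
            return entity  # keep markup-significant characters escaped
        return char

    return ENTITY_RE.sub(replace, text)


legacy = "Acme&reg; &amp; Co &lt;b&gt; caf&eacute; &#174; &#xAE;"
print(normalize_entities(legacy))
```

With this kind of handling, legacy segments containing &reg; and new segments containing ® would end up identical in the TM, and the target file would contain only the Unicode characters.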

Additionally, we use the Embedded HTML parser for many files in which most fields can render entities, but others, such as SEO and metadata fields, cannot. This lack of proper HTML entity handling is causing us numerous headaches.

  

Here is a related thread that I started 4 years ago hoping to get something like this, but so far there has been no progress:

https://community.sdl.com/product-groups/translationproductivity/f/studio/8210/entity-handling-in-sdl-trados-studio-you-cannot-read-in-all-entities-as-actual-characters-into-the-translation-interface-and-write-them-out-as-the-actual-characters-as-well

  • Going from legacy HTML content that still contains named HTML entities to clean HTML code containing only Unicode characters is a fairly common requirement for many clients. This should be available without odd workarounds.

  • Note that we still have legacy ITD file types that handle this better than the current SDL Trados Studio filters. Unfortunately, we continue to limp along with our old ITD file types in SDL TMS because the current filters cannot both normalize entities to characters for translation and write them out as the actual characters rather than as entities.