I have noticed that the NMT engine tends to truncate or completely distort some drug molecule names: it scrambles them and invents words instead. See the screenshot below. This does NOT happen with the Language Cloud engines (I checked on the same file, with the same segments), and since NMT is supposed to be more powerful and efficient than Language Cloud, this should be fixed ASAP.

Any ideas?


Isla, on behalf of all medical translators ;-)

  • Isla, as Martin pointed out, NMT will struggle with very long strings full of tags and unknown words that are not normally found in an English dictionary (or in the corpora used to train the MT engine). These are domain-specific (medical) terms, for which a custom-trained engine would be more useful.

    The quality of the MT output also depends on which filter the MT side uses to parse the content. You may get different results when you send the same tagged string converted to XLIFF, SDLXLIFF or HTML format.

    In your example, the segmentation could be improved (e.g. the <p>, <ul> and <li> tags should be set as structure tags) so that each list item becomes a separate segment. That way you'll get better leverage from both the TM and machine translation.

  • Hi,

    would it help (and would it be possible) for the NMT engines to use an SDL Language Cloud Terminology termbase (which includes such non-translatable medical terms) as an MT dictionary?

    Although MT dictionaries only provide an override mechanism (a simple search and replace of dictionary entries in the NMT output), this could actually prove very beneficial here.

    See - I think this still applies to NMT?

    Kind regards
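A rough sketch of why a list of non-translatable terms can help is below. It does not reproduce SDL's MT dictionary feature (described above as a search and replace on the NMT output); instead it shows a related, commonly used approach: each listed term is swapped for a placeholder before the segment goes to MT and restored afterwards. `toy_mt` is a hypothetical stand-in that truncates long words to mimic the reported problem, and the `__T0__` placeholder format is an assumption. A real NMT engine may drop or alter placeholders, so treat this as an illustration only.

```python
from typing import Callable, Dict, List, Tuple


def protect_terms(segment: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Swap each protected term for a placeholder (the __T<n>__ format is an assumption)."""
    mapping: Dict[str, str] = {}
    # Longest terms first, so a short term never splits a longer term that contains it.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        if term in segment:
            placeholder = f"__T{i}__"
            segment = segment.replace(term, placeholder)
            mapping[placeholder] = term
    return segment, mapping


def restore_terms(translation: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        translation = translation.replace(placeholder, term)
    return translation


def translate_with_protection(segment: str, terms: List[str],
                              mt: Callable[[str], str]) -> str:
    """Mask the terms, run the (hypothetical) MT function, then unmask."""
    masked, mapping = protect_terms(segment, terms)
    return restore_terms(mt(masked), mapping)


def toy_mt(text: str) -> str:
    """Hypothetical MT stand-in: truncates words longer than 12 characters
    (mimicking the reported problem) and 'translates' one word."""
    words = [w[:6] if len(w) > 12 else w for w in text.split(" ")]
    return " ".join(words).replace("tablets", "comprimés")


if __name__ == "__main__":
    source = "Give 2 tablets of hydroxychloroquine daily"
    # Unprotected: → Give 2 comprimés of hydrox daily
    print(toy_mt(source))
    # Protected: → Give 2 comprimés of hydroxychloroquine daily
    print(translate_with_protection(source, ["hydroxychloroquine"], toy_mt))
```

Sorting the terms longest-first is a deliberate choice: if the list contained both a molecule name and a longer salt form that includes it, the longer form is masked as a whole instead of being split.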

