CORRUPTED WORDS WITH NMT ENGINE

I have noticed that the NMT engine tends to truncate or completely distort some drug molecule names: it scrambles them and invents words instead. See the screenshot below. I would like to point out that this does NOT happen with the Language Cloud engines (I checked it on the same file, same segments, etc.), and since NMT is supposed to be more powerful and efficient than Language Cloud, this should be resolved ASAP.

Any ideas?

Thanks!

Isla ... on behalf of all medical translators ;)

  • Isla, as Martin pointed out, NMT will struggle with very long strings full of tags and unknown words that are not normally found in an English dictionary (or in the corpora used to train the MT engine). These are domain-specific (medical) terms, for which a custom-trained engine would be more useful.

    The quality of the MT output also depends on which filter is used on the MT side to parse the content. You may get different results when you send the same string with tags converted to XLIFF, SDLXLIFF or HTML format.

    In your example I can see that the segmentation could be improved, e.g. the <p>, <ul> and <li> tags should be set as Structure tags so that each list item lands in its own segment. In that case you'll get better leverage from a TM and from machine translation.

  • Hi,

    would it help / be possible for the NMT engines to use an SDL Language Cloud Terminology termbase (which includes such non-translatable medical terms) as an MT dictionary?

    Though the MT dictionaries just provide an override mechanism (a simple search and replace of dictionary entries in the NMT output), it might actually prove to be very beneficial in this case.

    See https://gateway.sdl.com/apex/communityknowledge?articleName=000004051 - I think it is still valid for NMT?

    Kind regards

    Christine

  • The SDL Machine Translation Cloud (aka BeGlobal v4 NMT) translation provider plugin for SDL Trados Studio 2019 does not yet support Dictionary. There are plans to add it, but I don't know when that will be.

    The SDL Language Cloud provider does provide Dictionary support, but it is for SMT (Statistical MT) and will soon be deprecated.

    In Isla's case the first step should be to fix the segmentation, which should help. If the medical terms in brackets () don't need to be translated, then in Studio they could be added to a Variable List within a Language Resource Template; alternatively, some Regex rules could be crafted in the filter settings to lock them or turn them into placeholders during text extraction.
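
To illustrate the placeholder idea outside Studio, here is a minimal Python sketch. It is not Studio's filter API or its Regex settings: the pattern, the `[[n]]` placeholder format and the function names `protect` and `restore` are all assumptions made up for this example. It simply swaps each bracketed term for a numbered token before the text would go to MT and puts the original term back afterwards.

```python
import re

# Assumption: anything inside round brackets is a non-translatable term.
# Nested brackets are not handled; only the innermost pair would match.
BRACKETED = re.compile(r"\(([^()]+)\)")
PLACEHOLDER = re.compile(r"\[\[(\d+)\]\]")


def protect(text):
    """Replace each bracketed term with a numbered placeholder.

    Returns the masked text and the list of original terms.
    """
    terms = []

    def _swap(match):
        terms.append(match.group(1))
        # Hypothetical placeholder format: [[0]], [[1]], ...
        return f"([[{len(terms) - 1}]])"

    return BRACKETED.sub(_swap, text), terms


def restore(text, terms):
    """Put the original terms back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: terms[int(m.group(1))], text)


source = "Take ibuprofen (isobutylphenylpropionic acid) daily."
masked, terms = protect(source)
# `masked` is what would be sent to the MT engine; here we skip that
# step and restore directly to show the round trip.
restored = restore(masked, terms)
```

Whether an MT engine leaves such tokens untouched is not guaranteed, which is why locking the terms or turning them into real placeholder tags at extraction time, as suggested above, is likely the more robust route.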
