DeepL pretranslation fails with 'start tag does not have a matching end tag'

Hello everyone,

I created a test file, so everyone should be able to reproduce the error.

Test file:

deepL_MT_error.docx.sdlxliff.zip

Error message:

sdlerror-2018101-11h22m38s.sdlerror.xml
<SDLErrorDetails time="01.10.2018 11:22:44">
  <ErrorMessage>The start tag with ID '14' does not have a matching end tag within the segment.</ErrorMessage>
  <Exception>
    <Type>Sdl.LanguagePlatform.TranslationMemoryTools.InvalidSegmentContentException, Sdl.LanguagePlatform.TranslationMemoryTools, Version=1.6.0.0, Culture=neutral, PublicKeyToken=c28cdb26c445c888</Type>
    <HelpLink />
    <Source>Sdl.LanguagePlatform.TranslationMemoryTools</Source>
    <HResult>-2146233088</HResult>
    <StackTrace><![CDATA[   bei Sdl.LanguagePlatform.TranslationMemoryTools.MarkupDataSegmentBuilder.CheckForUnclosedContainers()
   bei Sdl.LanguagePlatform.TranslationMemoryTools.MarkupDataSegmentBuilder.VisitLinguaSegment(Segment segment)
   bei Sdl.LanguagePlatform.TranslationMemoryTools.SegmentBuilderWithTagApplier.ApplyTags(Segment segment, ISegment segmentToSearchIn, IAbstractMarkupDataContainer documentContent)
   bei Sdl.LanguagePlatform.TranslationMemoryTools.SegmentBuilderWithTagApplier.ApplyTags(SearchResult searchResult, ISegment segmentToSearchIn, IAbstractMarkupDataContainer documentContent, Penalty memoryTagDeletedPenalty)
   bei Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.CreateTranslation(TaskSegmentSearchData segmentSearchData, SearchResult searchResult)
   bei Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.TranslationMemoryLookupImpl.GetBestTranslation(ParagraphUnitId paragraphUnitId, SegmentId segmentId)
   bei Sdl.ProjectApi.AutomaticTasks.Translate.TranslateContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
   bei Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
   bei Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.OutputParagraphs()
   bei Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.ProcessWaitingParagraphs()
   bei Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.FileComplete()
   bei Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.FileComplete()
   bei Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.FileComplete()
   bei Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.FileComplete()
   bei Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.FileComplete()
   bei Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.FileComplete()
   bei Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.FileComplete()
   bei Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.FileComplete()
   bei Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.OnEndFile()
   bei Sdl.FileTypeSupport.Bilingual.SdlXliff.SdlXliffFeeder.<ContinueScanning>b__2(ISdlXliffStreamContentHandler handler)
   bei System.Collections.Generic.List`1.ForEach(Action`1 action)
   bei Sdl.FileTypeSupport.Bilingual.SdlXliff.SdlXliffFeeder.ContinueScanning()
   bei Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ContinueParsing()
   bei Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ParseNext()
   bei Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ParseNext()
   bei Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.ParseNext()
   bei Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.Parse()
   bei Sdl.ProjectApi.Implementation.TaskExecution.ContentProcessingTaskImplementation.TaskFileExecuter.Parse(String targetFilePath)]]></StackTrace>
  </Exception>
  <Environment>
    <ProductName>SDL Trados Studio</ProductName>
    <ProductVersion>14.0.0.0</ProductVersion>
    <EntryAssemblyFileVersion>14.0.5889.5</EntryAssemblyFileVersion>
    <OperatingSystem>Microsoft Windows 10 Enterprise</OperatingSystem>
    <ServicePack>NULL</ServicePack>
    <OperatingSystemLanguage>1031</OperatingSystemLanguage>
    <CodePage>1252</CodePage>
    <LoggedOnUser>$$$$$</LoggedOnUser>
    <DotNetFrameWork>4.0.30319.42000</DotNetFrameWork>
    <ComputerName>$$$$$</ComputerName>
    <ConnectedToNetwork>True</ConnectedToNetwork>
    <PhysicalMemory>33478380 MB</PhysicalMemory>
  </Environment>
</SDLErrorDetails>
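
For anyone comparing several of these reports, the key fields can be pulled out with a few lines of Python. This is only a minimal sketch using the standard library. The element names are taken from the report above, while the function name `summarize_sdl_error` and the shortened sample are my own illustration.

```python
import re
import xml.etree.ElementTree as ET


def summarize_sdl_error(xml_text):
    """Extract the fields that matter when comparing two error reports."""
    root = ET.fromstring(xml_text)
    message = root.findtext("ErrorMessage", default="")
    # The tag ID is quoted inside the message text, e.g. ID '14'.
    match = re.search(r"ID '(\d+)'", message)
    exception_type = root.findtext("Exception/Type", default="")
    return {
        "time": root.get("time"),
        "message": message,
        "tag_id": match.group(1) if match else None,
        # Keep only the class name, not the assembly details after the comma.
        "exception_type": exception_type.split(",")[0],
        "studio_build": root.findtext("Environment/EntryAssemblyFileVersion"),
    }


# Shortened copy of the report above, for illustration only.
sample = """<SDLErrorDetails time="01.10.2018 11:22:44">
  <ErrorMessage>The start tag with ID '14' does not have a matching end tag within the segment.</ErrorMessage>
  <Exception>
    <Type>Sdl.LanguagePlatform.TranslationMemoryTools.InvalidSegmentContentException, Sdl.LanguagePlatform.TranslationMemoryTools, Version=1.6.0.0</Type>
  </Exception>
  <Environment>
    <EntryAssemblyFileVersion>14.0.5889.5</EntryAssemblyFileVersion>
  </Environment>
</SDLErrorDetails>"""

summary = summarize_sdl_error(sample)
print(summary["tag_id"], summary["studio_build"])
```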

I think the second segment is too long for the DeepL Translation Provider, as I'm not getting any matches in the Editor either. I guess, since it's not able to process the whole sentence, the end tag is missing, which is why Studio gives me the error message.
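
To illustrate what I mean by a missing end tag, here is a rough Python sketch of the kind of check that fails. I don't know how the plugin actually encodes tags when it sends text to DeepL, so the `<tg14>` / `</tg14>` placeholder format and the function name are purely assumptions for the example.

```python
import re

# Hypothetical placeholder format: <tg14> opens tag 14, </tg14> closes it.
TAG_PATTERN = re.compile(r"<(/?)tg(\d+)>")


def find_unclosed_tags(text):
    """Return the IDs of start tags that have no matching end tag."""
    open_ids = []
    for match in TAG_PATTERN.finditer(text):
        is_closing = match.group(1) == "/"
        tag_id = match.group(2)
        if not is_closing:
            open_ids.append(tag_id)
        elif tag_id in open_ids:
            # Close the most recently opened tag with this ID.
            last_index = len(open_ids) - 1 - open_ids[::-1].index(tag_id)
            del open_ids[last_index]
        # Stray end tags without a start tag are ignored in this sketch.
    return open_ids


complete = "Some <tg13>bold</tg13> text with <tg14>a link</tg14>."
truncated = "Some <tg13>bold</tg13> text with <tg14>a link that is cut off ("

print(find_unclosed_tags(complete))
print(find_unclosed_tags(truncated))
```

If the MT response comes back cut off like the second string, tag 14 is opened but never closed, which would match the wording of the error message.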

I know that there will always be segments that cannot be translated with the engine. What bothers me is that the whole pretranslation fails when one segment is causing problems.

Would it be possible for the plugin to skip segments that cannot be translated and carry on with the rest of the document? As it stands, the pretranslation fails completely and I have to lock the problematic segments manually in order to get a pretranslation for the rest of the document.
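
The behaviour I'm asking for could look roughly like this. It is only a sketch of the idea in Python, not how Studio or the plugin is implemented: `pretranslate`, `fake_translate` and the `<tg14>` placeholder are hypothetical names made up for the example.

```python
def pretranslate(segments, translate):
    """Translate each segment; record failures instead of aborting the batch."""
    results = {}
    failures = []
    for segment_id, source in segments:
        try:
            results[segment_id] = translate(source)
        except ValueError as exc:
            # Leave the segment empty and move on to the next one.
            results[segment_id] = None
            failures.append((segment_id, str(exc)))
    return results, failures


def fake_translate(source):
    """Hypothetical stand-in for the MT provider, failing on one tag."""
    if "<tg14>" in source:
        raise ValueError(
            "The start tag with ID '14' does not have a matching end tag "
            "within the segment."
        )
    return source.upper()


segments = [("1", "hello"), ("2", "<tg14>too long"), ("3", "world")]
results, failures = pretranslate(segments, fake_translate)
print(results)
print(failures)
```

Segment 2 stays empty and is reported at the end, while segments 1 and 3 still get their translations.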

 

EDIT:

Sorry, I forgot to mention the plugin version I'm using: it's v3.2.

Parents
  • Hi

    That was very helpful. There are actually two bugs here, one with DeepL and one with us. The DeepL bug is that the response we receive through the API is wrong: it is missing a closing tag and a parenthesis. Their web UI returns the same faulty response.

    The Studio bug is, as you mentioned, that we don't return translations for the rest of the file when one segment fails.

    Both of these bugs are being logged. The DeepL guys are working on resolving that glitch, which will also fix this particular problem in Studio, but we do need to put a proper fix in place in Studio as well in case any more like this crop up.
  • Hi,
    sorry for bothering you again regarding the DeepL plugin. Are you able to give me a rough estimate of how long it will take to develop a fix, so that the pretranslation finishes even if some segments cannot be translated properly? A further problem is that DeepL still counts all the characters, even though I get no results because the pretranslation does not finish.
  • 1. I'm pretty sure you did

    2. I can send you the problematic PDF if this contributes to the solution (please, don't tell me it's because I'm trying to convert a PDF with your third-party PDF converter that's been built into Studio for... ever?) - e-mail me at kominski@outlook.com for that

    3. Complaining to vendors about them wasting your time is never a waste of time. Plus: what about that issue with Trados not being able to skip the pre-translation error and move on? How much longer do we need to wait for that seemingly simple fix? This isn't even case-specific, so what do my files have to do with it anyway?

    Why don't you just do some goddamn bughunting for once, with 5 guys (or girls) converting and deepl'n 20 different PDFs a day from around the web, trying to pinpoint the problem areas? You're probably going to say this is the job DeepL should do, but the point is that it's hurting your customer base more than it's hurting DeepL, I believe.

  • All the bugs I reported regarding the plugin have been fixed already (within days, never more than one or two weeks). There are still a few issues with tab characters and returns, but the problem there lies with DeepL. If there is a segment that can't be translated, the pretranslation goes on. We use the plugin on a daily basis and I really can't complain about anything going wrong. And I would, if that were the case ;-)

    Haven't tried it with a PDF though; we always convert them to Word or PowerPoint prior to processing them in Studio.

  • Not helpful at all. We have reacted to all the bugs in DeepL really quickly as they were found. They weren't always down to us; sometimes they were down to DeepL, who are pretty new on the block wrt provision of an MT solution.

    I'm not complaining about vendors, only about you. It's pointless to post in here on a thread that's 7 months old and not give us anything to go on. Developers are not there to bug hunt and neither are we. The plugin is not part of Studio; it's a free add-on and we support it diligently. I don't think it's a lot to ask that you provide some information to help, and then we'll be happy to put a developer on it.

    I'll email you for the PDF.

  • Thanks. Sadly I think Adrian just has a problem with me.

  • Good for you, Thilo. I too have been working with Studio without a problem for some 2 months or so now, until I ran into this bug again. So maybe you just haven't had the bad luck just yet.

    What is surprising, however, is your claim that Studio moves on after encountering a problem. Do you mean by that the bug being the subject of this thread (see the title), or some other errors?

  • Yes, I am referring to the bug being the subject of this thread.

    If you test the pretranslation with the test file I provided, everything works fine. Of course, since the creation of this thread, the MT Engine has been improved, so there is not a problem at all with the translation of the file.

    However, a few weeks ago I had a document containing special characters/tags that were not supported by the plugin (see https://community.sdl.com/product-groups/translationproductivity/f/openexchange_applications/24978/deepl-translation-provider-pre-translation-problem ). The pretranslation was successful, only the segments containing the special characters (and a few segments before and after) were left empty.

  • Thanks for the PDF. I'm afraid I can't reproduce this at all. I pretranslated the file after creating a project in Studio with your PDF and the entire file was translated, all 7227 segments in it.

    I'll tell you what I'm using and perhaps you can compare:

    SDL Trados Studio 2019 SR1 - 15.1.3.55768

    DeepL plugin version 4.8.5

    Iris plugin version 2.0

  • Ok, I'm willing to give it another try then, and send you the probably inevitable error report.

    How do you check the plugin and Iris versions? I looked around in settings, but couldn't find anything. The Plug-ins window doesn't show the versions.

  • sdlerror-2019516-18h44m30s.sdlerror.sdlerror.xml
    <SDLErrorDetails time="16.05.2019 18:44:35">
      <ErrorMessage>The start tag with ID '13028' does not have a matching end tag within the segment.</ErrorMessage>
      <Exception>
        <Type>Sdl.LanguagePlatform.TranslationMemoryTools.InvalidSegmentContentException, Sdl.LanguagePlatform.TranslationMemoryTools, Version=1.6.0.0, Culture=neutral, PublicKeyToken=c28cdb26c445c888</Type>
        <HelpLink />
        <Source>Sdl.LanguagePlatform.TranslationMemoryTools</Source>
        <HResult>-2146233088</HResult>
        <StackTrace><![CDATA[   w Sdl.LanguagePlatform.TranslationMemoryTools.MarkupDataSegmentBuilder.CheckForUnclosedContainers()
       w Sdl.LanguagePlatform.TranslationMemoryTools.MarkupDataSegmentBuilder.VisitLinguaSegment(Segment segment)
       w Sdl.LanguagePlatform.TranslationMemoryTools.SegmentBuilderWithTagApplier.ApplyTags(Segment segment, ISegment segmentToSearchIn, IAbstractMarkupDataContainer documentContent)
       w Sdl.LanguagePlatform.TranslationMemoryTools.SegmentBuilderWithTagApplier.ApplyTags(SearchResult searchResult, ISegment segmentToSearchIn, IAbstractMarkupDataContainer documentContent, Penalty memoryTagDeletedPenalty)
       w Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.CreateTranslation(TaskSegmentSearchData segmentSearchData, SearchResult searchResult)
       w Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.TranslationMemoryLookupImpl.GetBestTranslation(ParagraphUnitId paragraphUnitId, SegmentId segmentId)
       w Sdl.ProjectApi.AutomaticTasks.Translate.TranslateContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.OutputParagraphs()
       w Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.ProcessWaitingParagraphs()
       w Sdl.ProjectApi.AutomaticTasks.TranslationMemoryLookupContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Filters.Processors.SegmentRenumberingBilingualProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Integration.LocationMarkerLocator.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       w Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.ParagraphUnitBuffer.ProcessParagraphUnit(IParagraphUnit pu)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.OutputParagraphUnit(IParagraphUnit pu)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ParseLocalizableParagraphUnit(transunit transunit, LockTypeFlags lockFlags)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ParseTransUnit(transunit transunit)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ParseGroup(group group)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.OnGroup(XmlElement groupElement)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.SdlXliffFeeder.<ContinueScanning>b__14_15(ISdlXliffStreamContentHandler handler)
       w System.Collections.Generic.List`1.ForEach(Action`1 action)
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.SdlXliffFeeder.ContinueScanning()
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ContinueParsing()
       w Sdl.FileTypeSupport.Bilingual.SdlXliff.XliffFileReader.ParseNext()
       w Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ParseNext()
       w Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.ParseNext()
       w Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.Parse()
       w Sdl.ProjectApi.Implementation.TaskExecution.ContentProcessingTaskImplementation.TaskFileExecuter.Parse(String targetFilePath)]]></StackTrace>
      </Exception>
      <Environment>
        <ProductName>SDL Trados Studio</ProductName>
        <ProductVersion>15.0.0.0</ProductVersion>
        <EntryAssemblyFileVersion>15.1.3.55768</EntryAssemblyFileVersion>
        <OperatingSystem>Microsoft Windows 10 Pro</OperatingSystem>
        <ServicePack>NULL</ServicePack>
        <OperatingSystemLanguage>1045</OperatingSystemLanguage>
        <CodePage>1250</CodePage>
        <LoggedOnUser>EKHSURFACE\komin</LoggedOnUser>
        <DotNetFrameWork>4.0.30319.42000</DotNetFrameWork>
        <ComputerName>EKHSURFACE</ComputerName>
        <ConnectedToNetwork>True</ConnectedToNetwork>
        <PhysicalMemory>4117704 MB</PhysicalMemory>
      </Environment>
    </SDLErrorDetails>

    So, it turned out my DeepL plugin was indeed outdated (4.1 here). I reinstalled the plugin with the current release and... the error came up even earlier in the process! At least I didn't waste as much money this time. Find attached the error report.

    I'm not perfectly sure whether the plugin actually reinstalled. The Manager shows the present release (4.8.8), but the API key and all config was already there. It's probably some registry leftovers though.

    I was actually hoping this would work…

    By the way - can you show me your PDF conversion settings, Paul? I have "recognize PDF text" set to None and Iris disabled, as it's obviously useless in the real world.

  • Just retried after a PC reset (as a means of showing goodwill on my part) and the error comes up on the same tag ID.

  • That error doesn't seem to have anything to do with the plugins.  It may be related to you not using IRIS... at a guess since you don't seem to realise this is also a plugin.

    https://appstore.sdl.com/language/app/iris-pdf-ocr-support-for-studio/794/

    You should have been advised you didn't have this installed if you set up your PDF filetype to use it?  My settings:

    Other settings:

    Perhaps your error is coming from the old default PDF converter?  I haven't tested that but might do later.  I tend to use IRIS by default.

  • I sent you the docx created on my machine and also the translated sdlxliff.

  • Thilo, you're claiming that Studio completes the pre-translation despite the occurrence of the error. But how does this look on your end, then? You launch pre-translation, an error comes up and...? "My" Studio always terminates pre-translation when the error occurs, so I don't see how it could possibly be different on your end.

    Unless you mean Studio leaving out some empty segments after completed DeepL-ing (without any error, and almost always at the beginning of the document, for some reason), which is a frequent phenomenon and which never really bothered me.
