Segmentation rule for manual line breaks not working - seemingly due to other formatting tags? (Studio 2019, docx)

Hello,

I have a set of docx files that contain lots of manual line breaks (soft returns).

I've created a segmentation rule following these instructions, but it doesn't seem to be working.

After playing around, it would appear that the rule isn't working because the line breaks are preceded/followed by a formatting tag.

Here's the file (pseudo-translation for confidentiality):

If I strip all of the formatting from the file, the segmentation rule works correctly:

Stripping the formatting might be an acceptable workaround in some cases, but I can't do that here because I have to retain the source formatting in the target files I deliver.

Is there a way of redefining the Regex rule to make it work here?
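To illustrate the suspected cause, here is a minimal Python sketch. It is a model of the hypothesis only. It represents formatting tags as hypothetical placeholders such as <1> and </1> and treats a segmentation rule as a zero-width regex break position. It does not reproduce how Studio actually represents tags internally or how it applies segmentation rules. A "strict" rule that expects text right after the line break fails as soon as a tag sits in between. A "tolerant" rule that allows optional tags at that point still breaks.

```python
import re

# Hypothetical placeholder syntax for inline formatting tags, e.g. <1> ... </1>.
# Studio's real internal tag representation is NOT modeled here.
TAG = r"</?\d+>"

# Strict rule: break after a line break only if a word character follows immediately.
STRICT = re.compile(r"(?<=\n)(?=\w)")

# Tolerant rule: allow any number of tag placeholders between the
# line break and the next word character.
TOLERANT = re.compile(rf"(?<=\n)(?=(?:{TAG})*\w)")


def segment(text: str, rule: re.Pattern) -> list[str]:
    """Split text at every zero-width break position matched by rule."""
    return rule.split(text)


plain = "First line\nSecond line"
tagged = "First line\n<1>Second line</1>"

print(segment(plain, STRICT))     # ['First line\n', 'Second line']
print(segment(tagged, STRICT))    # ['First line\n<1>Second line</1>']  (no break)
print(segment(tagged, TOLERANT))  # ['First line\n', '<1>Second line</1>']
```

The same model also suggests why a plain \n rule can still handle Windows-style \r\n breaks. \n matches the second character of the pair, so segment("First line\r\nSecond line", STRICT) still splits, leaving \r\n at the end of the first piece.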

Thanks,

Hayley

  • Try using \r\n in the rule rather than just \n, so it would look like this:

    My test file works with \n as well... but worth a try.

    If you are actually using these language resources to open the file, then I think you need to share a file so we can test with it; otherwise we're just guessing. Perhaps you could use an appropriate part of your pseudo-translated file?

  • The plot thickens (or I'm going crazy).

    With the short pseudo file I used above, I can no longer reproduce what I sent you. It works with the \n rule (and also with the \r\n rule you suggested).

    But if I try to import a longer file (an extract of my real file, with the content replaced by xyzxyz), I get the error message "The document cannot be processed since it contains unexpected contents". I also tried translating it as a single file with the same TM (and segmentation rule) as before and got the same error message.

    I'll attach the error message log and the docx file I tried to import:


    sdlerror-2020520-18h33m37s.sdlerror.xml
    <SDLErrorDetails time="20/05/2020 18:33:44">
      <ErrorMessage>The document cannot be processed since it contains unexpected contents.</ErrorMessage>
      <Exception>
        <Type>Sdl.LanguagePlatform.Core.LanguagePlatformException, Sdl.LanguagePlatform.Core, Version=1.6.0.0, Culture=neutral, PublicKeyToken=c28cdb26c445c888</Type>
        <HelpLink />
        <Source>Sdl.LanguagePlatform.TranslationMemoryTools</Source>
        <HResult>-2146233088</HResult>
        <StackTrace><![CDATA[   at Sdl.LanguagePlatform.TranslationMemoryTools.LinguaSegmentBuilder.VisitSegment(ISegment segment)
       at Sdl.FileTypeSupport.Framework.Bilingual.Segment.AcceptVisitor(IMarkupDataVisitor visitor)
       at Sdl.LanguagePlatform.TranslationMemoryTools.LinguaSegmentBuilder.VisitChildNodes(IAbstractMarkupDataContainer container)
       at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.AppendToLinguaSegment(IAbstractMarkupDataContainer data, Segment result, LinguaTuBuilderSettings flags, List`1& tagAssociations, List`1& textAssociations)
       at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.BuildLinguaSegmentInternal(CultureInfo culture, IAbstractMarkupDataContainer segment, LinguaTuBuilderSettings settings, Boolean& hasTrackChanges, List`1& tagAssociations, List`1& textAssociations)
       at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.BuildLinguaSegment(CultureInfo culture, ISegment segment, Boolean includeTrackChanges)
       at Sdl.TranslationStudio.Editor.RepetitionTableUpdater.GetSegmentIdentityHash(ISegmentPair segmentPair)
       at Sdl.TranslationStudio.Editor.RepetitionTableUpdater.ProcessSegment(ISegmentPair segmentPair)
       at Sdl.TranslationStudio.Editor.TranslationEditor.Processors.RepetitionProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Filters.Processors.SegmentRenumberingBilingualProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Integration.LocationMarkerLocator.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Filters.Processors.RegexEmbeddedBilingualProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Services.ParagraphUnitOutputService.DirectOutput(IParagraphUnit paragraphUnit)
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Services.ParagraphUnitOutputService.Output()
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Routes.Parser.ParagraphRoute.Handle(Entity entity)
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Routes.Parser.WordDispatchRoute.Handle(Entity entity)
       at lambda_method(Closure , IMessage )
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Infrastructure.Dispatcher.Publish(IMessage message)
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Services.ParserService.Publish(Entity parsedEntity)
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Services.ParserService.Parse()
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Parser.DocumentParser.Parse()
       at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Parser.DocxParser.ParseNext()
       at Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ParseNext()
       at Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.ParseNext()
       at Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.Parse()
       at Sdl.TranslationStudio.Editor.TranslationEditor.TranslatableDocument.Load(IJobExecutionContext context)
       at Sdl.Desktop.Platform.Services.JobRequest.Execute(IJobExecutionContext context)
       at Sdl.Desktop.Platform.Implementation.Services.Job.<_worker_DoWork>b__47_0()
       at Sdl.Desktop.Logger.Log.Resources(Object message, Action action)
       at Sdl.Desktop.Platform.Implementation.Services.Job._worker_DoWork(Object sender, DoWorkEventArgs e)
       at System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs e)
       at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)]]></StackTrace>
      </Exception>
      <Environment>
        <ProductName>SDL Trados Studio</ProductName>
        <ProductVersion>15.0.0.0</ProductVersion>
        <EntryAssemblyFileVersion>15.2.6.2831</EntryAssemblyFileVersion>
        <OperatingSystem>Microsoft Windows 10 Home</OperatingSystem>
        <ServicePack>NULL</ServicePack>
        <OperatingSystemLanguage>1036</OperatingSystemLanguage>
        <CodePage>1252</CodePage>
        <LoggedOnUser>DESKTOP-LEVAH\Hayley Leva</LoggedOnUser>
        <DotNetFrameWork>4.0.30319.42000</DotNetFrameWork>
        <ComputerName>DESKTOP-LEVAH</ComputerName>
        <ConnectedToNetwork>True</ConnectedToNetwork>
        <PhysicalMemory>8261644 MB</PhysicalMemory>
      </Environment>
    </SDLErrorDetails>

    Hayley

  • It opens fine without a TM or with a TM that uses default settings. It seems there is something in the segmentation rules that Studio doesn't like. I'll log it with support for investigation.

  • Hi  

    Yes, that's what I found too. It opens fine if I don't define a segmentation rule. It also opens fine with the segmentation rule if I first strip the formatting from the docx file (file attached), hence my initial suspicion that the formatting somehow conflicts with the segmentation rule:


    Hayley