# Segmentation rule for manual line breaks not working: caused by adjacent formatting tags? (Studio 2019, docx)

Hello,

I have a set of docx files that contain lots of manual line breaks (soft returns).

I've created a segmentation rule following these instructions, but it doesn't seem to be working.

After some experimenting, it appears that the rule fails whenever a line break is immediately preceded or followed by a formatting tag.

Here's the file (pseudo-translation for confidentiality):

If I strip all of the formatting from the file, the segmentation rule works correctly:

Stripping the formatting might be an acceptable workaround in some cases, but I can't do that here because I have to retain the source formatting in the target files I deliver.

Is there a way of redefining the Regex rule to make it work here?

Thanks,

Hayley

• Try using `\r\n` for the rule, not just `\n`. So it would look like this:

My test file works with `\n` as well, but it's worth a try.
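As a generic illustration of why `\r\n` versus `\n` can matter, here is a plain Python regex sketch. It is not Trados Studio's segmentation engine, and how Studio stores a soft return internally is an assumption not confirmed in this thread. The pattern `\r?\n` matches a break stored either as `\n` or as `\r\n`, so one rule covers both variants. The names `BREAK` and `split_at_breaks` are hypothetical.

```python
import re

# Generic Python regex illustration, NOT Trados Studio's segmentation engine.
# "\r?\n" matches a line break stored either as "\n" or as "\r\n".
BREAK = re.compile(r"\r?\n")


def split_at_breaks(text):
    """Split text at line breaks, removing the break characters themselves."""
    return BREAK.split(text)


unix_text = "first line\nsecond line"
windows_text = "first line\r\nsecond line"

print(split_at_breaks(unix_text))     # ['first line', 'second line']
print(split_at_breaks(windows_text))  # ['first line', 'second line']

# A pattern of just "\n" still finds the Windows-style break,
# but leaves a stray "\r" at the end of the first piece.
print(re.split(r"\n", windows_text))  # ['first line\r', 'second line']
```

This also matches the observation above that `\n` alone can still work: it finds the break either way, and only the leftover `\r` differs.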

If you are actually using these language resources to open the file, then I think you need to share a file we can test with; otherwise we're just guessing. Perhaps you could use a suitable part of your pseudo-translated file?

• The plot thickens (or I'm going crazy).

With the short pseudo file I used above, I can no longer reproduce the problem I showed you. It works with the \n rule (and also with the \r\n rule you suggested).

But if I try to import a longer file (an extract of my real file, with the content replaced by xyzxyz), I get the error message "The document cannot be processed since it contains unexpected contents". I tried translating it as a single file with the same TM (and segmentation rule) as before and got the same error message.

I'll attach the error message log and the docx file I tried to import:

DOCX

sdlerror-2020520-18h33m37s.sdlerror.xml
```xml
<SDLErrorDetails time="20/05/2020 18:33:44">
<ErrorMessage>The document cannot be processed since it contains unexpected contents.</ErrorMessage>
<Exception>
<Type>Sdl.LanguagePlatform.Core.LanguagePlatformException, Sdl.LanguagePlatform.Core, Version=1.6.0.0, Culture=neutral, PublicKeyToken=c28cdb26c445c888</Type>
<HelpLink />
<Source>Sdl.LanguagePlatform.TranslationMemoryTools</Source>
<HResult>-2146233088</HResult>
<StackTrace><![CDATA[   at Sdl.LanguagePlatform.TranslationMemoryTools.LinguaSegmentBuilder.VisitSegment(ISegment segment)
at Sdl.FileTypeSupport.Framework.Bilingual.Segment.AcceptVisitor(IMarkupDataVisitor visitor)
at Sdl.LanguagePlatform.TranslationMemoryTools.LinguaSegmentBuilder.VisitChildNodes(IAbstractMarkupDataContainer container)
at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.AppendToLinguaSegment(IAbstractMarkupDataContainer data, Segment result, LinguaTuBuilderSettings flags, List`1& tagAssociations, List`1& textAssociations)
at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.BuildLinguaSegmentInternal(CultureInfo culture, IAbstractMarkupDataContainer segment, LinguaTuBuilderSettings settings, Boolean& hasTrackChanges, List`1& tagAssociations, List`1& textAssociations)
at Sdl.LanguagePlatform.TranslationMemoryTools.TUConverter.BuildLinguaSegment(CultureInfo culture, ISegment segment, Boolean includeTrackChanges)
at Sdl.TranslationStudio.Editor.RepetitionTableUpdater.GetSegmentIdentityHash(ISegmentPair segmentPair)
at Sdl.TranslationStudio.Editor.RepetitionTableUpdater.ProcessSegment(ISegmentPair segmentPair)
at Sdl.TranslationStudio.Editor.TranslationEditor.Processors.RepetitionProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Filters.Processors.SegmentRenumberingBilingualProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Core.Utilities.BilingualApi.BilingualContentHandlerAdapter.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Integration.LocationMarkerLocator.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Filters.Processors.RegexEmbeddedBilingualProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.BilingualApi.AbstractBilingualContentProcessor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Integration.AbstractBilingualProcessorContainer.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ProcessParagraphUnit(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Services.ParagraphUnitOutputService.DirectOutput(IParagraphUnit paragraphUnit)
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Services.ParagraphUnitOutputService.Output()
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Routes.Parser.ParagraphRoute.Handle(Entity entity)
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Routes.Parser.WordDispatchRoute.Handle(Entity entity)
at lambda_method(Closure , IMessage )
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Infrastructure.Dispatcher.Publish(IMessage message)
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Services.ParserService.Publish(Entity parsedEntity)
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Services.ParserService.Parse()
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Core.Parser.DocumentParser.Parse()
at Sdl.FileTypeSupport.Filters.MicrosoftOffice.Word.Parser.DocxParser.ParseNext()
at Sdl.FileTypeSupport.Framework.Integration.FileExtractor.ParseNext()
at Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.ParseNext()
at Sdl.FileTypeSupport.Framework.Integration.MultiFileConverter.Parse()
at Sdl.TranslationStudio.Editor.TranslationEditor.TranslatableDocument.Load(IJobExecutionContext context)
at Sdl.Desktop.Platform.Services.JobRequest.Execute(IJobExecutionContext context)
at Sdl.Desktop.Platform.Implementation.Services.Job.<_worker_DoWork>b__47_0()
at Sdl.Desktop.Logger.Log.Resources(Object message, Action action)
at Sdl.Desktop.Platform.Implementation.Services.Job._worker_DoWork(Object sender, DoWorkEventArgs e)
at System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs e)
at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)]]></StackTrace>
</Exception>
<Environment>
<ProductName>SDL Trados Studio</ProductName>
<ProductVersion>15.0.0.0</ProductVersion>
<EntryAssemblyFileVersion>15.2.6.2831</EntryAssemblyFileVersion>
<OperatingSystem>Microsoft Windows 10 Home</OperatingSystem>
<ServicePack>NULL</ServicePack>
<OperatingSystemLanguage>1036</OperatingSystemLanguage>
<CodePage>1252</CodePage>
<LoggedOnUser>DESKTOP-LEVAH\Hayley Leva</LoggedOnUser>
<DotNetFrameWork>4.0.30319.42000</DotNetFrameWork>
<ComputerName>DESKTOP-LEVAH</ComputerName>
<ConnectedToNetwork>True</ConnectedToNetwork>
<PhysicalMemory>8261644 MB</PhysicalMemory>
</Environment>
</SDLErrorDetails>
```

Hayley

• It opens fine without a TM, or with a TM that uses the default settings. It seems there is something Studio doesn't like about the segmentation rules. I'll log it with support for investigation.

• Yes, that's what I found too. It opens fine if I don't define a segmentation rule. It also opens fine with the segmentation rule if I strip the formatting from the docx file first (file attached), which is why I initially suspected that the formatting was somehow conflicting with the segmentation rule:

DOCX

Hayley