When working with InDesign files, I often get odd segmentation if there are three or four dots like this... or like that....
What could I change in the segmentation settings to prevent this? In normal language I'd say "If a dot/period is followed by a dot/period, don't start a new segment." Of course (as always), there are a few caveats, like in the screenshot: the segment break in segment 23 has to be after the ".
Who could give me an idea how to achieve this? (And: should that not be a standard setting?)
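To make the rule from the question concrete, here is a rough sketch in Python. It is not Studio's own segmentation rule syntax, only an illustration of the logic: break after sentence-final punctuation (plus an optional closing quote) and whitespace, unless the next character is another period. The function name `segment` and the pattern are assumptions made for this example.

```python
import re

# A break candidate: sentence-final punctuation, an optional closing quote,
# then a run of normal spaces or no-break spaces (U+00A0).
CANDIDATE = re.compile(r'[.!?]["\u201d]?(?P<gap>[ \u00a0]+)')

def segment(text):
    """Split text into segments, never breaking between two periods."""
    segments, start = [], 0
    for m in CANDIDATE.finditer(text):
        next_char = text[m.end():m.end() + 1]
        if next_char in (".", ""):
            continue  # another period follows (or the text ends): no break
        segments.append(text[start:m.start("gap")])
        start = m.end()
    segments.append(text[start:])
    return segments

sample = "Well .\u00a0.\u00a0. . Next sentence. Last one."
for s in segment(sample):
    print(repr(s))
```

Because the optional quote is part of the candidate, a break after `."` lands after the quotation mark, which is the caveat mentioned for segment 23.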
I don't think it's caused by the segmentation rules (if it were, you would get the same weird segmentation in e.g. Word files too, not only in InDesign), but by the structure of the InDesign source. Try to open the IDML source in a text editor and see what exactly in the XML...
Okay, it's not .... but periods with NBSPs and periods with normal spaces in between. In this case, the first three periods are separated by NBSPs, the last period is preceded by a normal space.
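Since NBSPs and normal spaces look identical in most editors, a small helper can show what actually sits between the periods. This is a minimal sketch, assuming the suspicious text has been copied out of the source as a plain string; the helper name `describe` is made up for this example.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

# Period, NBSP, period, normal space, period
for line in describe(".\u00a0. ."):
    print(line)
```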
In Studio (I merged the segments manually):
Daniel Hug said: periods with NBSPs and periods with normal spaces in between. In this case, the first three periods are separated by NBSPs, the last period is preceded by a normal space
...which is, BTW, typographically wrong ;-) There should be an ellipsis, not three periods (nota bene separated by non-breaking spaces!).
So... all in all it's just about the old "garbage in, garbage out" rule - if there is a mess in the input, you can't really expect a proper and good result.
I'll have to do a few tests on this... see at which stage this can still be corrected, e.g. CleanUp Task or pre-process the icml files...
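For the pre-processing idea, a minimal sketch of the text replacement itself might look like this. It assumes the story text is already available as a plain string (reading and writing the actual InDesign XML is not shown), and it collapses any run of three or more spaced periods into a single ellipsis character, which is a simplification: a fourth, sentence-final period is swallowed too. The name `normalize_ellipsis` is hypothetical.

```python
import re

# Three or more periods, optionally separated by normal or no-break spaces.
SPACED_DOTS = re.compile(r'\.(?:[ \u00a0]*\.){2,}')

def normalize_ellipsis(text):
    """Replace runs of spaced periods with a single ellipsis (U+2026)."""
    return SPACED_DOTS.sub("\u2026", text)

print(repr(normalize_ellipsis("Wait\u00a0.\u00a0.\u00a0. . Go")))
```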
I agree they should use the ellipsis, but I have little or no influence on this. CleanUp Task only comes in after segmentation is done, so that alone can't fix this.
I would only pre-process these files if there was no alternative whatsoever.
So here is my solution:
Thank you again for your input.