I would like to ask the below questions.
Hi samar magdy
A TMX file doesn't hold segmentation rules.
If you want to use paragraph segmentation for all future files, then creating a new TM with this option will work for them. But if you import your TMX into this new paragraph-based TM, the segments will still be sentence-based, as these are already defined in the TMX. I'm not aware of any tools that can go from sentence to paragraph... only a few that go the other way around. Part of the problem, I guess, is that a TM is not a true reflection of the original documents, so making sure the paragraphs were really correct would be tricky if not impossible. Technically, the TMX stores every segment, whether sentence-based or paragraph-based, as an individual TU, so there is nothing in the TMX to tell any tool whether the TUs were part of a larger entity or not.
What may be useful is that, if you do this, the fragment matching feature can pick out the TUs. So whilst you won't get proper pretranslation leverage, at least you would still be able to leverage the work interactively.
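To illustrate the point about TUs, here is a minimal sketch in Python that reads a small, made-up TMX sample with the standard library. The sample content, the `SAMPLE_TMX` name and the `list_units` helper are assumptions for illustration only. The header does carry a `segtype` label, but that is only a description of how the segments were produced, not a set of rules, and the TUs themselves are a flat list with nothing linking neighbouring sentences into a paragraph.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-sentence TMX, invented for this example.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>This is the first sentence.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Dies ist der erste Satz.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>This is the second sentence.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Dies ist der zweite Satz.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def list_units(tmx_text):
    """Return (segtype, units), where units is one {language: text} dict per TU."""
    root = ET.fromstring(tmx_text)
    segtype = root.find("header").get("segtype")
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text that sits inside inline tags.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return segtype, units


segtype, units = list_units(SAMPLE_TMX)
print(segtype)
for unit in units:
    print(unit)
```

Each TU comes back as an independent pair of segments; there is no field in the sample that says the two sentences originally belonged to the same paragraph.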
Once you have converted your bilingual file, that's it. You can't change the segmentation at this point; you need the source file for that. Perhaps a potential solution would be to align the source and target files with a TM set up for paragraph-based segmentation instead of trying to change the bilingual files... although I wouldn't hold my breath!
If there is a solution for this out there I'd also be interested to learn.
Thank you for your reply.
I would like to ask you a question. When I create an SDL project with a new paragraph-based TM, the newly created file is not segmented by paragraph, as shown in the screenshots.
I expected each highlighted paragraph to be presented as a single segment in Studio, but this didn't happen, and the text is still segmented at each full stop.
Sorry for interrupting you, but I have another question about segmentation. I tried to create a WinAlign project by creating a TM with paragraph segmentation and adding the source and target files, but the bilingual file is segmented by sentence, not by paragraph, as shown in the screenshots.
This is a bit of a tricky problem because only the source language will be segmented correctly. There is no way to segment the target based on a target TM as you can only set a TM to act on the source. So you have to do this sort of thing when you align:
The source gets segmented correctly, by paragraph, but the target will be by sentence as the default rules are used for target whether you specify a TM or not.
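The mismatch described above can be sketched with a toy example in Python. This is not how Studio segments text internally; the two functions, the splitting rules and the sample strings are simplified assumptions, used only to show why a paragraph-segmented source and a sentence-segmented target end up with different segment counts.

```python
import re


def segment_by_paragraph(text):
    """Toy rule (assumption): every non-empty line is one segment."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def segment_by_sentence(text):
    """Toy rule (assumption): split each paragraph after ., ! or ? followed by whitespace."""
    segments = []
    for paragraph in segment_by_paragraph(text):
        segments.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return segments


# Made-up sample texts: two paragraphs, three sentences.
source = "First sentence. Second sentence.\nA new paragraph."
target = "Erster Satz. Zweiter Satz.\nEin neuer Absatz."

source_segments = segment_by_paragraph(source)  # paragraph rules on the source
target_segments = segment_by_sentence(target)   # default sentence rules on the target

print(len(source_segments), source_segments)
print(len(target_segments), target_segments)
```

With these samples the source yields two segments and the target three, so the alignment has to pair segments of different granularity.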
Paul said: This is a bit of a tricky problem because only the source language will be segmented correctly. There is no way to segment the target based on a target TM as you can only set a TM to act on the source.
You can set segmentation rules for BOTH the source and target language in a TM.
And if I remember correctly, the last time I experimented with paragraph-based segmentation in alignment (autumn 2017), it worked pretty well.
It could be that the source format played a big role in my case... it was MadCap Flare XML/HTML, so the segmentation is pretty much defined by the file type, rather than the TM-defined rules.
Of course I cannot be expected to know how Studio works internally... all I know is that I:
- created a new empty TM where I changed the segmentation to paragraph-based for both the source and the target language
- used this TM for running the alignment
That's all. I don't (and can't) know exactly which "magic" (or coincidence) made it align just as one would expect ;-). Perhaps the internal "reversed" TM is created by reversing the actual TM (similar to what AnyTM does)? That would make sense...
I didn't explore it any deeper as we ended up not going further with paragraph-based segmentation and went the harder way of sentence-based segmentation.
Hi Evzen Polenka
I would never have thought of doing that, as I usually select an existing TM and it's too late at that point. But you are absolutely right... and I'm really happy to see this.
Thank you for sharing this information.... something we should definitely document somewhere as I'm sure it will be useful to many users. Or maybe I was the only one who didn't know this!!
Paul said: Try it with any other kind of segmentation rules and the effect is mindblowing... at least I'm really struggling to see any logic in how this works.
Hmmmm... just out of curiosity, you are testing it with the latest version, I suppose... the one with god-knows-how broken segmentation rules... Would you mind doing the same tests with the "last (kind-of) sensibly-behaving version", i.e. 2017 CU5 (last pre-SR1) and 2015 SR3 (vanilla, w/o CUs)? I feel that it may behave differently...