Segmentation rule for bullets not working

Hi 

I am trying to create a segmentation rule that segments at the individual items of a bulleted list in an Excel cell. Unfortunately, I can't get it to work.

So, I have tried to create a small, simple test case with a Word document. My new segmentation rule does not work with this simple document either, so I must be doing something completely wrong.

I hope that some of the more skilled experts will be able to point me to what I am doing wrong.

I attach the sample doc here:

Test document for segmentation rules.docx

And here is the definition of my additional segmentation rule:

And here is what I get in the Editor:

FYI: the bullet I have inserted in the document is the one with Unicode 2022, I checked this. Even if I enter the bullet itself in the rule, it does not work.

Walter

  • Hello Walter,
    a backslash before u2022 seems to be missing...
    Kind regards
    Sébastien
  • Hi again,

    this time after running some tests first...

    Actually, you only need to enter your bullet in the "Before break" regex and a full stop (which matches any character) in the "After break" regex. I don't know the \p{... symbols well; are you sure you need them?

    Both "bullet" and "\u2022" work.

    Kind regards

    Sébastien
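
The point about the missing backslash can be checked outside Studio. The following is a minimal sketch that uses Python's `re` module as a stand-in for Studio's regex engine (an assumption; the two engines may differ in details) and a made-up sample string. It compares the escaped form, the literal bullet, and the form without the backslash.

```python
import re

# Hypothetical sample: a line break followed by a bulleted item (U+2022).
sample = "First line\n\u2022 list item"

escaped = re.compile(r"\u2022")    # regex escape for the bullet character
literal = re.compile("\u2022")     # the bullet character entered directly
unescaped = re.compile("u2022")    # backslash missing: plain text "u2022"

found_escaped = bool(escaped.search(sample))
found_literal = bool(literal.search(sample))
found_unescaped = bool(unescaped.search(sample))

print(found_escaped, found_literal, found_unescaped)
```

Without the backslash, the pattern looks for the five characters "u2022" rather than for the bullet, which would explain a rule that never fires.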

  • I agree with Sébastien, this is what my bullet rule looks like.
  • Hi Sébastien, hi Nora

    Many thanks for the hints. I got a bit confused with the "Basic View" and "Advanced View" of the segmentation rule.

    And to answer Sébastien's question about the \p{... part of the regex: this is automatically added by Studio if one enters the characters in the "Basic View".

    I tried your suggestion and Studio now does segment at the bullet, but the bullet shows up as a character at the end of the previous segment, see below:

    How can I avoid this?

    Walter

  • Hi Walter
    I would add a segmentation rule for the soft return: enter a soft return in the "Before break" regex and a full stop in the "After break" regex. Then you get each bullet alone in its own segment.
    Let Studio show you spaces, soft returns and so on by clicking the "paragraph" icon; that makes it easier to see what you need.
    How would others do this?

    Kind regards
    Sébastien
  • Hi Sébastien

    Thanks for the hint. Adding a seg rule for soft return works. And with a display filter set like this:

    I can hide the bullet-only segments in the Editor (in case you wonder about the "\*": I extended the bullet rule to also include the asterisk, because this use case also has list items introduced by an asterisk).

    So far, this works fine, not only on my simple Word sample but also on the Excel document. However, I have another problem with the Excel file, which is probably not related to the segmentation at all. When I preview the Excel file or save it as target, the target only contains text in the first 4 segments; the rest of the document is empty. I'll now need to find out what causes this.

    Thanks a lot for the help on the segmentation rules.

    Walter
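
Walter's actual filter setting is not reproduced in this thread, so the following is only a sketch of one way such a filter could work, again with Python's `re` module standing in for Studio. The pattern `HAS_CONTENT` and the sample segments are hypothetical: the pattern matches any segment that contains at least one character other than a bullet, an asterisk or whitespace, so a filter that shows matching segments would leave the bullet-only ones out.

```python
import re

# Hypothetical pattern: at least one character that is not a bullet (U+2022),
# an asterisk or whitespace. Inside a character class "*" is a literal.
HAS_CONTENT = re.compile(r"[^\u2022*\s]")

# Made-up segments as they might look after the bullet and soft return rules.
segments = ["Intro text\n", "\u2022", " item one\n", "*", " item two"]

# Keep only the segments the pattern matches; bullet-only ones drop out.
visible = [s for s in segments if HAS_CONTENT.search(s)]

print(visible)
```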

  • Hi Sébastien and Walter,

    A soft return rule as suggested by Sébastien is a good solution, with the added benefit that it will segment at other instances where there are soft returns.

    Another option to specifically take care of the bullets is a second bullet rule where you enter a full stop (indicating "any character") in the "Before break" regex box in Advanced View and the bullet character again in the "After break" regex box. This would create a new segment before every bullet and the result will be what Sébastien has described: each bullet in its own segment.
  • Hi Nora

    Thanks for the suggestion Nora.

    However, when I try this, I get the following:

    The bullet is now inside the segment, but at the end of it, which is not really what we want.

    Walter

  • Hi Walter,

    How strange, it looks like the rule to segment before a bullet is not being applied. Did you include both rules when processing the sample above? There needs to be one rule to segment before a bullet and another one to segment after a bullet.

    This is what I get with a small sample:

    This TM contains the rules I used: Bullet segmentation.zip

  • Hi Nora

    Sorry, I misunderstood what you meant by "each bullet in its own segment". I thought you meant that the bullet and the following text would be together in one segment.

    The result you show in your screenshot had already been achieved with only one bullet rule plus a soft return rule, as suggested by Sébastien. This is OK for me now as I can then use a regex display filter to hide the "bullet only" segments.

    BTW: the problem with the mainly empty Excel target file was simply due to the fact that the font color was white (on white) and one could not see the text. A CTRL-A and applying "automatic color" solved the problem. Why this text appears in white remains a mystery to me as it is black in the source document.

    Thanks again for your help and have a nice weekend.
    Walter
  • Walter Blaser said:

    Sorry, I misunderstood what you meant by "each bullet in its own segment". I thought you meant that the bullet and the following text would be together in one segment.

    Hi Walter,

    First, thanks for reporting back on the "invisible" text; I was meaning to ask if you had found a solution. It always helps to hear about these things in case it happens to us!

    Regarding the segmentation: you're right, I was only offering an alternative to what Sébastien suggested, but the result would be the same as what you had already achieved.

    By the way, for the benefit of others who may have a similar use case: to get the bullet + text together in each segment, either Sébastien's soft return rule or a "segment before bullet" rule would work on its own; you wouldn't need the "segment after bullet" rule. In other words, to include the bullet with the text, you would only need this rule:

    Have a relaxing weekend!
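
For readers who want to compare the rule combinations discussed in this thread, here is a small simulation. It is a sketch under stated assumptions, not Studio's actual segmentation engine: the helper `segment` is hypothetical, a break is placed wherever the "Before break" pattern matches the text ending at a position and the "After break" pattern matches the text starting there, the soft return is represented as `\n`, and the full stop is treated as matching any character including the line break.

```python
import re

BULLET = "\u2022"

def segment(text, rules):
    """Split text wherever any (before, after) regex pair matches.

    A break goes at index i when `before` matches text ending at i
    and `after` matches text starting at i.
    """
    compiled = [
        (re.compile(f"(?:{before})\\Z", re.DOTALL), re.compile(after, re.DOTALL))
        for before, after in rules
    ]
    breaks = [
        i for i in range(1, len(text))
        if any(b.search(text[:i]) and a.match(text, i) for b, a in compiled)
    ]
    bounds = [0, *breaks, len(text)]
    return [text[s:e] for s, e in zip(bounds, bounds[1:])]

# The three rules from the thread, as (before break, after break) pairs.
after_bullet = (BULLET, ".")    # break after every bullet
soft_return = ("\n", ".")       # break after every soft return
before_bullet = (".", BULLET)   # break before every bullet

# Made-up cell content with two bulleted items.
text = f"Intro\n{BULLET} one\n{BULLET} two"

only_after = segment(text, [after_bullet])
after_plus_soft = segment(text, [after_bullet, soft_return])
after_plus_before = segment(text, [after_bullet, before_bullet])
only_before = segment(text, [before_bullet])

for result in (only_after, after_plus_soft, after_plus_before, only_before):
    print(result)
```

In this sketch, the "after bullet" rule alone leaves each bullet at the end of the previous segment, adding either the soft return rule or the "before bullet" rule isolates each bullet, and the "before bullet" rule alone keeps bullet and text together.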