
How to add a segmentation rule to have Studio segment BEFORE every opening bracket

Hello. I want to create a new TM where Studio segments BEFORE every opening bracket. I have tried adjusting the segmentation rules, but I am having trouble because in most cases the segmentation happens after the break character, not before. I guess the segmentation rule should look something like this: any character or digit before the break; a space as the break character (which is not an available option, but it may be possible to add it as a regex in the Advanced view of the rule); and an opening bracket followed by any character or digit after the break. I am new to regex. Any ideas would be much appreciated.

  • Hi,

    This segmentation rule should do the trick:

    • Before break: .
      (just a dot, which means “anything” in regex)
    • After break: \[
      (i.e., an escaped opening bracket)
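For readers new to regex, the rule above can be approximated outside Studio. The following is a minimal Python sketch, not Studio's actual implementation: it assumes a break is placed at every position where the "Before break" pattern matches the text ending there and the "After break" pattern matches the text starting there. The function name `segment` and the sample sentence are made up for illustration.

```python
import re

def segment(text, before_break, after_break):
    """Approximate a before/after-break rule (an assumption about how
    such rules behave, not Studio's real engine)."""
    # The before-break pattern must match text that ends exactly at the break.
    before = re.compile(rf"(?:{before_break})\Z")
    # The after-break pattern must match text that starts exactly at the break.
    after = re.compile(after_break)

    segments, start = [], 0
    for pos in range(1, len(text)):
        # endpos=pos makes \Z match at the candidate break position.
        if before.search(text, 0, pos) and after.match(text, pos):
            segments.append(text[start:pos])
            start = pos
    segments.append(text[start:])
    return segments

# Hypothetical sample text with two bracketed passages.
sample = "See the manual [page 3] and the annex [page 9]."
for piece in segment(sample, r".", r"\["):
    print(repr(piece))
```

Under these assumptions each new segment starts at the opening bracket, and the space before it stays at the end of the previous segment. Passing `\(` as the after-break pattern would do the same for opening parentheses.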

  • Thank you for your suggestion, Jesús. Unfortunately, it didn't work for me. See the example text in my reply to Steven. I replaced [ with ( since I actually meant parentheses, not brackets. Thanks.

  • Hi,

    Once you have entered that segmentation rule in the TM, you need to delete the document from the project (Files view) and add it to the project again so that the document is re-segmented with the new rules. Have you done that?

  • I created a new empty TM with that segmentation rule and then created the project afterwards with just that TM.

  • Hi,

    As there are no other TMs, it should work. This is what I get from a 2-line TXT file:

    Could you please post a screenshot of your TM segmentation rule for the parentheses? Ensure that any trailing spaces are removed from both regexes.

    If everything looks fine:

    Could you please post a second screenshot of your project settings showing the TM?

    If you change/edit the segmentation rule, remember to delete the file in the Files view and then add it again.

  • Here are my screen shots and the results with the text in the image above. Thanks Jesús.


  • Could you please confirm that there isn't any space after the dot or after the \( regex?

    It looks like there is a space after \( in your screenshot.
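Why a stray space matters can be shown with a short Python check. This is only an illustration using Python's `re` module, with an invented sample sentence, and it assumes Studio treats the space as part of the pattern, as the reply above suggests: the pattern `\( ` (with a trailing space) requires a literal space right after the parenthesis, so it no longer matches text such as `(page 3)`.

```python
import re

# Hypothetical sample text with two parenthesised passages.
sample = "See the manual (page 3) and the annex (page 9)."

clean = re.compile(r"\(")    # escaped opening parenthesis only
padded = re.compile(r"\( ")  # same pattern with an accidental trailing space

# Collect the start offsets of every match for each pattern.
clean_hits = [m.start() for m in clean.finditer(sample)]
padded_hits = [m.start() for m in padded.finditer(sample)]

print("without trailing space:", len(clean_hits), "match(es)")
print("with trailing space:   ", len(padded_hits), "match(es)")
```

A rule whose after-break pattern never matches produces no breaks at all, which looks exactly like a rule that is being ignored.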

  • No spaces. Oh well. My Studio must be really stubborn or I'm doing something wrong somewhere. I don't want to waste more of your time, Jesús. Thank you.

  • It really piqued my curiosity!

    Could you please ZIP the whole project (the SDLPROJ file along with the TM and the source and target folders) and post it here (Insert > Image/Video/File)? I'll have a look at it.

  • Sure. Here is the project folder. I have included the TM inside. Thanks. Projet 4.zip

  • Thanks,

    The TM attached has the default segmentation rules:

    As you can see, there is no segmentation rule for parentheses. So, just guessing, you may have another TM with the same name but without the segmentation rule you needed.

    If I add the above-mentioned segmentation rule to your TM:

    Then I get this as expected (of course, after deleting the DOCX from the Files view and adding it again):

    Please ensure you use the TM with the right segmentation rules in your project, delete the files in the Files view, and add them again so that they are re-segmented. This should work!

  • Thank you, Jesús. My apologies, I must have added the wrong TM to the project before sending it to you. I have tried again, following your instructions step by step (at least I think so), and I still have the same problem. I have created a video to show you. I added the correct TM with the right segmentation rule to my project (for both the source and target languages, though I know the target was not necessary). At this point, I think it's better for me to give up. I know it should work, but for whatever reason it doesn't for me.

  • Hi,

    I know what's going on!

    The Preview is a very nice feature for testing your File Type settings, but it uses default segmentation rules, so your existing parentheses rule is bypassed. So please don't use the Preview to test segmentation rules.

    You need to delete the file in the Files view, add it again, prepare it, and open it in the Editor. You'll then see that it's correctly segmented.

  • Ok. So I created the project with the memory and the file. Once it was created, I went to the Files view and deleted all the files. I opened Project Settings to check and recheck that I only had the correct memory with the correct parentheses rule. Then I added the file again and it was still not correctly segmented. Did I miss something? In your message you say to add the file again, prepare it and open it in the Editor. What exactly do you mean by preparing it? When you add the file, all the batch tasks of the project template (Analysis, Pretranslation and so on) are run automatically, so I don't see what else I should be doing.
    Just to clarify, you say that the Preview is not to be used to test segmentation rules. If I have understood correctly, no matter which memory I use for my project, with whatever customized segmentation rules, these will never apply when I create a project. I would need to go to Files, delete all the files and add them again, and only then will my customized segmentation rules apply. Is that right? It does not make much sense to me. Thanks.

  • Hi,

    Open the project in Projects view.

    Ensure you've selected the Source language in the Files view.

    Add the DOCX file (if it's already there, delete it first).

    Ensure that the TM in the project has the right segmentation rules.

    Then right-click on the DOCX file, select Batch Tasks, select the Prepare item, and finish the wizard:

    Then go to the Target language in the Files view and open the file (SDLXLIFF extension) to check the segmentation in the Editor.

    Regarding your question about the TM and Preview: segmentation rules do apply to the project, of course, but you won't be able to test them with the Preview button under File Types. And yes, you need to open the file in the Editor to confirm that the segmentation rules are fine. Note that the Preview button is in the File Types section, so it makes sense that it only reflects file type settings. On the other hand, I wish a feature for testing segmentation rules were available while adding a segmentation rule.
