How do I set up the Document structure for an .md file?

I actually posted on another thread, but I figured I might get more help if I started my own post. I'm trying to set up a new file type for .md files. I'm a veteran Trados user but new to file types and regex, so forgive my extremely basic question: how do I configure the Document structure so that Trados knows what is translatable text? I've read Paul's post about the inline tags and I think I might be able to figure those out (I'm sure I'll be back if I can't), but I can't even get Trados to display any text at all when I try to process the file.

  • Hi Beatriz,

    We need to see a sample of the file to answer that one.
  • Hi Beatriz,

    Just to follow up on my short response from my phone. The reason we need to see a sample is because of the following:

    1. Whether something is translatable or not is not related to the document structure (unless this is embedded content?)
    2. How you handle this could depend on whether it's a text file format, xml file, html file etc.

    So if you can provide a small sample, it would be very helpful. You can also email me the file if you like.

    Regards

    Paul
    pfilkin@sdl.com
  • Hi Beatriz,

    Doh... completely forgot until I read your other post that *.md was a markdown filetype! Sorry!

    However, it would still be useful to have a sample. The idea here is that the regex filetype treats everything as translatable, and you then specify which parts you don't want translated by using regular expressions.

    Make sense?

    Regards

    Paul
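To make the "everything is translatable unless a rule says otherwise" idea concrete, here is a minimal Python sketch. It is not how Studio is implemented; it only mimics the behaviour described above, assuming each segment is a plain string and each placeholder rule is a regular expression. The pipe rule `\|` is the one that comes up later in this thread, and the function name `tag_segment` is hypothetical.

```python
import re


def tag_segment(segment: str, placeholder_rules: list[str]) -> list[tuple[str, str]]:
    """Split one segment into ("text", ...) and ("placeholder", ...) pieces.

    Everything is translatable text by default; only substrings matched by
    one of the placeholder rules become placeholders.
    """
    if not placeholder_rules:
        return [("text", segment)] if segment else []
    # One alternation of all rules, so matches are collected left to right.
    combined = re.compile("|".join(f"(?:{rule})" for rule in placeholder_rules))
    pieces: list[tuple[str, str]] = []
    pos = 0
    for match in combined.finditer(segment):
        if match.start() > pos:
            pieces.append(("text", segment[pos:match.start()]))
        pieces.append(("placeholder", match.group()))
        pos = match.end()
    if pos < len(segment):
        pieces.append(("text", segment[pos:]))
    return pieces


# Hypothetical table row, like the "Column | Column" example later in the thread.
print(tag_segment("Column | Column", [r"\|"]))
```

The pieces tagged "text" are what a translator would see as editable; the placeholders are carried through unchanged.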
  • Paul,

    Thanks so much! These *.md files are for a large website, so we have hundreds (I think it's over 400). I will email you 2 or 3 different ones so you can see the structure. What you say about the regex filetype making everything translatable makes perfect sense, but when I try to process it, it marks the file as "reference" only. It won't let me switch it to translatable. As a test, I put in one or two regex rules under "Document structure" and it finally changed the file to "translatable" and processed it... except I got 0 words in the analysis and no segments appeared. So I am at a loss as to what I'm doing wrong. Any help from you would be fantastic. I'd love to understand how this works.

    Beatriz
  • So, I actually did some more testing on this file and tried processing it again. It turns out that nothing was showing in the Editor view because I had processed the file with only *.md as the "File dialog wildcard expression". I tried again with both *.txt and *.md in that field and it processed it. Is this the way you're supposed to do it? Is it just a matter now of adding the non-translatable inline tag rules?
  • Hi again, Paul. I have two questions regarding regex rules. One, I have a document (which I sent to you) that has pipes (|) and dashes (-) to mark a table. I'd like to make the pipe and the dashes placeholders so that they don't appear as translatable. I created the rule \| and that successfully marked the pipes as tags. Is there any way to make it so that the pipes don't show up in the document at all and the segments break there instead? For example, if I have Column | Column, is there a way to make it so that each "Column" is its own segment and there's no pipe to be seen?

    My second question is as follows: my document uses ``` to open and close code blocks. I need to make everything between the opening and closing ```, including the ``` markers themselves, non-translatable. I don't think my knowledge of regex is nearly advanced enough to write that one from scratch. Any ideas?
  • Actually, you can disregard the first question. I tweaked the Advanced settings a little and figured it out. The second question still has me stumped, though.
  • How about this as a placeholder?

    ```.*?```

    I'm doing this blind, though, so there could be more complexity to this that I can't see without seeing the file.

    Regards

    Paul
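One likely reason ```.*?``` fails on real code blocks: in most regex engines, Python's included, `.` does not match a newline unless a "dot matches all" option is switched on, so the lazy `.*?` cannot run from an opening ``` on one line to a closing ``` on another. The sketch below shows this with Python's `re` module on a made-up code block; whether Studio exposes an equivalent option is not something this thread establishes.

```python
import re

FENCE_RULE = r"```.*?```"  # the lazy rule suggested above

# Hypothetical sample: a fenced block spread over several lines.
sample = "Intro text\n```\nSTART-OF-REPORT\nEND-OF-REPORT\n```\nMore text\n"

# Default flags: "." never matches "\n", so the lazy ".*?" cannot reach the
# closing fence on a later line and there is no match.
print(re.search(FENCE_RULE, sample))

# A fence that opens and closes on the same line is matched without any flag.
print(re.search(FENCE_RULE, "Run ```ls -l``` first").group())

# With re.DOTALL, "." also matches "\n", so the whole block is found.
print(repr(re.search(FENCE_RULE, sample, flags=re.DOTALL).group()))
```

Even with such an option, the rule can only match if the whole block reaches the regex as one piece of text, which depends on how the file is segmented.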
  • Unfortunately, that one didn't work either. You actually have the document. It was one of the 4 I sent you earlier today. The name is LQA.md. Let me know if you don't have it and I'll resend.
  • ok - see it now, sorry.

    This does work, but not in Studio. The reason is that one of the problems you have with this file is that every line has a hard return at the end, so each line is a separate segment and no single segment contains both an opening and a closing ```. This means you need a few rules, like these for example, all as placeholders:

    ```
    START-OF-\w+
    \w+=(\w+|<\w+>\.\w+)
    [A-Z]{2}\d{3}
    END-OF-\w+
    \w+\|\w+\|\w+\|\w+\|\d+\.\d+\|

    And maybe whatever else is needed in your other files.

    Regards

    Paul
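Because each line is its own segment, a rule only helps if it matches the content of that line. A rough Python sketch of the "library of placeholder rules" approach: it reports which lines of a file are not fully matched by any rule, so you can see which lines still need one. The starter list mixes the bare ``` line with two of the rules above; the sample lines and the helper name `uncovered_lines` are hypothetical, and requiring a full-line match is a simplification for illustration.

```python
import re

# Hypothetical starter library: the bare fence plus two of the rules above.
RULES = [r"```", r"START-OF-\w+", r"END-OF-\w+"]


def uncovered_lines(text: str, rules: list[str]) -> list[str]:
    """Return non-empty lines that no rule matches from start to end.

    Each line is treated as one segment, mirroring how this file is segmented.
    """
    compiled = [re.compile(rule) for rule in rules]
    missing: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not any(rx.fullmatch(stripped) for rx in compiled):
            missing.append(line)
    return missing


sample = "```\nSTART-OF-REPORT\nstatus=OK\nEND-OF-REPORT\n```\n"
print(uncovered_lines(sample, RULES))
# Adding the key=value rule from the list above covers the remaining line.
print(uncovered_lines(sample, RULES + [r"\w+=(\w+|<\w+>\.\w+)"]))
```

Running a check like this over each new batch of files is one way to grow the rule library without opening every file in Studio.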

  • Thanks so much! I used your suggestion and it worked. However, I realize that we have so many different files with different strings of text between the ``` markers that it would be a monumental task to tag it all individually. I checked with a colleague of mine who knows more about regex than I do, and she suggested the following expression, which takes line breaks into account:

    ```\n*(.*\n)*```

    It did the trick. I'll have to test it out with other files, though. I'm sharing it here in case anyone else has the same question in the future. Thanks for all the help so far!
  • Hi Beatriz,

    I have no idea how that would work. Studio segments the file at the paragraph marks, so each line of code is in a separate segment, and the regex rule then only applies within each segment.

    I think what you might be able to do is create document structure rules instead of inline tag rules, which could prevent any of the code between these characters from being parsed at all.

    Regards

    Paul
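Paul's point can be checked outside Studio. The Python sketch below (an illustration, not Studio's actual matching code) runs the colleague's expression ```\n*(.*\n)*``` once over a whole made-up file, where it does find the fenced block, and then over each line separately, the way Studio sees the segments here, where it matches nothing.

```python
import re

COLLEAGUE_RULE = r"```\n*(.*\n)*```"

# Hypothetical file with one fenced block.
sample = "Intro text\n```\nSTART-OF-REPORT\nEND-OF-REPORT\n```\nMore text\n"

# Whole file at once: "\n" appears explicitly in the pattern, so it can span lines.
whole = re.search(COLLEAGUE_RULE, sample)
print(repr(whole.group()))

# Line by line (one segment per line): each line holds at most one ``` and no
# "\n", so the pattern can never reach a second fence and nothing matches.
per_line = [line for line in sample.splitlines() if re.search(COLLEAGUE_RULE, line)]
print(per_line)
```

This is consistent with the rule appearing to work in one file and not in another: the match seen earlier most likely came from a different rule.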
  • Hi Paul,

    Yeah, I think you're right about that. I removed the rule and nothing changed, so I probably marked the code as non-translatable with a different rule. I also noticed it didn't work with another file that had the same structure.

    How would I go about creating structure rules? According to the Trados instructions, structure rules mark the text that is translatable. I'm guessing I would have to create a rule marking text as translatable from the start (^) to ``` and then from ``` to the end ($)? I'm sure it's not that easy! :) There's probably a more complicated regex for that.

    Beatriz
  • Hi Beatriz,

    You've got it. That's exactly how you'd have to tackle it. I was actually just sitting here wishing we had a mechanism for marking up the structure you did not want translated, which feels easier!!

    It's going to take a couple of rules rather than just one, I think, so I'll have a play, and if I can't do it I'll ask the filetype developer tomorrow whether it's even possible. If it isn't, the first approach I gave you, with a library of rules you continually build on, is going to be the best approach.

    Regards

    Paul
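The structure-rule approach discussed above (text is translatable from the start up to ```, and again from the closing ``` to the end) amounts to excluding everything inside a fence. Studio's structure rules are configured in the file type settings rather than in code, so this Python sketch only models the intended outcome: it walks the file line by line, toggles an "inside code" flag on every line that starts with ```, and keeps the remaining lines as translatable. The function name and the line-based fence detection are assumptions for illustration; for instance, it ignores indented fences and ~~~ fences.

```python
def translatable_lines(text: str) -> list[str]:
    """Return the lines that fall outside ``` code fences.

    Fence lines and everything between an opening and a closing fence are
    treated as non-translatable; all other non-empty lines are kept.
    """
    keep: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.startswith("```"):
            in_code = not in_code  # an opening fence starts code, the next one ends it
            continue
        if not in_code and line.strip():
            keep.append(line)
    return keep


sample = "Intro text\n```\nSTART-OF-REPORT\nstatus=OK\nEND-OF-REPORT\n```\nMore text\n"
print(translatable_lines(sample))
```

Unlike a per-segment placeholder rule, this needs to remember state across lines, which is why it maps to structure rules rather than inline tags.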
  • Thank you so much! If it's possible, it would be the easiest solution here. I think that's really the biggest thing that needs to be hidden from these files. Everything else is just a matter of tags and placeholders that follow pretty straightforward regex rules.

    Btw, do you think SDL is eventually going to incorporate this file type into its default selections? I spoke to a programmer involved in this project yesterday and he indicated that *.md files are becoming more popular in web development and that we are likely to see more of these in the future.