How to exclude placeholders enclosed in single or double curly brackets in HTML files?

I'm trying to define text enclosed in single or double curly brackets in an HTML file as placeholders. For example:

<p>Do not translate this {{ variable }} and don't translate that { variable } either.</p>


The following expression works fine in Notepad++, but it doesn't work in SDL Studio:

\{+[^}]+\}+
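
As a sanity check outside Studio, the pattern can be run against the sample line above. This is only a sketch: using Python's `re` module is an assumption made for illustration, and its engine is not necessarily the one Studio uses, although the pattern relies only on basic syntax.

```python
import re

# The pattern from the question: one or more "{", then one or more
# characters that are not "}", then one or more "}".
PLACEHOLDER = re.compile(r"\{+[^}]+\}+")

sample = "<p>Do not translate this {{ variable }} and don't translate that { variable } either.</p>"

matches = PLACEHOLDER.findall(sample)
print(matches)
```

If both placeholders are found here, the pattern itself is sound, and the problem is more likely to lie in how or where the rule is applied.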


1. Why doesn't the expression work?

2. What regex flavor(s) does SDL Studio support?

  • Perhaps you should start by explaining where you are doing this. How are you adding this expression to support this in an HTML file?

  • I created a new HTML based file type. I then selected Embedded Content.
    In the Embedded Content dialog box, I enabled "Other elements identified by" and selected the "Structure information" option.
    I then clicked Configure > Add, selected Placeholder, and entered the regex.

  • What's missing is how you set up the embedded content rule, since this tells Studio the context in which to apply your rule. For example, I created this:

    The variables are located in three different elements (h1, p and time).  So for this to work I need to specify the following in the html filetype:

    This is of course assuming the variables are in elements like this. If they are within server-side scripts then you need to use the appropriate "Server side scripts" option.

    Perhaps this helps?  For my simple test file it works just fine:

  • Thanks for your quick and very detailed reply! Unfortunately, I was wrong about the {} variables.
    They only occur outside of html tags. For example:

    {% if product.available %}
    <h2>Price: $99.99</h2>
    {% else %}
    <h2 class="sold-out">Sorry, this product is sold out.</h2>
    {% endif %}

    How can I mark them as placeholders in this case?

  • Please can you provide more info here.  It would be really helpful if you provided sample html that was complete and represented the overall structure of the file.  Then we don't have to guess ;-)

  • The file is apparently a Shopify Liquid template file. Here's an example taken from the official website:

    https://shopify.github.io/liquid-code-examples/example/call-to-action

    {%- if cart.item_count > 0 -%}
    
    <form action="/cart" method="post">
    
      {%- for item in cart.items -%}
        <a href="{{ item.url | within: collections.all }}">
          <img src="{{ item | img_url: '200x200' }}" alt="{{ item.image.alt | escape }}">
          {{ item.product.title }}
        </a>
    
        {%- unless item.variant.title contains 'Default' -%}
          <p>{{ item.variant.title }}</p>
        {%- endunless -%}
    
        {%- assign property_size = item.properties | size -%}
        {%- if property_size > 0 -%}
          <ul>
    
            {%- for p in item.properties -%}
              {%- assign first_character_in_key = p.first | truncate: 1, '' -%}
              {%- unless p.last == blank or first_character_in_key == '_' -%}
                <li>
                  {{ p.first }}:
    
                  {%- if p.last contains '/uploads/' -%}
                    <a href="{{ p.last }}">{{ p.last | split: '/' | last }}</a>
                  {%- else -%}
                    {{ p.last }}
                  {%- endif -%}
    
                </li>
              {%- endunless -%}
            {%- endfor -%}
    
          </ul>
        {%- endif -%}
    
        <p>
          <a aria-label="Remove {{ item.variant.title }}" href="/cart/change?line={{ forloop.index }}&amp;quantity=0">Remove</a>
        </p>
      {%- endfor -%}
    
      <input type="submit" name="checkout" value="Checkout">
    </form>
    
    {%- else -%}
      <p>The cart is empty. <a href="/collections/all">Continue shopping</a></p>
    {%- endif -%}


  • The same approach is required, but you need to know what the parent element would be for the script. You don't show this in your example. But this could be problematic. For example, assume I have this:

    regex - shopify.html
    <!DOCTYPE html>
    <html>
    <body>
    
    <h1>Testing {variables} in html</h1>
    
    <div>
      <h2>{{ section.settings.text-box }}</h2>
    
      <a href="{{ section.settings.link }}">
        {{ section.settings.linktext }}
      </a>
    </div>
    
    {% schema %}
    {
      "name": "Call to action",
      "settings": [
        {
          "id": "text-box",
          "type": "text",
          "label": "Heading",
          "default": "Title"
        },
        {
          "id": "link",
          "type": "url",
          "label": "Link URL"
        },
        {
          "id": "linktext",
          "type": "text",
          "label": "Link text",
          "default": "Click here"
        }
      ]
      ,
      "presets": [
        {
          "name": "Call to Action",
          "category": "Promotional"
        }
      ]
    }
    {% endschema %}
    
    <p>Do not translate this {{ variable }} and don't translate that { variable } either.</p>
    
    <time>Testing more  {{variables}} inside {variable} html.</time>
    
    </body>
    </html>
    
    
    
    

    The Shopify JSON (I know you don't have this...) is under the body element in my made-up example. So I can create an embedded parser to handle it and do this:

    This will get me this:

    However, since this is at the body level nothing else is parsed... and this is clearly not helpful.  I can't see how to get at only this script in this location without the ability to be more specific with the parser rule condition path.  Maybe it's just my lack of knowledge so I will investigate this more... but so far it's a problem.

    If your actual file has the variables at the body level of the html then you'll have the same problem.  But what you could do is this:

    1. Translate the parts you need in the script.

    2. Save the target file.

    3. Remove the rule and translate the target file in the normal way with just the html filetype.

    So two passes, but it would work.

  • Thanks for your reply, but the method that you suggested is rather cumbersome.
    Is there really no way to tell SDL Studio to treat all text enclosed in {} or {{}} in HTML files as placeholders no matter where the text occurs in the HTML file?

  • As I just explained, there is. But as you have still not provided me with a file that shows clearly where that script sits, I actually don't know whether it's going to work or not. It cannot possibly be outside of all elements... so which element is it in?

    I created an example, as I showed you above, where I placed it in the body element (because you have not told me where it should go), and in there it seems to have the effect of overruling all other rules. But if you come back and tell me that actually it's not in there, then maybe there is some way of tackling this.

    But to answer this:

    Is there really no way to tell SDL Studio to treat all text enclosed in {} or {{}} in HTML files as placeholders no matter where the text occurs in the HTML file?

    Of course there is.  You can create your own regex based filetype that just treats the entire file as text.  Now you can do whatever you like.

    But if you want to use html as the basis of this then you need to follow some rules and we can only validate those rules if you actually give us a real file that is complete instead of these partial snippets that force us to keep guessing.

    but the method that you suggested is rather cumbersome.

    Perhaps... but maybe still preferable to the alternatives.

    You could also hire a developer and create your own filetype specifically for handling Shopify files.

  • Thanks for looking into this!

    It looks like I'll have to create a regex based file type, which is a bummer, because the file is more than 70% HTML.

    FYI, the code in curly braces occurred before and after the starting <body> tag as well as after the closing </body> tag. I.e., it'd be impossible to define tag based rules.

    That's why I was asking whether there's a possibility to globally mark these tags as placeholders.

  • ok

    It is difficult because the html parser is based on looking at files that are ordered with text always coming inside the elements.

    The embedded content parser looks at what's inside the elements and treats them as text which is what allows you to handle your variables.

    If you now inject script outside of these elements then you can only define an embedded processor to handle the content between the highest parent element... in your case the <html> element by the sounds of it. So in effect you are now saying treat the entire file as text and you'll define the rules for everything.

    This will be quite tricky but if you're happy to handle everything as placeables it's easy enough:

    Gets you this:

    It's not too bad because structural elements will be moved out and you won't make a mistake. Inline tags, which I don't have in this example, will be trickier since they won't be handled as opening and closing pairs, and the translator will have to be very diligent to make sure they are placed in the right order.
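
    The idea of treating everything as placeables can be sketched outside Studio roughly as follows. This is an illustration only, written in Python, not Studio's actual parser or rule syntax; the `PLACEABLE` pattern and the `split_segments` helper are hypothetical names made up for this example.

```python
import re

# Hypothetical rule set: every Liquid construct and every HTML tag
# becomes a standalone placeable; whatever is left is translatable text.
PLACEABLE = re.compile(
    r"\{\{.*?\}\}"    # {{ output }}
    r"|\{%.*?%\}"     # {% tag %} and {%- tag -%}
    r"|\{[^{}]+\}"    # { single-brace variable }
    r"|<[^>]+>",      # any HTML tag, opening or closing
    re.DOTALL,
)

def split_segments(text):
    """Return (kind, value) pairs, where kind is 'placeable' or 'text'."""
    segments = []
    pos = 0
    for m in PLACEABLE.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("placeable", m.group()))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

line = '<p>The cart is empty. <a href="/collections/all">Continue shopping</a></p>'
segments = split_segments(line)
for kind, value in segments:
    print(f"{kind:9} {value!r}")
```

    Note that the opening `<a ...>` and the closing `</a>` come out as two unrelated placeables, which is the pairing problem with inline tags described above.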

    Frankly, I think that if this type of file is something we're going to be seeing more and more of then it would be worth developing a filetype to handle this sort of embedded content in a different way.  Perhaps you should raise this here and see if anyone else thinks it's a good idea:

    http://ideas.sdl.com

  • Thanks for all your help! I really appreciate it.
    If we get these files more often, I'll definitely raise this at http://ideas.sdl.com.

  • Although only indirectly related to this question, it would be a good "idea" to extend file tagging to file types other than TXT. I will present the request in the specified link.  

  • When you do that can you elaborate a little on why you want to do this and for which filetypes?

  • I often need to translate software manuals that contain many "tag" names (usually written in CamelCase or some variant). It would be desirable to convert these to tags and so prevent them from being translated and, more importantly, exclude them from the Spell Checker (I have had cases of hundreds of false positives). This applies principally to MS Word and Excel.
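
    As a rough illustration of the kind of rule I have in mind, here is a sketch in Python. The definition of a "tag name" used here (a word with at least one internal capital letter) is my own assumption, and `CAMEL_CASE` is just a name made up for the example.

```python
import re

# Assumed definition of a "tag name": a word containing at least one
# internal capital, e.g. MaxRetryCount or connectionTimeout.
# Names with consecutive capitals (such as XMLParser) are not covered.
CAMEL_CASE = re.compile(r"\b[A-Za-z][a-z0-9]*(?:[A-Z][a-z0-9]+)+\b")

sentence = "Set the MaxRetryCount and connectionTimeout values before saving."
names = CAMEL_CASE.findall(sentence)
print(names)
```

    Ordinary capitalized words such as "Set" are left alone, so only the tag-like names would be protected from translation and spell checking.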