CADF regex to find quotation marks inside tags (and move them outside if possible)

Hi all

I'm reviewing an IDML document in which many quotation marks have been placed inside bold or italic tags when they should be outside.

Is anyone out there clever enough to have come up with a regex I could use in the Community Advanced Display Filter to find these instances? I suspect there is no automatic way to move them outside the tags, but at least I could then filter for the relevant segments.

I'm using Studio 2017.

Many thanks in advance.

Cat

  • As far as I know, this is not possible within Studio. You can, however, most probably do it in Notepad++. Inside the SDLXLIFF, the text will look more or less like this:

    Text with <tag>„Quotation mark”</tag> within tags.

    Search for (<[^>]*?>)„(\w)
    Replace with „\1\2

    Search for (\w)”(</[^>]*?>)
    Replace with \1\2”
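    If you want to check the two substitutions outside Notepad++, the following Python sketch applies them to the sample text above with the standard re module. It assumes the segment text is available as a plain string, and the helper name move_quotes_outside is hypothetical. In the closing-quote pattern the closing tag is wrapped in its own group, so that \2 in the replacement refers to it. Real tags usually carry attributes, which [^>]*? also matches.

```python
import re

# Sample segment text modelled on the example above.
text = 'Text with <tag>„Quotation mark”</tag> within tags.'

# Opening quote directly after an opening tag and before a word character.
OPENING = re.compile(r'(<[^>]*?>)„(\w)')
# Closing quote directly after a word character and before a closing tag.
CLOSING = re.compile(r'(\w)”(</[^>]*?>)')


def move_quotes_outside(s: str) -> str:
    """Move the quote in front of the opening tag and behind the closing tag."""
    s = OPENING.sub(r'„\1\2', s)
    s = CLOSING.sub(r'\1\2”', s)
    return s


print(move_quotes_outside(text))
# prints: Text with „<tag>Quotation mark</tag>” within tags.
```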

    This problem shows me once again how important it is for translators to understand the tool they are using. Such problems result from their reluctance to learn why and where tags appear and how to deal with them. Unfortunately, the answer of many, if not most, in such cases is "We are not IT specialists"...

  • Hi Jerzy

    Many thanks for this. I'll try it on a copy of the file and report back.

    I agree with you about knowing your tools, and that a lot of translators use the "IT specialist" excuse; I try hard not to be one of them. Some of my colleagues use me as their IT helpdesk. I'm learning regex at the moment, but I'm still on the basics. I do make extensive use of tools like the CADF, which is a great help.

  • Before you try, locate a part of the text where the problem occurs and simply observe how the regex performs there. You might need to modify it. To learn regex, I would recommend Regex Buddy, a small program where you can test your regex and also let the program explain it.
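    Testing a pattern on a known problem spot before replacing anything can also be done in a few lines of Python: the sketch below lists which sample lines a pattern would change, and how, without modifying the file. The sample lines and the preview helper are hypothetical; paste in a few segments copied from your own file instead.

```python
import re

# Hypothetical sample lines: one with the problem, one already correct.
samples = [
    'Text with <tag>„Quotation mark”</tag> within tags.',
    'Already correct: „<tag>Quotation mark</tag>” here.',
]

OPENING = re.compile(r'(<[^>]*?>)„(\w)')


def preview(pattern, replacement, lines):
    """Return (before, after) pairs for the lines the pattern would change."""
    changes = []
    for line in lines:
        new = pattern.sub(replacement, line)
        if new != line:
            changes.append((line, new))
    return changes


for before, after in preview(OPENING, r'„\1\2', samples):
    print('-', before)
    print('+', after)
```

Only the first sample is reported, because in the second one no opening tag is directly followed by „.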

  • Thank you for the tips! I've downloaded Notepad++ (I only had the basic Notepad, which has no regex support) and will give it a go later, after trying the expressions out in Regex Buddy.

  • Hi Jerzy, in the end there was so much wrong with the translation and the target tagging that I had to replace most of the tags anyway, so I corrected the quotation marks while fixing the tags; otherwise I would have missed my deadline.

    HOWEVER... having delivered the job, I played around with this for several hours today, working on a copy of the unrevised SDLXLIFF in Notepad++ and testing the expressions in Regex Buddy.

    I really learned a lot, because the quotation marks were a mix of smart and straight, some were between two tags and others not, some had an incorrect space between the tag and the word... well, you get the picture! It probably wasn't possible to replace them all at once, even using OR conditions, so I worked through the cases one by one until I had a working expression for each.
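    For anyone facing the same mix of cases, the one-rule-per-case approach can be sketched in Python as an ordered list of (pattern, replacement) pairs applied in turn. This is not the set of expressions Cat actually used: the cases are assumptions modelled on the description above (smart or straight quotes, quotes after two nested tags, a stray space between quote and word), and the names RULES and fix_quotes are hypothetical. If a file stores straight quotes as &quot; entities, the straight-quote parts of the character classes would need adjusting.

```python
import re

# Assumed quote characters: those from the example in this thread plus the
# straight quote. Add any others your target language uses.
QUOTE_OPEN = '[„"]'
QUOTE_CLOSE = '[”"]'

# One (pattern, replacement) pair per case, applied in order.
RULES = [
    # One or more opening tags (e.g. <b><i>), an opening quote, an optional
    # stray space, then a word character: move the quote in front of the
    # tags and drop the space.
    (re.compile(rf'((?:<[^/>][^>]*>)+)({QUOTE_OPEN}) ?(\w)'), r'\2\1\3'),
    # A word character, an optional stray space, a closing quote, then one
    # or more closing tags: move the quote behind the tags and drop the space.
    (re.compile(rf'(\w) ?({QUOTE_CLOSE})((?:</[^>]+>)+)'), r'\1\3\2'),
]


def fix_quotes(s: str) -> str:
    """Apply every rule in turn to one segment's text."""
    for pattern, replacement in RULES:
        s = pattern.sub(replacement, s)
    return s


sample = 'Say <b><i>„hello”</i></b> and <b>"world"</b> and <i>„ spaced ”</i>.'
print(fix_quotes(sample))
# prints: Say „<b><i>hello</i></b>” and "<b>world</b>" and „<i>spaced</i>”.
```

Keeping each case as its own rule makes it easy to test one rule at a time, as with the step-by-step approach described above, before running the whole list.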

    Thank you so much again for your help, not just for the regex but also for Notepad++. I've used Notepad in the past for other, simpler fixes, but to my shame I wasn't aware of Notepad++.

    I'm looking forward to advancing my skills and will keep this file for future regex training since it was such a mess!

    All the best

    Cat