I'm currently retranslating a text that a non-translator colleague had partially translated, and I noticed an abundance of double genitives in both the source text (DE) and the target text (EN): German "des/der ... des/der ..." formulations rendered as "of ... of ...". As I'd like to understand more about regex, I was wondering whether there is a quick-and-dirty regex-based QA check that could trap these kinds of double genitives.
Does anyone have any suggestions on this?
Michael Bailey
At a very simplistic level perhaps something like this would suffice?
\bof\b.*(?=(?:\bof\b))
Your question is interesting because it is also one of the examples in my book. You might want to refine the regex to match only occurrences of "of the" (and "of") near each other, such as…
Why the asterisk after the first space?
Shouldn’t it be
\bof the(\s\w+){0,2}\s+of the\b
instead?
Can you provide a couple of sample source and target sentences you need to catch?
It seems to be particularly prominent in titles and positions (e.g. in tabular reports of names and positions).
Example 1:
Example 2:
Similarly, I have a very lengthy list of titles, where some have been translated as A of B of C (C is usually a company name).
Sometimes it starts getting silly.
You could use this as a QA check in the target for example, or even in the display filter to filter all the segments that do this.
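For anyone who wants to try the simple pattern outside Studio first, here is a minimal sketch in Python. It assumes Python's `re` module behaves like Studio's regex engine for this pattern, which should hold for these basic constructs but is not guaranteed in every detail. The sample segments and the names `DOUBLE_OF` and `segments` are invented for illustration.

```python
import re

# The simple pattern from this thread: a whole-word "of", then any text,
# followed by another whole-word "of" somewhere later in the segment.
DOUBLE_OF = re.compile(r"\bof\b.*(?=(?:\bof\b))")

# Hypothetical target segments, made up for this example.
segments = [
    "Head of the Department of Finance",
    "Head of Finance",
    "Chairman of the Board of Directors of Example GmbH",
]

# Keep only the segments the pattern flags.
flagged = [s for s in segments if DOUBLE_OF.search(s)]
print(flagged)
```

Because `.*` places no limit on the distance between the two occurrences, any segment containing "of" twice is flagged, however far apart they are. That is fine for short titles but likely to be noisy in running text.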
Your question is interesting because it is also one of the examples in my book. You might want to refine the regex to match only occurrences of "of the" (and "of") near each other, such as
\bof the(\s*\w+){0,2}\s+of the\b
that finds "of the" separated by fewer than 3 "words". The bounds could be adapted to be less restrictive, but as Paul has often said, consider "economy of accuracy": how much effort do you want to spend on refining your regex, and how many false positives/negatives are you prepared to accept?
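To experiment with the bounds, the upper limit can be made a parameter. The sketch below is in Python and uses the `\s+` form given in the correction later in this thread. The helper name `of_the_pattern` and the sample sentences are hypothetical, and Python's `re` is assumed to behave like Studio's engine for this pattern.

```python
import re

def of_the_pattern(max_words: int = 2) -> re.Pattern:
    """Match two "of the" phrases separated by at most max_words words."""
    # %d inserts the upper bound into the {0,n} quantifier.
    return re.compile(r"\bof the(\s+\w+){0,%d}\s+of the\b" % max_words)

# Hypothetical sentences, made up for this example.
near = "Minutes of the meeting of the Board"
far = "Member of the Supervisory Board and Audit Committee of the company"

print(bool(of_the_pattern(2).search(near)))  # one word in between
print(bool(of_the_pattern(2).search(far)))   # five words in between
print(bool(of_the_pattern(6).search(far)))   # wider bound
```

Note that `\w+` does not cross hyphens or punctuation, so an intervening word such as "Vice-President" or a comma between the two phrases would prevent a match. That is one of the false negatives to weigh against the effort of refining the pattern further.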
Paul Filkin
Thanks for this - I put it through the display filter and it worked nicely for positions and titles.
I'll also try using it as a QA check in the target for another file that contains more running text.
Anthony Rudd
Thank you for the addition to the regex to help with proximity - I tried a few values from less than 3 words to 6 words, which caught everything that I wanted to catch - and meant that I was able to get through the list!
True, it was a typo - it should have been
\bof the(\s+\w+){0,2}\s+of the\b
But as I mentioned, the bounds may need to be customised.
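The practical difference between the three variants discussed here (`\s*`, `\s` and `\s+` inside the group) can be checked with a small Python sketch. The two test strings are invented, and Python's `re` is assumed to match Studio's behaviour for these constructs.

```python
import re

VARIANTS = {
    "as first posted (\\s*)": re.compile(r"\bof the(\s*\w+){0,2}\s+of the\b"),
    "single space (\\s)": re.compile(r"\bof the(\s\w+){0,2}\s+of the\b"),
    "corrected (\\s+)": re.compile(r"\bof the(\s+\w+){0,2}\s+of the\b"),
}

# Hypothetical strings, made up for this example.
partial_word = "use of thermal imaging of the site"    # "of the" inside "of thermal"
double_space = "minutes of the  meeting of the board"  # two spaces after the first "the"

results = {
    name: (bool(rx.search(partial_word)), bool(rx.search(double_space)))
    for name, rx in VARIANTS.items()
}
for name, (hits_partial, hits_double) in results.items():
    print(f"{name}: partial word={hits_partial}, double space={hits_double}")
```

With `\s*` the whitespace is optional, so the first "of the" can match the start of "of thermal" and produce a false positive. With a single `\s` a doubled space between words causes a miss. The `\s+` form avoids both in these two cases.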