Suggestions for a Regex QA check to avoid "double genitives" (particularly of ... of ...)

I'm currently retranslating a text that a non-translator colleague had partially translated. I noticed an abundance of double genitives in both the source text (DE) and the target text (EN), where German "des/der ... des/der ..." formulations are rendered as "of ... of ...". As I want to understand more about Regex, I was wondering whether there is a quick-and-dirty Regex-based QA check that could trap these kinds of double genitives.

Does anyone have any suggestions on this?

  • Can you provide a couple of sample source and target sentences you need to catch?

  • It seems to be particularly prominent in titles and positions (e.g. in tabular reports of names and positions)

    Example 1:

    • DE: Auf Anregung der Vorsitzenden des Aufsichtsrats wurde beschlossen:
    • EN: With regard to the proposal of the Chairwoman of the Supervisory Board, it was decided:

    Example 2:

    Similarly, I have a very lengthy list of titles, some of which have been translated as A of B of C (C is usually a company name).

    • DE: Vorsitzender des Leitungsorgans der ABC AG
    • EN: Chairperson of the management body of ABC AG

    Sometimes it starts getting silly.

    • DE: Vorsitzender des Nominierungsausschusses des Aufsichtsrats des Tochterunternehmens der ABC AG
    • EN: Chairman/chairperson of the nominations committee of the supervisory board of the subsidiary of ABC AG
  • At a very simplistic level perhaps something like this would suffice?

    \bof\b.*(?=(?:\bof\b))

    You could use this as a QA check in the target for example, or even in the display filter to filter all the segments that do this.
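
To try the pattern outside the CAT tool, here is a minimal Python sketch. It assumes the target segments are available as plain strings; `has_double_of` and the third (control) segment are made up for illustration, and the regex flavour of your QA tool may differ slightly from Python's `re`, although this pattern uses only widely supported constructs.

```python
import re

# Pattern quoted from the reply above: one standalone "of", then any text,
# then (checked by the lookahead) another standalone "of" later in the segment.
DOUBLE_OF = re.compile(r"\bof\b.*(?=(?:\bof\b))")


def has_double_of(segment: str) -> bool:
    """Return True if the segment contains at least two standalone 'of's."""
    return DOUBLE_OF.search(segment) is not None


segments = [
    # The first two target segments are taken from the examples in this thread.
    "With regard to the proposal of the Chairwoman of the Supervisory Board, it was decided:",
    "Chairperson of the management body of ABC AG",
    # Hypothetical control segment with a single "of".
    "Chairman of the board",
]

for seg in segments:
    print(has_double_of(seg), "|", seg)
```

The first two segments are flagged and the control is not. Note that the pattern flags any two occurrences of "of" in a segment, however far apart, so long segments will tend to produce false positives.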

  • Your question is interesting because it is also one of the examples in my book.
    You might want to refine the regex to match only occurrences of "of the" (and "of") near each other, such as
    \bof the(\s*\w+){0,2}\s+of the\b
    that finds two occurrences of "of the" separated by fewer than three "words". The bounds could be adapted to be less restrictive, but as Paul has often said, consider "economy of accuracy": how much effort do you want to spend on refining your regex, and how many false positives/negatives are you prepared to accept?
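
The trade-off can be seen by running the refined pattern against the target segments quoted earlier in the thread. This is only a sketch in Python, assuming the segments are plain strings; `find_double_of_the` is a hypothetical helper name, and the QA tool's regex flavour may behave slightly differently.

```python
import re

# Pattern quoted from the reply above: "of the", up to two intervening
# words, then another "of the".
DOUBLE_OF_THE = re.compile(r"\bof the(\s*\w+){0,2}\s+of the\b")


def find_double_of_the(segment: str) -> list[str]:
    """Return the full text of each non-overlapping match in the segment."""
    return [m.group(0) for m in DOUBLE_OF_THE.finditer(segment)]


# Target segments taken from the examples in this thread.
segments = [
    "With regard to the proposal of the Chairwoman of the Supervisory Board, it was decided:",
    "Chairperson of the management body of ABC AG",
    "Chairman/chairperson of the nominations committee of the supervisory board of the subsidiary of ABC AG",
]

for seg in segments:
    print(find_double_of_the(seg), "|", seg)
```

The first and third segments are caught, but the second is not, because its second genitive is "of ABC AG" rather than "of the ...". That is a false negative of exactly the kind the "economy of accuracy" remark refers to. In the other direction, because the word group starts with `\s*` rather than `\s+`, a phrase such as "of theory of the" would also match; whether that is worth tightening depends on how often it occurs in your texts.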
