Under Community Review

Refine the alphanumeric matching and suggestions

As I mentioned yesterday in the community forum, Trados Studio is currently VERY poor at analyzing matches between alphanumeric elements. It actually seems to consider ANY combination of letter(s) and number(s) as equivalent to ANY other.

For instance, I had Qs&As to translate, with countless cases of A1., A2., A3., etc., to be translated to R1., R2., R3., etc.

In the applicable translation memory, I already had the following matches, which were the ones required in context (for English to French translation):

• Q1. => Q1.
• A1. => R1.

Nevertheless, here is what Trados suggested to me as “matches”:

Worse than that, even though R1., R2., R3., all the way to R14., were already confirmed and in the memory, EVERY TIME I confirmed a Q1., Q2., etc., ALL my confirmed R's reverted back to A's.

• First, I don't understand why “R1.” would not even appear in the match list, while it was ALREADY in the memory AND 100% equivalent.

• Second, “Q1.” should definitely not be shown as a 100% equivalent, as reflected by the fact that the suggested translation “match” is not the right one.

• Third, but not least, the ONLY logic behind considering “T-2161-15” and “(7th) 98” as matches for “Q1.” or “A1.” seems to be as dumb as “they all are alphanumeric in some way”. Not very promising with regard to possible results...

As a test, I deleted the confirmed “Q1. => Q1.” segment from the memory, and ONLY THEN did the existing “A1. => R1.” segment appear in the matches. Then I re-added the “Q1. => Q1.” segment, and the recognition worked properly. But still, the problem should never have occurred in the first place... and Trados should definitely NOT have considered “(7th) 98” as a match for “A1.”, considering that its suggested translation ended up as “(A1) 98.” instead of “R1.”

  • UPDATE: Here is a screenshot and additional insights (below) regarding flaws in alphanumeric correspondence analysis:

    In that case, I don't understand why the first corresponding entry for “Sections 7-11” is the one which originally was “Section 8(2) => Par. 8(2)” – marked as a 94% match, and erroneously suggesting “Par. 7(11)”.

    It would have been more logical to match it with the existing entry “Sections 5 and 6 => Articles 5 et 6”, marked as a 76% match. Although it remains imperfect (suggesting “Articles 7 et 11”, while it should be “Articles 7-11” or “Articles 7 à 11”), it is technically much closer than the supposedly 94% match.

    As for suggestions 3 and 4, they improperly consider phone numbers (respectively “613 555-7890” and “+ 613 555-4567”) as fuzzy matches for “Sections 7-11”, which is nowhere near an actual match.

    Aside from the latter inconsistency, there should definitely be better consideration of typographic characters (hyphens, parentheses, brackets, spaces, etc.) AND letters or specific words (“and”, “or”, etc.) in alphanumeric segment matching. That in itself would most likely increase the relevance of results. It would clearly not be sufficient to correct all alphanumeric matching inconsistencies, but it might still be a good start.
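To illustrate the kind of comparison suggested above, here is a minimal Python sketch. It is only an illustration of the idea, not how Trados Studio actually works: the names `tokenize`, `shape` and `same_shape` are hypothetical, and the rule used (numbers are interchangeable, but letters, words and typographic characters must be identical) is an assumption about what a stricter check could look like.

```python
import re

# Hypothetical tokenizer: a token is a run of letters, a run of digits,
# or a single typographic character. Whitespace is skipped.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def tokenize(segment: str) -> list[str]:
    """Split a segment into letter runs, digit runs and punctuation marks."""
    return TOKEN_RE.findall(segment)


def shape(segment: str) -> list[str]:
    """Abstract a segment: every number becomes 'N', while letters, words
    and typographic characters are kept as they are (lowercased)."""
    return ["N" if tok.isdecimal() else tok.lower() for tok in tokenize(segment)]


def same_shape(a: str, b: str) -> bool:
    """True only if two segments differ in nothing but their numbers."""
    return shape(a) == shape(b)


if __name__ == "__main__":
    for other in ["A2.", "Q1.", "(7th) 98", "T-2161-15"]:
        print(f"A1. vs {other}: {same_shape('A1.', other)}")
    print(shape("Sections 7-11"), shape("Section 8(2)"), shape("613 555-7890"))
```

With such a rule, “A1.” and “A2.” have the same shape, while “Q1.”, “(7th) 98” and “T-2161-15” do not, and “Sections 7-11” is not confused with “Section 8(2)” or with a phone number. A fuzzy score could then be computed on top of these shapes instead of treating every alphanumeric string alike.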

  • Update: Although I said that the first proposition in the table above “should indeed stay at 100%”, even that is open to debate, since the “1” was switched to “2”. In the case of Qs&As, that is perfectly fine and even desirable, but I guess in other cases the automatic substitution by increment would not necessarily fit as a 100% match... so maybe the numerical incrementation should also be included as an option that could be switched on or off, depending on context.

  • P.S. In addition to what I said last week about that issue, there is one important thing that should be updated in the way alphanumeric recognition and matching percentages work in Trados Studio: whenever the proposed translation results from an automatic substitution, it should always come with a penalty (or at least, we should be offered the option to add a penalty in the settings).

    For instance, in the case below, as I mentioned, the correspondences I have in my memory are respectively “A1. => R1.” and “Q1. => Q1.”, so the first one should indeed stay at 100%, while the second one should have a penalty and be marked at 99% or less. As for the last one, I don't think it should be considered a match at all, not even a fuzzy one. And that should especially NOT be the case for such things as “(7th) 98” (which no longer appears in my results, since I deleted it from my TM altogether).

    As an illustration, based on that flawed logic, Trados could arguably consider a Mazda3 as a close equivalent to an Audi Q100 or a Porsche 911. As you can see, that definitely doesn't reflect reality...
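The penalty idea described above could be sketched as follows in Python. This is only an assumption about how such an option might behave, not a description of Studio's actual scoring: `Candidate`, `final_score`, `substitution_penalty` and `allow_substitution` are hypothetical names, and the 1% default penalty is simply taken from the “99% or less” suggestion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A translation memory suggestion (hypothetical structure)."""
    tm_source: str
    tm_target: str
    base_score: int    # match percentage before any penalty
    substituted: bool  # True if the suggestion relies on an automatic substitution


def final_score(
    candidate: Candidate,
    substitution_penalty: int = 1,
    allow_substitution: bool = True,
) -> Optional[int]:
    """Return the percentage to display, or None if the candidate is rejected.

    Exact entries keep their score. Entries produced by automatic
    substitution lose `substitution_penalty` points, or are dropped
    entirely when `allow_substitution` is switched off.
    """
    if not candidate.substituted:
        return candidate.base_score
    if not allow_substitution:
        return None
    return max(0, candidate.base_score - substitution_penalty)


if __name__ == "__main__":
    exact = Candidate("A1.", "R1.", base_score=100, substituted=False)
    auto = Candidate("Q1.", "Q1.", base_score=100, substituted=True)
    print(final_score(exact))                           # exact entry, no penalty
    print(final_score(auto))                            # penalized suggestion
    print(final_score(auto, allow_substitution=False))  # substitution disabled
```

The `allow_substitution` switch corresponds to the on/off option for numerical incrementation mentioned earlier, so that Qs&As could keep the substitution while other kinds of text could disable it.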

  • I can only agree that this topic needs to be looked at very closely. We also have a lot of texts like this and also struggle with the way that Studio handles them. As you say below, the fact that disabling the alphanumeric recognition prevents setting such things as 100% matches is also unfortunate.

  • That's a correct workaround to avoid improper propagation (plus, as Daniel Hug suggested, I made sure to check “Matching segment has been translated differently”; see screenshot below). Still, I would like my Qs&As to automatically set both “Q[number]. => Q[number].” AND “A[number]. => R[number].” as 100% matches, to avoid changing them individually, and disabling the alphanumeric recognition inconveniently prevents that at the same time.

    More importantly, as I mentioned, the fact that Trados would even consider “(7th) 98” as close to “A1.” (and suggest “(A1) 98.” as a possible translation) indicates a flawed analysis algorithm that definitely needs to be refined.