Under Community Review

Improving term recognition to find terms separated by a comma without a space

Authors sometimes forget to put a space after a comma. When that happens, term recognition does not recognize the terms before and after the comma.

Here's an example:

The source sentence reads "Dies ist ein Beispielsatz mit Menüeintrag,Zeichnungseintrag." ("This is an example sentence with menu entry,drawing entry.")

The terms "Menüeintrag" and "Zeichnungseintrag" each have their own entry in our termbase, but term recognition does not find them because there is no space after the comma.

It would be great if term recognition could be improved to also recognize the terms in this particular situation.

 

I am aware that the source text should have been correct in the first place, but the reality is that we are often dealing with sub-par source texts that we cannot review and correct before the translation.
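To illustrate why the missing space matters, here is a minimal Python sketch. It assumes, purely for illustration, that a term lookup compares whole whitespace-separated words against the termbase after trimming punctuation from their edges; Trados Studio's actual term recognition is not modeled here. The TERMBASE set and the functions naive_tokens, comma_tolerant_tokens and find_terms are hypothetical.

```python
import re

# Hypothetical toy termbase; a real termbase is far richer than a set of strings.
TERMBASE = {"Menüeintrag", "Zeichnungseintrag"}

# Punctuation trimmed from the edges of each word (assumption for this sketch).
PUNCT = ".,;:!?\"'"

def naive_tokens(sentence):
    # Splits on whitespace only, then trims edge punctuation.
    # "Menüeintrag,Zeichnungseintrag." becomes one token, because the comma
    # sits in the middle of the word and is never trimmed.
    return [t.strip(PUNCT) for t in sentence.split()]

def comma_tolerant_tokens(sentence):
    # Also treats a comma as a word boundary, even without a following space.
    parts = re.split(r"[\s,]+", sentence)
    return [p.strip(PUNCT) for p in parts if p.strip(PUNCT)]

def find_terms(tokens, termbase):
    # Returns the tokens that exactly match a termbase entry, in sentence order.
    return [t for t in tokens if t in termbase]

if __name__ == "__main__":
    s = "Dies ist ein Beispielsatz mit Menüeintrag,Zeichnungseintrag."
    print(find_terms(naive_tokens(s), TERMBASE))
    print(find_terms(comma_tolerant_tokens(s), TERMBASE))
```

With the naive split, neither term is found; treating the comma as a boundary finds both. A real implementation would also have to handle multi-word terms, which this word-by-word lookup ignores.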

  • Hi Paul,

    In theory, this would be an option. But given the huge number of documents we handle, running each of them through another tool before starting translation would simply be too much effort.

    I think we would all like to have perfect source texts, but reality is different, and I think CAT tools should cater to that reality one way or another.

  • I'm not sure whether to vote for this, as it really is a source problem, as you say. Maybe you could use something like the SDLXLIFF Toolkit to search across the whole project source in one go, then replace every comma that is directly preceded and followed by a letter with a comma followed by a space.
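The search-and-replace suggested in the last reply can be expressed as a regular expression. The following Python sketch is an assumption about what "a comma preceded and followed by letters" should mean; it works on plain strings rather than on .sdlxliff files, is not the SDLXLIFF Toolkit's own syntax, and may need adapting to whatever search-and-replace tool you use. The names MISSING_SPACE_AFTER_COMMA and add_space_after_comma are hypothetical.

```python
import re

# A comma with a letter directly before it and a letter directly after it.
# [^\W\d_] matches any Unicode letter (so "ü" counts) but not digits or "_",
# which leaves decimal commas such as "3,5" untouched.
# The lookarounds do not consume the letters, so "a,b,c" is handled in one pass.
MISSING_SPACE_AFTER_COMMA = re.compile(r"(?<=[^\W\d_]),(?=[^\W\d_])")

def add_space_after_comma(text):
    # Replace only the comma itself with ", ".
    return MISSING_SPACE_AFTER_COMMA.sub(", ", text)

if __name__ == "__main__":
    print(add_space_after_comma(
        "Dies ist ein Beispielsatz mit Menüeintrag,Zeichnungseintrag."))
```

Requiring a letter on both sides keeps numbers like "3,5 mm" intact, but identifiers or file names that legitimately contain a comma between letters would still be changed, so it is worth reviewing the matches before replacing them all.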