Size limitations when working with large Translation Memories?

Is there a limitation on, or a recommendation about, the size of a TM? I have not been able to use a 14 GB TM with 2.5 million TUs because it causes Studio to crash. Yes, I waited several hours and retried several times until the upgrade completed. No, this did not happen before Studio 2019 (although the TM was smaller back then).

  • There isn't strictly a limit, but the more recent versions of Studio also extract fragments, so the size of the TM and the work it has to do can definitely affect its ability to handle larger TMs. I played around with a fairly large TM containing 5 million TUs, so double the size of yours. I was able to convert this, even building a translation model for UpLift:

    It certainly took a while, but it was possible. However, it's also worth noting that the actual content of the TM can affect this too. In this case you can see that many of the duplicate or otherwise unnecessary TUs were removed, so I ended up with 3.4 million TUs. If the content had been heavily tagged, with long sentences, few recognised tokens, etc., then the result could have been larger and the process may even have failed.
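If you want a rough idea of how much an upgrade might shrink your TM before committing hours to it, you could count exact-duplicate segment pairs in a TMX export. This is only a hypothetical sketch using Python's standard library, not anything Studio provides; element names follow the TMX format, and the streaming `iterparse` keeps memory use low for multi-GB exports.

```python
import xml.etree.ElementTree as ET

def count_duplicate_tus(tmx_path):
    """Return (total TUs, TUs that repeat an earlier seg-text combination)."""
    seen = set()
    total = duplicates = 0
    # iterparse streams the file, so a multi-GB TMX never sits fully in memory
    for _, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag == "tu":
            total += 1
            # key on the tuple of all <seg> texts (source and target)
            key = tuple(seg.text for seg in elem.iter("seg"))
            if key in seen:
                duplicates += 1
            seen.add(key)
            elem.clear()  # release the processed element
    return total, duplicates
```

A high duplicate count suggests the upgraded TM will end up noticeably smaller, as happened in the experiment above.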

    So there is no black-and-white answer. But in general, we are talking about a desktop tool here that is designed to extract as much use as it possibly can from the TM, and this takes a lot of resources. In practice I think that once you get to around a million TUs you can expect performance issues. At least, that's the sort of experience I see.

    We are also continually working on improving this, so we may be able to handle larger and more complex data in the future. Until then, it may be wise to split a large TM up if you don't get the results you are looking for. Being able to work with multiple TMs is helpful in this regard.
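One way to split a large TM is to export it to TMX and carve the export into smaller files, each of which can then be imported into its own Studio TM. The sketch below is a hypothetical illustration, assuming the TM has already been exported to TMX; `max_tus` and the output naming are made-up parameters, not anything Studio defines.

```python
import copy
import xml.etree.ElementTree as ET

def split_tmx(src_path, max_tus, out_prefix):
    """Write src_path's TUs into numbered TMX files of at most max_tus each."""
    tree = ET.parse(src_path)
    root = tree.getroot()            # <tmx>
    header = root.find("header")
    tus = root.find("body").findall("tu")
    out_files = []
    for i in range(0, len(tus), max_tus):
        # rebuild a minimal TMX skeleton around each chunk of TUs
        chunk_root = ET.Element("tmx", root.attrib)
        chunk_root.append(copy.deepcopy(header))
        chunk_body = ET.SubElement(chunk_root, "body")
        chunk_body.extend(copy.deepcopy(tu) for tu in tus[i:i + max_tus])
        out_path = f"{out_prefix}_{i // max_tus + 1}.tmx"
        ET.ElementTree(chunk_root).write(out_path, encoding="utf-8",
                                         xml_declaration=True)
        out_files.append(out_path)
    return out_files
```

Note that this whole-tree approach needs enough RAM to hold the TMX; for a 14 GB export a streaming `iterparse` variant would be the safer choice, but the chunking logic would be the same.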

