Deletion of multiple entries

When I imported my MT 5.1 files into new MT Studio databases, quite a number of entries were somehow duplicated. Some of them even appear 12 times in my database. Since my MT database is rather extensive, trying to fix this manually would take days.

In MT 5.1 it was very simple: just synchronize on the index field, and duplicate entries were combined.
Why doesn't this work in MT 2014?

  • Hi Ineke,

    It should work in MT 2014 as well.  Maybe try reorganizing your termbase first?  Perhaps something was lost during the upgrade?

    Another way that might be interesting for you is to export to Excel using the Glossary Converter, then use Excel to remove the duplicates and convert back again afterwards.  How easy this will be depends mostly on the complexity of your termbase, but the Glossary Converter is an excellent tool that everyone should have in their armoury.

  • Paul, thank you for your reply. My MT termbase is very extensive, so extensive that I am now using 3 of them, because someone told me they no longer work properly once they are bigger than 200 MB. And all combined, mine are definitely larger.

    I tried reorganising, exporting and importing them in a new termbase, and the only effect it had was multiplying even more entries, so it doesn't work as easily as in MT 5.1.
    The best way would be to have an export file that could be imported into MT 5.1, which worked very well for me, then create a new export and import it into MT 2014 again.

    Cleaning up the file in Excel is not an option; it is simply too big and too time-consuming.
    I tried the Glossary Converter, but it did not make me happy. I spent a few hours fiddling with it, but it didn't do what I wanted, so I gave up again.
  • Hi Ineke,

    The practical limit of a file-based MultiTerm termbase is around 2 GB, because it's basically an Access database. So 200 MB should not be a problem at all.

    If you are able to share one of your termbases I'd be happy to take a look at it.  If you can do this, please send a link to pfilkin@sdl.com. You can use Dropbox or WeSendit perhaps?

    Regards

    Paul

  • 2 GB, okay, that is better.
    I have 3 databases now, totalling about 600 MB, so in your view I should be able to combine them in one, and still be able to add to them. That would be great.

    I have created a folder in dropbox and have sent you an invitation. The files are being uploaded right now.

    Thanks!
  • Paul, thank you so much for your help in creating a new database with all the entries in it, without the duplicates. This is very helpful and will save me time, since until now I was fixing the files one entry at a time, every time I saw a duplicate.
    Now I will not have to spend any more time on this, and I can use one database instead of several.

    I am really grateful, thank you so much!
  • Thanks Ineke,

    No problem, your files were pretty large and I did have to keep them split in order to import them as MultiTerm struggled with the reorganisation.  There is a KB somewhere related to that but importing as separate files worked fine.  I also double checked that the export was ok as this would now be one large XML file and using the Glossary Converter this worked easily in around 3 seconds!  So I'd recommend you do this regularly to keep a backup of your XDT and XML.

    The steps I used to resolve the problem you were having were as follows:

    1. Used the Glossary Converter to get your three files into Excel
    2. Tidied up the Excel files (simple to do with filtering) and removed empty fields altogether
    3. Used the Glossary Converter to create the XDT and MultiTerm XML files for the three tidied up Excel files
    4. Created a new MultiTerm Termbase with one of the XDT files
    5. Created a new import model based on merging on English to make sure I had what you wanted
    6. Imported all three MultiTerm XML files
    7. Import was fast, the reorganisation takes a little while especially for the biggest of the files
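
    The effect of steps 2 and 5 (removing duplicates, then merging on English) can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Glossary Converter or MultiTerm does internally. The row layout, the "target" field name and the sample terms are invented for the example.

```python
def merge_on_english(rows):
    """Combine rows that share the same English term.

    Each row is a dict such as {"English": "pump", "target": ["pomp"]}.
    The field names are hypothetical; adapt them to your own columns.
    Matching ignores case and surrounding whitespace.
    """
    merged = {}
    for row in rows:
        english = row["English"].strip()
        key = english.casefold()
        if not key:
            continue  # skip rows without an English term
        entry = merged.setdefault(key, {"English": english, "target": []})
        for term in row["target"]:
            term = term.strip()
            # keep each target term once, in first-seen order
            if term and term not in entry["target"]:
                entry["target"].append(term)
    return list(merged.values())


# Invented sample data: the first two rows are duplicates of one entry.
rows = [
    {"English": "pump", "target": ["pomp"]},
    {"English": "Pump ", "target": ["pomp", "perspomp"]},
    {"English": "valve", "target": ["klep"]},
]
result = merge_on_english(rows)
```

    With the sample rows, the two "pump" rows collapse into one entry that keeps both target terms, and "valve" is left alone.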

    Kind regards

    Paul

  • Paul, thank you for explaining and all your help.
    I will make a backup on a regular basis, as I always do.

    Thanks again!
  • I am sorry, I was too early in assuming all went well.
    Now it seems a number of entries are missing in the new file, that were present in the old files.
    So I started using my old files again.

    I tried to import the three databases myself into a new database, after performing an export, and now I discovered the problem of the multiplication: when the source contains multiple synonyms, which is often the case in my database, they are duplicated after an export and a new import. Seems to me this would be an error in MT. This problem was not present in MT 5.1. And I do assume MT will not forbid us to have entries with multiple synonyms?

    So I thought, perhaps I should try the filter: source contains synonyms, but then only the entries with multiple synonyms are imported, and the ones with only one source term are skipped. But here once again: the imported entries were multiplied as many times as there were synonyms in the source entry.

    So this will make it completely impossible to export a database that contains synonyms into a new database without multiplying the entries that contain synonyms, and who would want that?

    Is SDL aware of this problem?

  • Hi Ineke,

    Can you create an excel file (or termbase whichever is easier for you) with one or two entries in it and multiple synonyms to explain the problem and I'll take a look. If we can reproduce the problem with one or two entries it'll be much easier and faster to solve.

    Thanks

    Paul

  • Hi Paul, of the files I already sent you, the 03 file does not contain many entries. I tried it with that one, and the problem was immediately visible. I exported the 03 file to a tmx file, created a new TM on the basis of the 01 file, and then imported the 03 export file into it. This was done really quickly, and when browsing the termbase, I immediately noticed the duplicate entries.
    Since I was sure there were no duplicate entries in the 03 file (I had not imported anything into it, I had only filled it with new terms while working), I had a look at such a duplicated entry and noticed it contained two synonyms.
    Other entries were duplicated more than once, while I was also sure I only imported the file once.
    So I imagined it might have to do with synonyms.
    When I checked this, I noticed the number of duplicate entries was similar to the number of synonyms in the source.

    Hence my conclusion: MultiTerm 2014 treats each synonym as a new entry, without giving me the option not to do so.
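
    A check like this could be run on an export to list which entries are duplicated and how often. It is a rough sketch that assumes the export is MultiTerm XML with conceptGrp, languageGrp, language (with a type attribute) and term elements. The sample data below is invented, so verify the element names against your own export first.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def index_terms(concept_grp, language="English"):
    """Return the sorted terms of one entry in the given language."""
    terms = []
    for lang_grp in concept_grp.iter("languageGrp"):
        lang = lang_grp.find("language")
        if lang is not None and lang.get("type") == language:
            terms.extend((t.text or "").strip() for t in lang_grp.iter("term"))
    return tuple(sorted(t for t in terms if t))


def duplicated_entries(root, language="English"):
    """Map each repeated set of terms to the number of entries that carry it."""
    counts = Counter(index_terms(c, language) for c in root.iter("conceptGrp"))
    return {terms: n for terms, n in counts.items() if terms and n > 1}


# Invented sample: entries 1 and 2 are the same entry with two synonyms.
SAMPLE = """
<mtf>
  <conceptGrp><concept>1</concept>
    <languageGrp><language type="English" lang="EN"/>
      <termGrp><term>pump</term></termGrp>
      <termGrp><term>pumping unit</term></termGrp>
    </languageGrp>
  </conceptGrp>
  <conceptGrp><concept>2</concept>
    <languageGrp><language type="English" lang="EN"/>
      <termGrp><term>pump</term></termGrp>
      <termGrp><term>pumping unit</term></termGrp>
    </languageGrp>
  </conceptGrp>
  <conceptGrp><concept>3</concept>
    <languageGrp><language type="English" lang="EN"/>
      <termGrp><term>valve</term></termGrp>
    </languageGrp>
  </conceptGrp>
</mtf>
"""

duplicates = duplicated_entries(ET.fromstring(SAMPLE))
```

    For a real export file, ET.parse(path).getroot() gives the root element to pass in. Comparing each count with the number of synonyms in the entry would show whether the two really match.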

    Does this help enough?
    And if not, I guess it would be very easy to replicate this yourself.
    I use SDL Trados 2014.

    This is a big hiccup in my view, which has already cost me far too much time.

  • Hi Ineke,

    Probably it's me being dumb but I'm struggling to follow you a bit. Presumably you meant TBX and not TMX, and SDLTB instead of TM?

    Can you tell me a few of the terms that are being duplicated and I can focus on these?

    Thank you

    Paul
  • Sorry, all those abbreviations... and I am working against the clock, so I do not have much time right now.
    I uploaded a few new files, and left the original databases in dropbox too.
    I have uploaded the export of the 03 file with the filter 'contains synonyms' and the complete export (03a).
    I have uploaded the 2016 database, with the imported 03 files, so only the synonym-containing entries are in there.

    Hope this helps.
    I do not have any more time today, so if you have further questions, they will have to wait until tomorrow or the day after, I have to concentrate on my deadline now.
  • Hi Ineke,

    I think, when you have more time, we need to go through this together. I have looked at your list of duplicates and I don't have any of these duplicated in the file I sent you. I also tried to replicate what you did and when I imported the same file over the top of the termbase synchronising on English I still don't get any duplicates.

    This is a tricky one, but such an important feature that I don't think there is a glaring problem with the software because we would certainly know about it already.

    Let me know when you have more time and we can work through this together so that I can see exactly what the problem is and where it may be caused.

    Regards

    Paul
  • When importing, I do not get the option to synchronise on English. Where can I find it?
    I did export the file on the English field.

  • Most likely you are not creating your own import model, rather you are relying on the defaults. Not a good idea.