Deletion of multiple entries

When I imported my MT 5.1 files into new MT Studio databases, quite a number of entries were somehow duplicated. Some of them even appear 12 times in my database. Since my MT database is rather extensive, trying to fix this manually would take days.

In MT 5.1 it was very simple: just synchronize on index field, and double entries were combined.
Why doesn't this work in MT 2014?

  • Hi Ineke,

    It should work in MT 2014 as well.  Maybe try reorganizing your termbase first?  Perhaps something was lost during the upgrade?

    Another way that might be interesting for you is to export to Excel using the Glossary Converter, use Excel to remove the duplicates, and then convert back again afterwards.  How easy this will be depends mostly on the complexity of your termbase, but the Glossary Converter is an excellent tool that everyone should have in their armoury.
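For readers who prefer a script to doing the clean-up by hand in Excel, the "remove duplicates" step might be sketched as follows. This is only an illustration, assuming the glossary has been saved as a flat CSV file; the column names `English` and `Dutch` and the sample rows are hypothetical and do not come from the termbase discussed in this thread.

```python
import csv
import io


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row.

    Two rows count as duplicates only when every cell matches after
    trimming whitespace and ignoring case, so rows that differ in any
    column (for example, a different synonym) are all kept.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple((value or "").strip().casefold() for value in row.values())
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical sample data standing in for an exported glossary.
SAMPLE_CSV = """English,Dutch
invoice,factuur
Invoice,factuur
receipt,kwitantie
invoice ,factuur
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique_rows = drop_exact_duplicates(rows)
print(len(rows), "rows in,", len(unique_rows), "rows out")  # 4 rows in, 2 rows out
```

Note that this only drops rows that are identical in every column. It does not merge entries that share a term but carry different synonyms or fields; that needs a merge step rather than a delete step.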

  • Paul, thank you for your reply. My TM termbase is very extensive, so extensive that I am using 3 of them now, because someone told me they do not work properly anymore when they are bigger than 200 MB. And all combined, mine are definitely larger.

    I tried reorganising, exporting and importing them in a new termbase, and the only effect it had was multiplying even more entries, so it doesn't work as easily as in MT 5.1.
    The best way would be to have an export file that could be imported in MT 5.1, which worked very fine for me, then create a new export and import it in MT 2014 again.

    Cleaning up the file in Excel is not an option, it is simply too big and too time consuming.
    I tried the Glossary Converter, but it did not make me happy. I spent a few hours messing around with it, but it didn't do what I wanted, so I gave up again.
  • Hi Paul, of the files I already sent you, the 03 file does not contain many entries. I tried it with that one, and the problem was immediately visible. I exported the 03 file to a tmx file, created a new TM on basis of the 01 file, and then imported the 03 export file in it. This was done really quickly, and when browsing the termbase, I immediately noticed the duplicate entries.
    Since I was sure there were no duplicate entries in the 03 file (I had not imported anything into it, I had only filled it with new terms while working), I had a look at such a duplicated entry and noticed it contained two synonyms.
    Other entries were duplicated more than once, while I was also sure I only imported the file once.
    So I imagined it might have to do with synonyms.
    When I checked this, I noticed the number of duplicate entries was similar to the number of synonyms in the source.

    Hence my conclusion: MultiTerm 2014 treats each synonym as a new entry, without giving me the option not to do so.

    Does this help enough?
    And if not, I guess it would be very easy to replicate this yourself.
    I use SDL Trados 2014.

    This is a big hiccup in my view, which has already cost me far too much time.

  • Hi Ineke,

    Probably it's me being dumb but I'm struggling to follow you a bit. Presumably you meant TBX and not TMX, and SDLTB instead of TM?

    Can you tell me a few of the terms that are being duplicated and I can focus on these?

    Thank you

    Paul
  • Sorry, all those abbreviations... and I am working against the clock, so I do not have much time right now.
    I uploaded a few new files, and left the original databases in dropbox too.
    I have uploaded the export of the 03 file with the filter 'contains synonyms' and the complete export (03a).
    I have uploaded the 2016 database, with the imported 03 files, so only the synonym containing entries are in there.

    Hope this helps.
    I do not have any more time today, so if you have further questions, they will have to wait until tomorrow or the day after, I have to concentrate on my deadline now.
  • Hi Ineke,

    I think, when you have more time, we need to go through this together. I have looked at your list of duplicates and I don't have any of these duplicated in the file I sent you. I also tried to replicate what you did and when I imported the same file over the top of the termbase synchronising on English I still don't get any duplicates.

    This is a tricky one, but such an important feature that I don't think there is a glaring problem with the software because we would certainly know about it already.

    Let me know when you have more time and we can work through this together so that I can see exactly what the problem is and where it may be caused.

    Regards

    Paul
  • When importing, I do not get the option to synchronise on English. Where can I find it?
    I did export the file on the English field.

  • Most likely you are not creating your own import model, rather you are relying on the defaults. Not a good idea.
  • When creating an import model from scratch, I still do not get the option to synchronize on index field, which was the setting that did the trick excellently in MT 5.1.
    So if this option is available while importing, please tell me, where is it hidden?
  • Does this help?

  • Thank you for creating this for me, Paul. I couldn't help myself, and just tried it.
    For starters, it worked with the termbase I created for this.
    But... the new definition is only available in that termbase, not for all termbases.
    So when creating a new termbase, I have to create a new import model again. Copying it to another termbase did not seem possible, is that correct?

    And next to that, in my view, it is rather strange that one has to do a simple task this way, to have the option: import on index field and choose the index field, as was possible in MT 5.1
    This option should automatically be possible for every import action.
    When I see Import/Export in the main menu, then why should I think of going to termbase management to make an import model?
    The option should be available in the import/export menu, and it should be possible to create an import model that can be picked for every database, not only for the current one.

    I am currently importing my other databases too, and I will let you know if this option of creating a new import model for a new termbase did the trick. I do hope so.
  • Ineke Kuiper said:
    For starters, it worked with the termbase I created for this.

    Good start ;-)

    Ineke Kuiper said:
    So when creating a new termbase, I have to create a new import model again. Copying it to another termbase did not seem possible, is that correct?

    No that's not correct.  You can save the definition to a file and then load it into any termbase you like:

    Ineke Kuiper said:
    This option should automatically be possible for every import action.

    I guess... but the idea here is that users can create different import/export models for themselves and, depending on who they are sharing with, just select the appropriate one.  This is far better than having to work through the options every time they do it.  As a translator you won't do this very often and I can see your view.  As a terminologist maintaining a termbase for many uses you wouldn't be very happy with that.

    I can't comment on 5.1... can't even remember what it looked like!!

    Regards

    Paul

  • This is what 5.1 looked like, when I chose Import. Then immediately I got all the options available.
    I understand that from the point of view of database maintenance it is nice to have a lot of programmable functions, but for freelance translators who work with the program every day, in my view it is a nuisance that simple options like this one that were instantly available are now hidden, and demand an extensive study of the program to be understood.

    It is one of the reasons why I have started to really hate upgrading any program: there will always be things I liked that have disappeared or been hidden, despite any new features which I might or might not have missed in the past.

    For instance, on a new laptop I installed W10, and for the life of me, I cannot find the simple command box in which I can type: msconfig
    Why do program makers have to change the way a program looks so often?
    I do not like the ribbon in Word or Trados either: it only means more work: things I could do with one mouse click now often need two mouse clicks. Why?
    It worked fine, and except for the fact that it might look prettier, it did not improve the functionality. On the contrary. I used to be able to create my own customized tool bars, and now I can only use one. I used to be able to create icons for functions on these tool bars, now I cannot do that anymore.
    So in my view, these days upgrades too often feel like downgrades.

    By now, the import of the three files into the new termbase has finished, using the new import definition that makes the import synchronize on the index field. And all duplicates are gone.
    From the log file of the import of the worst file (02):

    Total entries processed: 37846
    Total entries added: 5205
    Total entries merged: 32641
    Total entries omitted: 0
    Total entries written to the output file: 0

    So no need to use the Converter or Excel, it can simply be done in Multiterm itself, if you know and can find how.
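The counts in the log add up as one would expect for a synchronised import: 5205 added + 32641 merged = 37846 processed. The idea behind "synchronize on index field" can be sketched as a toy model. This is not MultiTerm's actual implementation; the function name, the data layout and the sample entries are all hypothetical and only illustrate why matching on an index term merges duplicates instead of appending them.

```python
def import_with_sync(termbase, incoming, index_language):
    """Merge incoming entries into termbase, matching on the index term.

    termbase: dict mapping a normalised index term to a dict of
              language -> set of terms
    incoming: list of entries, each a dict of language -> list of terms
    Returns counters named after the lines of the import log.
    """
    stats = {"processed": 0, "added": 0, "merged": 0, "omitted": 0}
    for entry in incoming:
        stats["processed"] += 1
        index_terms = entry.get(index_language, [])
        if not index_terms:
            # No term in the index language, so nothing to match on.
            stats["omitted"] += 1
            continue
        key = index_terms[0].strip().casefold()
        if key in termbase:
            target = termbase[key]
            stats["merged"] += 1
        else:
            target = termbase[key] = {}
            stats["added"] += 1
        # Sets absorb repeated terms, so re-importing creates no duplicates.
        for language, terms in entry.items():
            target.setdefault(language, set()).update(terms)
    return stats


# Hypothetical sample: two entries share the English index term "invoice".
incoming = [
    {"English": ["invoice"], "Dutch": ["factuur"]},
    {"English": ["invoice"], "Dutch": ["rekening"]},
    {"English": ["receipt"], "Dutch": ["kwitantie"]},
]
termbase = {}
stats = import_with_sync(termbase, incoming, "English")
print(stats)  # {'processed': 3, 'added': 2, 'merged': 1, 'omitted': 0}
```

In this model the two "invoice" entries end up as one entry with both Dutch terms, whereas an import that simply appends every incoming entry would leave two separate entries.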

    Thank you, Paul, for explaining this. Now my problem is really solved, with the instructions you sent in the YouTube video.
    I saved the import definition, since I expect I will want to use it more often.
    Thank you for your patience, too.

  • Thanks Ineke,

    I actually prefer these single screens rather than the wizards. I prefer to be able to see everything in one page, but I think today wizards seem to be thought of as easier for most users. I guess it'll all be easier now you know though and in future you can just run the process and that's it... no need to select any options.

    On Excel and the converter... I agree, but if you want to get rid of fields and do more maintenance than this then I would take the Excel route every time.

    Really glad you're sorted out anyway!

    Regards

    Paul
  • For months, my new MT database worked fine. I added terms, corrected terms, all double entries were gone. Excellent!

    But now, all of a sudden the database has somehow restored all the double entries or created a lot of new ones, without any interaction on my side.
    Next to that, the terms of this termbase are no longer recognized by Trados.
    When I use another termbase, built from scratch, the terms are shown. But just not the terms of my main termbase.
    I already exported the termbase, created a new one, used an import model which would synchronize on either source or English (both are okay in my case), but they are simply not imported.
    I already tried the solutions mentioned here: producthelp.sdl.com/.../4451.html
    but they do not fix the problem.

    So now I cannot use my main termbase interactively, which is a real nuisance.
    Any suggestions?
  • Hi Ineke,

    As you have a fairly simple structure why don't you export to Excel, get rid of the duplicates in there and then just recreate the termbase instead of trying to import with the import models?

  • Because that would be at least a day's work: my termbase is not that simple. It contains a number of fields that are not sorted well when exporting, so I would have to rearrange them all manually.
    And since there are some 19000 terms in my termbase...

    For instance, strangely enough, in the export, source and target are not always correct, sometimes the source is NL, sometimes EN, all in the same column. So I would have to go manually through every line.

    I do not understand why the termbase you made for me, which worked well, suddenly isn't working anymore.