Hello - we have several thousand tmx files to import, but the Studio TM doesn't display the tmx file name, whether it be the actual name of the file, or a property within the tmx file. Is there a way for this info to be kept and displayed? Many thanks!
Just to specify, as can be inferred from the number of files, we're particularly looking for a way to batch-import!
Lucy-Jane
Lucy-Jane Michel said: Question regarding the import of tmx files into Studio TMs
Yes. You need to create a field in your TM to hold the name of the TMX, and then enter the filename as the value for that field when you import.
However, if you want this to happen automatically because you have thousands of files then I'm afraid there isn't a way to do this out of the box. It could be done through the API... probably not too difficult. We might have a look at this through the appstore if nobody else does.
A possible workaround for now, I guess, would be to edit the "Created by" values in the TMX files with a batch process (this would be outside of Studio) and then the system fields could be updated with the value you use (the filename).
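As a rough illustration of that batch process, here is a minimal Python sketch. It assumes that the TMX `creationid` attribute is what ends up as "Created by" in Studio after import (worth confirming on a single file first), and the function names and folder arguments are made up for this example. It writes modified copies to a separate folder and leaves the original files alone.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def stamp_creation_id(root: ET.Element, file_name: str) -> int:
    """Set creationid on every <tu> and its <tuv> children.

    Returns the number of <tu> elements that were stamped.
    Assumption: Studio reads creationid as the "Created by" system field.
    """
    count = 0
    for tu in root.iter("tu"):
        tu.set("creationid", file_name)
        for tuv in tu.iter("tuv"):
            tuv.set("creationid", file_name)
        count += 1
    return count


def stamp_folder(source_dir: str, target_dir: str) -> None:
    """Stamp every *.tmx in source_dir and write the copies to target_dir."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tmx_path in sorted(Path(source_dir).glob("*.tmx")):
        tree = ET.parse(tmx_path)
        # Use the file name without its extension as the stamp value.
        stamp_creation_id(tree.getroot(), tmx_path.stem)
        # Note: ElementTree does not keep a DOCTYPE line when writing.
        tree.write(out / tmx_path.name, encoding="utf-8", xml_declaration=True)
```

Calling something like `stamp_folder("tmx_in", "tmx_out")` (both folder names are placeholders) would then give a set of files that can be imported as usual. Bear in mind that this overwrites whatever creation user was recorded in the TMX before.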
Another possible workaround would be to use PowerShell... Evzen Polenka might have covered something like this with his PowerShell tools here:
Unless of course someone has a better idea?
First, open any two tmxs with any text editor and take a careful look at them; with time you will know how to handle them properly
Then combine all (several thousand?) of them into a single tmx
Finally, import it.
Except it doesn't solve the problem at all. They want to stamp each TU that is imported with the filename as a TM field.
Importing multiple TMs isn't a problem and there is absolutely no need to merge them in the first place. This is about identifying where the material was imported from in the final TM.
I'm afraid that STraSAK doesn't have a feature which could be used for this task.
But the whole task sounds like an XY problem...
It seems to me that the real starting point for the task is somewhere before the thousands of TMX files... simply because I can't think of a "sensible source" of thousands of TMX files... Like, why and from which source would someone create so many TMXs? That doesn't make much sense to me... it sounds as if someone thought that the only way to create bilingual content from thousands of bilingual files is to export each file to a separate TMX... or something like that.
Which would mean that the ultimate goal could probably be achieved differently, not necessarily in the worst possible way (via the separate TMXs).
So, Lucy-Jane Michel, if you don't mind, can you provide a bit more context?
It looks like it could be solved programmatically using EditScript functionality during import to TM, but EditScript is NOT PROPERLY DOCUMENTED ANYWHERE :( :( :(
There is a mention about it in the Studio 2015 (!!!) TM API documentation (http://producthelp.sdl.com/SDK/TranslationMemoryApi/4.0/html/024ab948-758e-4f14-a7c5-e7e8a058b433.htm - check the bottom right of the API schema) and one can find the EditScript class itself documented, but there is not a single word anywhere about how to use it - what the script actually is, how it works, etc.
What's more, this part is completely missing from newer API version documentation, there is a LOT missing in the newer version documentation!
Is SDL ever going to fix this?
Paul, the other day you were wondering why there are so few developers doing something with the Studio API for the community... perhaps this is one of the reasons - with such poor support from SDL, why bother...
First of all, my apologies for not responding sooner, and my thanks for all your replies and suggested avenues for exploration.
A little context:
At the OECD, we've been using Multitrans for the last decade, complemented by Deja Vu, which arrived about 5 years ago. The only way to share resources between the two tools is by .tmx. Our Multitrans corpora contain many thousands of file pairs, and so we've been aligning these using Align Factory for use with Deja Vu, or for our external translators to use with whatever CAT tool they have - which is how we have ended up here!
We're switching our two tools for Studio in the near future, and of course need to transfer all our current resources. Some of the TMs are going to require importing several hundred tmx files, which is why we were hoping to be able to batch-import, preferably using the filename itself, to avoid a) modifying the metadata of each .tmx file, or b) entering the metadata for each .tmx individually at the moment of import. I hope this sheds some light, although it is starting to look like we're not going to be able to avoid using the metadata fields, and modifying the metadata for each one... any other ideas most gratefully received!
Many thanks again,
Question is, WHY do you need/want to have each translation unit marked with the filename it comes from... because such a thing does NOT happen during the standard translation workflow anyway.
When importing bilingual content at the end of the translation workflow into some master TM you can mark the imported translation units by updating a user field in the TM, but this happens for the entire imported batch, not for individual files... unless the size of the batch is a single file, of course.
So, are you saying that you did not build any TM during the past decade, so the only thing you have now are the TMXs for individual aligned file pairs? That sounds weird...
In any case, modifying the metadata for each individual TMX can of course be done programmatically, so you don't need to do it manually.
But you need someone with some programming/scripting skills to create such a tool for you... Or you can check whether any of the tools from the OKAPI Framework have such a capability.
I could add such functionality to STraSAK, but as mentioned above, fundamental SDL documentation is missing, so until SDL fills in the missing pieces of information, there will be no progress in this area.
Weird as it may sound, that's how it is! Multitrans is basically an indexing tool, and so is document-based, not segment-based; it compiles static document repositories (corpora), rather than TMs. It is possible to export Multitrans content to tmx format, but unfortunately the export has too many bugs for us to use, and the required metadata isn't exported. The nature of translation at the OECD means it is vital for translators to know exactly where each segment is coming from.
Yes, writing a script for the modification of the metadata is pretty much the conclusion we've reached as well...we were very surprised that the file name isn't retained by Studio, given that it is retained when importing to DVX!
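For what it's worth, a metadata script of this kind can be quite short. The Python sketch below adds a `<prop>` element carrying the file name to every `<tu>`, which keeps the original `creationid` values untouched. The property name `x-SourceFile` and the function name are hypothetical, and whether (and under which name) Studio maps such a property to a TM field on import is an assumption that would need to be tested on one file before running it over the whole set.

```python
import xml.etree.ElementTree as ET


def add_source_prop(root: ET.Element, file_name: str,
                    prop_type: str = "x-SourceFile") -> int:
    """Add <prop type=prop_type>file_name</prop> to each <tu> that lacks one.

    Returns the number of <tu> elements that received a new property.
    prop_type is a made-up name; adjust it to whatever the importing tool expects.
    """
    added = 0
    for tu in root.iter("tu"):
        # Skip units that already carry this property, so reruns are harmless.
        if any(p.get("type") == prop_type for p in tu.findall("prop")):
            continue
        prop = ET.Element("prop", {"type": prop_type})
        prop.text = file_name
        # In TMX, <prop> and <note> elements come before the <tuv> elements.
        tu.insert(0, prop)
        added += 1
    return added
```

The function works on an already parsed tree, so it can be dropped into any loop over the files, for example one that calls `ET.parse(path)` for each .tmx and writes the result to a new folder.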
I don't know what your timescales are, but we are going to start work on a small import plugin for Studio shortly. This will support the import of SDLXLIFF files first, but we will then add TMX support. The idea being it will optionally add the following metadata to fields in the TM:
- TU number from the SDLXLIFF
This may help you if you are unable to get a sensible approach anywhere else in the meantime.
Thank you Paul - that is very good news!
Our timescale is pretty short, as roll-out of Studio is planned for February/March, so we'll need to find a solution soon - at least for our key resources to begin with - but we're getting there. The import plugin with these options will certainly be extremely useful for the future. I can hear the sighs of relief already!
That's great... we'll try to do this as quickly as we can. You might also be interested to know that we are nearing completion of an updated version of the Multitrans plugin for Studio that is here:
The current version has some limitations in that you can only use resources attached to the project you select. The updated version will allow you to select any resources from a Multitrans instance and use them in your project. We are also providing separate plugins for just MultiTrans Termbases and MultiTrans TMs so that it would be possible to use them in a standard Studio project for example.
If you'd like to test them when we release to Beta let me know? We're pretty close.
Thanks very much Paul. I believe the plug-ins are for Multitrans Prism? We're on Multitrans 6, not Prism, and phasing it out over 2020. If the plug-ins are usable with Multitrans 6, I'd be delighted to test them, if my manager gives the green light. Quick question: how can I ensure we're informed as soon as the import plugin is available? Is there a notification system to sign up to, or something similar?
Lucy-Jane Michel said: If the plug-ins are usable with Multitrans 6
No, they've been built for the latest version of MultiTrans.
Lucy-Jane Michel said: Quick question: how can I ensure we're informed as soon as the import plugin is available? Is there a notification system to sign up to, or something similar?
For new apps, your best bet is to either sign up to the RSS feed, or follow #sdlappstore on Twitter. For existing apps we have a plugin which will help with update notifications as it will also download and install the updated versions. This article might be interesting:
And the notifications plugin is here:
Fantastic, signing up right now. Thank you, Paul!