File splitting - SDL XLIFF split/merge improvement/change request

Hi, all. I'm sorry if this is not the right place to post this; there doesn't seem to be a specific area for Open Exchange app queries.


At times I need to split files for large projects across many translators. All PMs must do this, but it is not my usual pastime. There may be a simple solution that I'm just not aware of; all lessons gratefully received. My present solution is to work out the splits manually and tediously in Word and then create separate Word docs (which messes up the numbering, oh oh oh). I could provide the full file to everyone and tell them which pages are theirs, but I have found that causes issues in the past. Colour-coding is also an option, i.e. highlight Fred's share in green, Jenny's in blue, etc. Still tedious.


Meanwhile, I have investigated SDL XLIFF Split and Merge, and it really doesn't do what I was hoping for.


Scenario: approx. 210K words across 2 files, to be split across multiple translators, each of whom can take a different number of words. The 2 files need to be in the same project because there is cross-file repetition, etc.

A few issues with this app:

1) The generated split-file names. I really, really need the file names to be at least partly human-readable! Conversations with translators are impossible: 'my file 001_79-37-D7-94-3E-49-F7-F7-6B-A6-C4-E4-69-40-9E-E7-DE-80-CB-C3' - yeah, right. And imagine you are using this for multiple projects: you have to be able to tell which file people are talking about! I suggest letting me supply x characters of 'my' file name (at least 15), which you could incorporate into your automated file name, e.g. 001.myname_info_your-number-stuff. Sound possible? Or let me rename the files and have your rebuild register updated with the new names. That is open to so much human error and would be a support nightmare, but it is an option?

2) When splitting, I would really need to be able to define splits of varying word counts that add up to my file's total word count (all approximate/rounded, of course), e.g. 15000, 80000, 20000, 40000, 30000, 11000, 18000, 5000, balance. An alternative would be a two-stage process in which I could create a set of, e.g., 5K splits and then re-join them, but obviously your split/merge process couldn't deal with that at present. I really don't want to send someone 16 5K files (with incomprehensible names...) for their 80K-word quota! And if I did, as a translator I'm sure I'd merge them into one in Trados, which of course can be done, but what then happens when they send them back to me? Plus, in my scenario, if I want to create a project that contains all the translators' sdlxliffs, I can no longer tell which of the two source files each generated .sdlxliff comes from. Am I missing something?

3) Splitting by segment number is only of any use if I have already calculated (in Word) where I want the splits to be, then created the project, then found the corresponding segment numbers in the project and noted them... so there is no gain.
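Items 1 and 2 can be illustrated with a small sketch. It is purely hypothetical and is not how SDL XLIFF Split and Merge works internally: it assumes each segment's source word count is already known, and `Segment`, `split_by_quotas`, `split_file_name` and the `001_<label>_<generated id>.sdlxliff` name format are invented for illustration. Quotas are filled in document order and each split closes at the first segment boundary where its running total reaches the quota, so the counts come out approximate (as in the splits described above); whatever is left over becomes the "balance".

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: int  # segment number as shown in the editor (hypothetical model)
    words: int   # source word count for this segment


def split_by_quotas(segments: list[Segment], quotas: list[int]) -> list[list[Segment]]:
    """Cut segments, in document order, into chunks of roughly the requested
    word counts. A chunk closes at the first segment boundary where its running
    total reaches its quota; any remaining segments form a final 'balance' chunk."""
    chunks: list[list[Segment]] = []
    current: list[Segment] = []
    current_words = 0
    q = 0  # index of the quota currently being filled
    for seg in segments:
        current.append(seg)
        current_words += seg.words
        if q < len(quotas) and current_words >= quotas[q]:
            chunks.append(current)
            current, current_words = [], 0
            q += 1
    if current:  # the 'balance' chunk
        chunks.append(current)
    return chunks


def split_file_name(index: int, label: str, generated_id: str, max_label: int = 30) -> str:
    """Hypothetical name format 001_<label>_<generated id>.sdlxliff, keeping at
    most max_label characters of a user-supplied label; characters other than
    letters, digits, '-' and '_' are replaced with '_'."""
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)[:max_label]
    return f"{index:03d}_{safe}_{generated_id}.sdlxliff"


if __name__ == "__main__":
    words = [3000, 4000, 5000, 2000, 6000, 1000]
    segs = [Segment(i, w) for i, w in enumerate(words, start=1)]
    for n, chunk in enumerate(split_by_quotas(segs, [7000, 5000]), start=1):
        print(split_file_name(n, "ManualA", "79-37-D7"), sum(s.words for s in chunk))
    # 001_ManualA_79-37-D7.sdlxliff 7000
    # 002_ManualA_79-37-D7.sdlxliff 5000
    # 003_ManualA_79-37-D7.sdlxliff 9000
```

If the label includes part of the source file name, the two "001" files produced from two different source files in the same project would no longer look identical.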

I did try it with a team last year and we got nowhere; I think we had a lot of issues with file names being saved incorrectly, etc. Chaos ensued. But it still seems like a good idea.

Thoughts, suggestions and completely different solutions gratefully received. Thanks!

  • Hi Sian,

    We do have a little project waiting to start in which we intend to take the SDLXLIFF Toolkit and the Split and Merge App, merge them into one application, and at the same time integrate it more closely with Studio so it's a View instead of a separate application. We also have a number of enhancement requests and a few bugs to iron out... so we plan to tackle them all. We'll take note of your comments.

    I'm not sure about the file naming, though, as it could get tricky if, for example, you split a massive original file into 500 files. How would you name those yourself in a meaningful way? Happy to consider any ideas for that.

    I'm also not intending to implement file merging, as I struggle with it for a couple of reasons:

    1. It is problematic.

    2. I'd rather use the returned files to update the original project.

    What do you think of that?

    Regards

    Paul

    PS. I moved your post into the forum you could not find ;-) Click on Forums when you are in the TP Community, or scroll down on the TP Community home page, and you'll see a list of them all... or just use this link: http://community.sdl.com/appsupport

  • Regarding your second issue, I don't find sending multiple files to translators much of a problem anymore, thanks to virtual merge in Studio 2014 and 2015. It lets the translator open the files as one without creating a new file: all the work is still done in the individual files, so you get back the same files you sent. Obviously this doesn't help with the names... I usually refer to files just by their "number" (001, 002, etc.).
  • Hi, Paul, thanks for coming back to me.

    On the file-naming front, I was rather hoping it would be possible to prompt the user for x characters to be added as a parameter to the file-naming process, so that the current process picks up the characters given and integrates them into a format something like this (just a suggestion):

    001_[my_file_info_up_to_nnn_chars]_all_the_rest_of_the_calculated_name_as_now. This would make for VERY long file names... I don't know whether there are limits anywhere that would make this dangerous, particularly for those still using earlier versions of Windows, Word, Trados... My alternative solution-seeking brain (instead of being a good little analyst and saying what, not how) was thinking that the generated file names (numbers) could somehow be maintained internally (like a checksum?) and in a register, but I can foresee all sorts of issues with that, not least how Trados would handle it.

    I do understand not wanting to implement file merging. If the file splitting were more versatile, it wouldn't be necessary anyway, would it? The View idea could be great; I'm thinking of all sorts of things, like a 'go to word nnn' feature to help me split visually, if that makes sense.

    I'm also grateful for Jesse's response, which I'm about to reply to as well.

    Thanks! Sian
  • Hi, Jesse, thanks for replying. Yes, I noted in my question that translators can of course merge the files in Trados, but I hadn't cottoned on that it is only a view and the physical file names are retained. That's a good point. However, there are still quite a lot of people out there on 2007 and 2009. A pain, but true. So that limits the options.

    Yes, the file number is the logical thing to refer to, but as I mentioned, in my scenario I have 2 source files that need to be in the same project, and they split out as 001_numbernumber and 001_numbernumber. That is where I wonder whether I'm missing something: surely there are often projects with more than one file that need to be split (and multiple projects with splits running at once). How do you tell one 001 from the other 001?

    Cheers, Sian
  • Yeah, two files do make it more complicated; I don't have much experience of this sort of situation (I guess I'm lucky!). Maybe using two different projects would be the best option: if you're splitting files, you lose the benefit of cross-file reps anyway. But I agree, it will be confusing no matter how you do it!
  • Hi Sian, it's not ideal, but perhaps you could try adding the split files by folder. When SDL XLIFF Split and Merge splits .sdlxliff files, it generates one folder per file and puts that file's splits in it. Each per-file folder is named <original_file_name>.splits, so it should be more recognizable to humans. If you add those folders to a project, then at least in the Files view file1's 001 will be under the file1 folder, file2's 001 under the file2 folder, and so on.
  • Hi Wen-I, sorry, I've only just seen this. Just to make sure I understand: if I have my 2 source files in the same folder and choose to add by folder rather than by individual file, then SDL XLIFF Split and Merge will generate one folder per file with a human-recognisable name. OK, yes, that is good, got that.

    It's a little better for me personally, yes, thank you, but not for my recipients or for communication, I fear. As is common in the agency/freelancer world, there are many different tools and providers per project, indeed per file. I still end up with multiple split files going to different places and people, and they still end up with names that aren't human-readable.

    With Trados users, I could simplify life by using packages. I still need to learn about them.

    Hoping some sort of solution will come along in the future. Alternatively, I'll just split the Word files, I guess...
  • Hi Paul,
    Is there any news on the project you mentioned here? I am really interested in an app that would replace the Split and Merge app and work with Studio 2017.

    Thank you,
    Sarah
  • Hi ,

    I'm afraid not. It's still sitting in our backlog, so we don't have a replacement yet. We also have a backlog item to look at the existing Split and Merge app and see whether we can do anything in the short term, but we haven't reached that yet. I don't have any timescales for either, so I can't be a lot of help, I'm afraid!
  • Hi Paul, as we never had any issues with the old Split & Merge app, we would be happy just to be able to use the old version 'as is' in Studio 2017, without any changes or enhancements... Best regards, Lieven
  • Hi , that's effectively the second backlog item on my list.
  • Hi Paul, is there any chance the code for the SDL Split/Merge app could be posted to the Sdl-Community GitHub? Unfortunately, we also need to split files quite often.

    Lennart

  • We would be very interested in this new split/merge solution and in any news on it (when you have some). The need to batch content for translation has become increasingly common for us, and with the previous Split and Merge tool no longer working, we regularly run into logistical issues because we can't split up our projects easily. Working around this with the Toolkit or other manual processes isn't something we'd like to keep doing in the long term.