Application to extract an analysis from a standalone SDLXLIFF

The scenario is this.  You are sent an sdlxliff to translate without a TM; it has already been pretranslated, and the payment terms are stated on the basis of what's in the file.  How do you get a summary of what the file actually contains, so you have a basis for questioning the payment if necessary?  I ask because I had such a query this morning and couldn't solve it easily.

I could create a new TM from the sdlxliff and rerun the analysis from this, but the results are of course not correct because we don't have the basis for a proper comparison.

So I created a small spreadsheet on the basis of the report that comes from the SDLXLIFF Converter and then added a couple of columns to count the words and group the matches.  Clearly the word count will also be off and the analysis still doesn't match exactly, but it gave me an idea for another application.  So I'm adding this one too for comments.  A proper application that could use the proper analysis mechanisms would be quite cool, I think.

I attached the spreadsheet too so you can see what I mean... it was actually useful for the user even with its obvious flaws.

Regards

Paul

Analysis Spreadsheet.xlsx
  • Hi Paul,

    I guess "pretranslated" means that the SDL-XLIFF file you got has all 100% and CM pretranslated? Or are there also Fuzzy matches pretranslated?

    First of all: If we get a pretranslated SDLXLIFF-File and analyse it against an empty TM, we get "wrong" results. Technically the results are correct (0 Matches as the TM is empty), but it does not take the pretranslated TUs into account. That means, the analysis does not reflect the "real world effort".

    My first idea was to create an empty TM, import the SDL-XLIFF file and re-analyse it. That works fine.

    Besides this, you can also go into the "Projects" or "Files" view in Studio and select the project or a specific file. Check the "Confirmation Statistics". This gives you a detailed view of the situation: how many segments are Untranslated / Draft / Translated / Rejected / Approved. You can switch between Percentage, Counts or both together in the view.

    Is this not enough / what your client wanted?

    Cheers,

    *Stefan.

  • Hi Paul,

    Unknown said:
    A proper application that could use the proper analysis mechanisms would be quite cool I think.

    I think that for a proper analysis using the proper analysis mechanisms you need a TM, and you cannot fake a TM with fuzzy matches from the SDLXLIFF.

    On the other hand, within the SDLXLIFF there is all the information needed to do an approximation, i.e.:

    1. Count the words of the source text (removing any tags, numbers, variables etc.) from the source element

    <source>New unit.<x id="86" /></source>

    2. Add them to the appropriate group based on the percent attribute of the sdl:seg element

    <sdl:seg id="23" conf="Translated" origin="tm" percent="100">

    as long as they are not repetitions, i.e. check whether the sdl:rep element exists

    <sdl:rep id="a0d5eecf-b918-4c72-b4f6-7aabd7fdcf73-20" />

    Do you think that an application like that is really needed?

    Regards,
    Costas
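
For anyone who wants to try the approximation described above, here is a minimal Python sketch rather than a real analysis. It only uses the pieces mentioned in the post (the source text, the percent attribute on sdl:seg and the presence of sdl:rep). The function names are made up for illustration, and the namespace URIs and the seg-source/mrk layout are assumptions that should be checked against a real file. The word count is deliberately rough and will not match Studio's.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URIs; compare them with the root element of a real file.
NS = {
    "x": "urn:oasis:names:tc:xliff:document:1.2",
    "sdl": "http://sdl.com/FileTypes/SdlXliff/1.0",
}


def count_words(element):
    """Rough count: inline tags carry no text, purely numeric tokens are skipped."""
    text = "".join(element.itertext())
    return sum(1 for token in re.findall(r"\w+", text) if not token.isdigit())


def iter_segments(unit):
    """Yield (text element, sdl:seg element) pairs for one trans-unit."""
    defs = {seg.get("id"): seg for seg in unit.iterfind(".//sdl:seg", NS)}
    # Assumption: segmented files mark each segment as <mrk mtype="seg" mid="...">
    marks = unit.findall("x:seg-source//x:mrk[@mtype='seg']", NS)
    if marks:
        for mrk in marks:
            seg = defs.get(mrk.get("mid"))
            if seg is not None:
                yield mrk, seg
    else:
        # Fallback: one <source> paired with a single sdl:seg definition.
        source = unit.find("x:source", NS)
        if source is not None and len(defs) == 1:
            yield source, next(iter(defs.values()))


def summarise(root):
    """Return {group: source word count}; groups are '<percent>%', 'Repetitions', 'New'."""
    totals = Counter()
    for unit in root.iter("{%s}trans-unit" % NS["x"]):
        for text_element, seg in iter_segments(unit):
            if seg.find("sdl:rep", NS) is not None:
                group = "Repetitions"
            elif seg.get("percent"):
                group = seg.get("percent") + "%"
            else:
                group = "New"
            totals[group] += count_words(text_element)
    return dict(totals)


# Usage (the file name is a placeholder):
# print(summarise(ET.parse("example.sdlxliff").getroot()))
```

Every segment carrying sdl:rep goes into a single "Repetitions" group here, which is a simplification; segments without a percent attribute are treated as new.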

  • Unknown said:

    I guess "pretranslated" means that the SDL-XLIFF file you got has all 100% and CM pretranslated? Or are there also Fuzzy matches pretranslated?

    First of all: If we get a pretranslated SDLXLIFF-File and analyse it against an empty TM, we get "wrong" results. Technically the results are correct (0 Matches as the TM is empty), but it does not take the pretranslated TUs into account. That means, the analysis does not reflect the "real world effort".

    My first idea was to create an empty TM, import the SDL-XLIFF file and re-analyse it. That works fine.

    Hi Stefan,

    Yep... all kinds of matches, similar to the matches in the spreadsheet... actually I just changed the text.  Using the TM approach doesn't work at all because you only get CM, 100% and New.  You don't get the fuzzies, and of course the segments that now come out as CM and 100% may not have been that in the sdlxliff.

    Unknown said:

    Besides this, you can also go into the "Projects" or "Files" view in Studio and select the project or a specific file. Check the "Confirmation Statistics". This gives you a detailed view of the situation: how many segments are Untranslated / Draft / Translated / Rejected / Approved. You can switch between Percentage, Counts or both together in the view.

    Is this not enough / what your client wanted?

    This is also not helpful.  Remember, I only have the sdlxliff, so there has been no analysis carried out on this for Studio to generate this information.  The user has been given a pretranslated sdlxliff file and been asked to finish it off.  There is also no analysis included, only a report that contains a count for payment.  I know the user should really have a discussion with the client because this is hardly fair, but I thought that if there was a quick way to extract this information from the sdlxliff then you have the basis for a discussion if it's really needed.

    This isn't the first time I have come across this scenario, which is why I put together the spreadsheet, but it would be better to be able to set the bands properly and summarise the content neatly in the same way we do in the analyse reports... albeit not by running an analysis, rather by reading in an already pretranslated sdlxliff.

    Cheers

    Paul

  • Costas Nadalis said:

    Do you think that an application like that is really needed?

    Hi Costas,

    You're picking up all the things I was thinking of (but don't know what they're called ;-))  I don't think it would be an application that would see an enormous number of downloads, but I do think it's quite a useful tool to have access to... maybe we'd be surprised.  I quite like the ability to extract information like this from an sdlxliff, and maybe there is more information I haven't thought about that would be useful here too?

    Regards

    Paul

  • To be able to reproduce the statistical analysis, ideally you would need to know the source of the translation associated with each segment present in the SDLXLIFF file.


    I fear that to dispute the % that was calculated and associated with each pre-translated segment you would need to know the source of the segment; otherwise you are simply taking the % values as they are presented and counting words & chars.


    The only solid information in this scenario would be if there were 100% (or greater) matches included with the pre-translation; then at least from those you could create a mock TM on which to base any sort of re-analysis.


    Source is the key... sounds like something out of Star Wars.

  • Hi Patrick,

    More good points.  But the dispute here is that the analysis available to the user for their payment is not based on a Studio analysis... at least it may not be.  So as a starting point if you were able to easily represent the values of the information in the sdlxliff you already have, because you will be ignoring CM and 100% matches "maybe", and only doing the rest, then you at least have the basis of a dispute (if you actually want to continue working for someone who works this way in the first place).

    Without this you can only say you have a feeling this is wrong and you would like to have more information, like a proper analysis report or the TM used to prepare the file etc.  If it was me I would do something like the spreadsheet first so I could say "Look, I have an analysis of the file you sent me and it doesn't match what you are prepared to pay.  So I think there is something amiss.  Can we review this please."  This would be better than having no ammunition at all other than a feeling it is wrong.

    Maybe it's just me?

    Paul

  • Yeah, good point. I guess this makes sense if you can do some basic association on the information already present; if the numbers don't match up at all to the analysis, then it would be the basis of an inquiry as you suggested. Sounds good.

    However, whoever is being put in a situation like this is not being treated very well by the client in my opinion; but I guess that is not the topic for this thread

  • Unknown said:

    However, whoever is being put in a situation like this is not being treated very well by the client in my opinion; but I guess that is not the topic for this thread

     
    I totally agree.  It seems many small agencies, maybe using Freelance rather than Professional, just send out sdlxliff files on their own after leveraging on their own TMs.  So if you work with people like this I guess it's a good sanity check at least.
     
    But like I said... probably (hopefully) not an app for major download numbers, but a nice to have in your armoury.
     
    Cheers
     
    Paul
  • So, in a nutshell, you need a tool that reads in an existing SDL-XLIFF file and simply counts all existing translation units like this:

    CM: 523 units

    100%: 89 units

    95-99%: 23 units

    85-94%: 12 units

    75-84%: 346 units

    50-74%: 612 units

    No Match: 2436 units

    That should be easy.
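
As a rough illustration of the counting step, the banding above could be sketched in Python like this. It assumes each translation unit has already been reduced to a (percent, is_context_match) pair; how a context match is flagged inside the file is not covered here, and the names `BANDS`, `band`, `tally` and `report` are made up for the example. The band boundaries are simply the ones from the list above and can be edited to mirror whatever bands the payment terms use.

```python
from collections import Counter

# Lower bound of each band, highest first (taken from the list above).
BANDS = [
    (100, "100%"),
    (95, "95-99%"),
    (85, "85-94%"),
    (75, "75-84%"),
    (50, "50-74%"),
]


def band(percent, is_context_match=False):
    """Return the band label for one unit; a missing percent counts as 0."""
    if is_context_match:
        return "CM"
    percent = percent or 0
    for lower, label in BANDS:
        if percent >= lower:
            return label
    return "No Match"


def tally(units):
    """units: iterable of (percent, is_context_match) pairs -> ordered (label, count) list."""
    counts = Counter(band(percent, cm) for percent, cm in units)
    order = ["CM"] + [label for _, label in BANDS] + ["No Match"]
    return [(label, counts[label]) for label in order]


def report(units):
    """Format the tally one band per line, in the style of the list above."""
    return "\n".join(f"{label}: {count} units" for label, count in tally(units))


# Usage with made-up data:
# print(report([(100, True), (100, False), (97, False), (60, False), (None, False)]))
```

Keeping the bands in one table means the same code can be rerun with a client's own band definitions before comparing the totals with their payment report.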

  • Hi Paul,

    If no one else is interested I can add it in the next version of the SDLXliff2TMX together with the vanilla xliff export.

    Regards,

    Costas