Challenge with Batch Task API and ISegmentPair.Properties.TranslationOrigin

Dear community,

we would like to create a custom word count batch task. The challenge is that when the custom batch task runs before the SDLXLIFFs have been opened for the first time, we get the error “object not set to an instance”, because “segmentPair.Properties.TranslationOrigin” is null when we try to access its property “OriginType”. However, after the SDLXLIFFs have been opened once, the custom report is created successfully.

 

public override void TaskComplete()
{
    foreach (ISegmentPair segmentPair in segmentPairExtractor.segmentPairs)
    {
        if (segmentPair.Properties.TranslationOrigin.OriginType == null)

 

Can you maybe help?

 

Best wishes,

Simon

  • Hi Simon,

    If TranslationOrigin is null, it means the segment is empty and the file has never been opened and saved.
    Why not just check for null and skip those segment pairs?

    if (segmentPair.Properties.TranslationOrigin != null)
    {
    // TranslationOrigin is not null, so check OriginType
    }

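    Or with C# 6 syntax (the null-conditional operator goes after TranslationOrigin, since that is the member that can be null):

    if (segmentPair.Properties.TranslationOrigin?.OriginType == null)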

  • Hi Jesse!

    Thanks for your reply!

    Yes, we can catch that exception and return the figure "0" when the property is empty. However, our report is then not correct, because it shows "0 words in repetitions" for the SDLXLIFF file while we actually have "29 words in repetitions", which the batch task also captures correctly once the SDLXLIFF has been opened and saved for the first time.

    Best wishes,
    Simon

  • Hi Simon,

    I'm not understanding why you need to catch an exception. Why not check for null first and only read TranslationOrigin if it is not null?

    I guess if you wanted to, you could assign default TranslationOrigin instances yourself so you can skip the null checks?

    Sample code:

    private readonly IDocumentItemFactory itemFactory = DefaultDocumentItemFactory.CreateInstance();

    ...

    public override void ProcessParagraphUnit(IParagraphUnit paragraphUnit)
    {
        if (paragraphUnit.IsStructure)
        {
            return;
        }

        foreach (ISegmentPair item in paragraphUnit.SegmentPairs)
        {
            if (item.Properties.TranslationOrigin == null)
            {
                item.Properties.TranslationOrigin = itemFactory.CreateTranslationOrigin();
            }
        }
    }

  • Hi Jesse!

    Thanks a lot for your proposal!

    We have tested this. However, when we call CreateTranslationOrigin() ourselves, then the .IsRepeated property is always "false". Again, only after opening the SDLXLIFF file and saving it is .IsRepeated "true", and only then can we read from segmentPair.Properties.TranslationOrigin.RepetitionTableId.

    Best wishes,
    Simon

  • Hi Simon,

    To better understand this issue, I have created a custom batch task plugin that attempts to simulate the issue that you are describing here, for the purpose of debugging; you can download the complete project from here: CustomWordCounter.zip.

    Note: I have built it against Studio 2015 assemblies, however, if you are working with Studio 2017, then simply update the assembly references to point to the Studio 2017 installation directory.

    The sample includes a simplified example of integrating a bilingual content processor that iterates over the segments of a file, using the custom batch task API; it recovers the segment properties (including the translation origin) + some intrinsic code to create 'n query against a temporary TM (recovering the word counts) + some helpers to configure the origin type.

     

    Observations

    I am trying to ascertain what information I can provide to you, given the details you have provided thus far, but I'm unsure if you are attempting to recover attributes from an SDLXLIFF file that don't exist or if it relates to a bug that we have not discovered yet.  It would probably be better if we could organize a quick sync via Skype to try 'n answer all of your questions after you have taken a look at the CustomWordCounter project that I have attached.

    With regard to expectations: when you integrate a custom batch task and iterate over the segments through a content processor, it will do exactly as you request 'n recover the properties of each segment as they exist in its current state.  If the SDLXLIFF file was not previously translated, or does not include the attributes that you would expect to see, then they will not spontaneously appear.

    It seems that you are trying to simulate functionality very similar to that of the batch task “Analyze files”. If this is the case, then it might be easier to simply set up the batch task to run “Analyze files” and then parse the details from the report that is generated.

    It would help a great deal if you could describe the workflow and the state of the project/SDLXLIFF files to better understand how to assist here.

     

    Patrick Andrew

  • Hi Patrick and Simon,

    I'm not sure if you can consider this a bug, but just to make sure everyone is clear what is going on here.
    The picture shows the exact same sdlxliff file, BEFORE (left-hand side) and AFTER (right-hand side).

    As you can see in the AFTER picture, the sdlxliff file contains the rep-defs (repetition table) while the BEFORE picture does not.

    This is very easy to produce.

    1. Create a new project in Studio and add any file.
    2. The sdlxliff file created will be your BEFORE file.
    3. Now open the file and make any modification (like adding a space or something) and save the file.
    4. The file saved in step 3 is your AFTER file.

    Long story short, the rep-defs table is built the very first time the sdlxliff file is saved.

  • Hi,

    I think this is by design. From what I know (but might be incorrect) Studio calculates repetitions/AutoPropagate when you first open a file and builds a repetition table. When you then save the file, this gets persisted into the SDLXLIFF I guess.

    Thanks, Daniel

  • Hi Simon,

    Thank you Jesse for following up; no.1

    I have been able to reproduce the issue as you have described it above; please review the latest version of the example project here: 3731.CustomWordCounter_v2.zip

    I conferred internally on this matter this morning + as has been outlined above, the repetitions table that you are making reference to is in fact created by the Studio editor, as a convenience at that level. It is possible to read that information through the translation origin model while iterating over the segments from a content processor; however, that information would first need to be generated by the editor. It is not possible to generate that table directly from the project API.

    It is however possible to create your own table and manage it as you are iterating over the segments, by generating a hash-code of the source segment and comparing against that to identify repetitions.
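    A minimal sketch of such a table, under a few assumptions: RepetitionTable and RegisterAndCheck are hypothetical names (not part of the Studio API), the source segment has already been turned into a plain string by the caller, and only the second and later occurrences of a source text are treated as repetitions. It keys a dictionary on the source text rather than on a bare hash code, so that two different segments with colliding hash codes are not mistaken for repetitions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Studio API: remembers which source
// texts have been seen while a content processor walks over the segments.
public sealed class RepetitionTable
{
    // The dictionary hashes the key internally and also compares the full
    // string, so hash collisions cannot produce false repetitions.
    private readonly Dictionary<string, int> occurrences =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // Registers one occurrence of the given source text and returns true
    // when the same text was registered before (i.e. it is a repetition).
    public bool RegisterAndCheck(string sourceText)
    {
        if (sourceText == null)
        {
            throw new ArgumentNullException("sourceText");
        }

        int count;
        occurrences.TryGetValue(sourceText, out count); // count stays 0 if unseen
        occurrences[sourceText] = count + 1;
        return count > 0;
    }

    // Number of times the given source text has been registered so far.
    public int OccurrencesOf(string sourceText)
    {
        int count;
        return occurrences.TryGetValue(sourceText, out count) ? count : 0;
    }
}
```

    In a content processor you would call RegisterAndCheck once per segment pair, passing whatever string representation of the source segment you decide to compare on. The comparison here is ordinal and case-sensitive; whether that matches the counts you expect is something to verify against your own files.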

    A more elegant solution might be to recover the analysis statistical information from the ProjectFile itself that is accessible through the batch task API.

     

    However, you would need to ensure that an ‘Analyze Files’ batch task is executed at least once prior to running your custom batch task; considering that the AnalysisStatistics data is recovered from the project only after an analysis operation has taken place on that file.  To accomplish this, you could add a custom task sequence, as follows:

    • Select Batch Tasks>Custom>Task Sequences
    • From the Task sequences dialog, select ‘Add’
    • Provide an appropriate name + description and then add the ‘Analyze Files’ task + your custom batch task
    • Click ‘OK’ and save the task sequence details

    Note: batch tasks are designed to chain tasks together, so it would not make sense to call 'Analyze Files' directly from your custom batch task; instead chain the tasks in a sequence of events, so to speak.

    Refer to the screen shot underneath

     

    The new Task Sequence would then be available from the batch tasks menu, as follows:

     

     

    Would this type of solution suit your requirements?

    Patrick Andrew

  • Hello, Thanks for the excellent answers already.

    If I may, I have two follow-up questions regarding the proposed solutions.

    1. "Create your own table and manage it as you are iterating over the segments, by generating a hash-code of the source segment"

    This would work perfectly well, but we've discovered a case where repetition counts can differ from the Analyze Files report if we do it this way.

    This has to do with "placeholders" that can be found in the source segment depending on the configuration of the TM. For example a TM can be configured to detect numbers / dates / etc. This will affect how repetitions are counted since two segments can be the same except for the numbers in them.

    In the batch task when iterating over the "ISegmentPair" object, this information does not seem to be available.

    Have we overlooked where the placeholder information is stored in this case?

    Can you suggest an approach for retrieving the placeholders? Could it be done by querying a TM?
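
    To make the problem concrete, here is a rough sketch of what we mean by two segments being "the same except for the numbers in them". It is only an approximation written for illustration: SourceNormalizer, ReplaceNumbers and the {NUM} token are hypothetical names of our own, the regular expression handles plain integers and decimals only (no dates, times or measurements), and it does not reproduce whatever recognizers a TM is actually configured with, so counts can still differ from the Analyze Files report.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Studio API: replaces numbers in a
// source text with a fixed token so that segments differing only in their
// numbers compare as equal.
public static class SourceNormalizer
{
    // Matches integers and decimals such as 12, 3.5 or 1,000.25.
    private static readonly Regex Number =
        new Regex(@"\d+(?:[.,]\d+)*", RegexOptions.Compiled);

    public static string ReplaceNumbers(string sourceText)
    {
        if (sourceText == null)
        {
            throw new ArgumentNullException("sourceText");
        }

        return Number.Replace(sourceText, "{NUM}");
    }
}
```

    With this, "Page 3 of 10" and "Page 7 of 12" both become "Page {NUM} of {NUM}" and would be counted as repetitions of each other if the normalized text is what gets compared.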

     

    2. "A more elegant solution might be to recover the analysis statistical information from the ProjectFile"

    If I'm not mistaken, this will only give us the total repetition counts and not counts on a segment level. We require segment-level information since we want to iterate over each segment in order to assign it to our own list of categories (of which repetitions is one). So this solution does not seem to fit our use case.

     

    Regards,

    Koen