Caching of machine translation segments?

Hi there,

We've developed a machine translation plugin for Studio 2011 and 2014.

One behaviour we noticed is that when a user opens a specific segment, Studio sends it to our MT servers. However, if the user revisits that same segment later, Studio sends it to the MT server again instead of reusing the previously retrieved translation.

The main problem this causes is that it skews our count of the number of words sent to the server. (A side effect is that the translation also isn't returned *instantly*.)

Is it possible to configure the plugin to make Studio cache the machine translations for already processed segments so that it doesn't send them again?

Thanks

John

  • Hi folks,

    We've made some progress on this, which now throws up a few more questions.

    We are modifying the SearchTranslationUnitsMasked method in our TranslationProviderLanguageDirection implementation, which has a "TranslationUnit[] translationUnits" parameter. The segment status is correctly retrieved through the ConfirmationLevel property of each TranslationUnit, but there are two problems:

    1 - We are not able to retrieve the Origin value because the "Origin" property is always set to "Unknown".
    2 - We cannot find a way to retrieve the translation result that has been previously returned (we'd like to show the previous translation in the translation results box instead of "No matches found").

    Any ideas!?
    Thanks in advance!
  • Hi John,

    You need to get a handle on the TranslationOrigin. Include this code in one of the SearchTranslationUnits methods (for example, SearchTranslationUnitsMasked):

     // Requires: using System.Windows.Forms; (for MessageBox)
     // translationUnit is the current element of the translationUnits array.
     var properties = translationUnit.DocumentSegmentPair.Properties;
     var origin = properties.TranslationOrigin;

     // Enums and bools are converted to text automatically by string concatenation.
     MessageBox.Show(
         "ConfirmationLevel:\t\t" + properties.ConfirmationLevel
         + "\r\nIsLocked:\t\t\t" + properties.IsLocked
         + "\r\nOriginType:\t\t" + origin.OriginType
         + "\r\nOriginSystem:\t\t" + (origin.OriginSystem ?? string.Empty)
         + "\r\nIsRepeated:\t\t" + origin.IsRepeated
         + "\r\nIsStructureContextMatch:\t" + origin.IsStructureContextMatch
         + "\r\nTextContextMatchLevel:\t" + origin.TextContextMatchLevel
         + "\r\nMatchPercent:\t\t" + origin.MatchPercent);

    ciao,

    P.

  • Looks like Patrick H. covered most of the answer. As for getting the previously retrieved translation, one possibility is the target text of the current segment; I think that would work, but I haven't tested it. However, it wouldn't give exactly the same string as the previous result if the user has edited it in the target edit box. The only other way that comes to mind is to implement some kind of caching: either use the SDL TranslationMemory library and store the results in TMs, or, to keep it simple, store them in a data structure such as a hashtable, possibly combined with file I/O for persistence between work sessions.

    I've thought about implementing this sort of thing in my plugin, but decided it would just be simpler to quit the search on certain confirmation levels and pass a message to indicate it was canceled. Not the most elegant solution, but definitely simpler.
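    The hashtable-plus-file caching idea mentioned above could look roughly like the sketch below. It is a minimal, untested sketch, not part of the Studio API: the class name MtCache, the key format (language pair plus plain source text, ignoring tags) and the Base64 line-based file format are all assumptions for illustration. The translate delegate is a hypothetical stand-in for the plugin's own call to its MT server, and the word count is a naive whitespace split that will not match Studio's own counting rules.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    // Hypothetical helper, not part of the Studio API: caches MT results per language
    // pair and source text, and counts only the words that were actually sent.
    public sealed class MtCache
    {
        private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();
        private readonly string _filePath;

        // Words sent to the MT server during this session (cache misses only).
        public long WordsSent { get; private set; }

        public MtCache(string filePath)
        {
            _filePath = filePath;
            if (!File.Exists(filePath))
                return;

            // Each line holds Base64(key) TAB Base64(translation), so segment text
            // containing tabs or line breaks survives the round trip.
            foreach (string line in File.ReadAllLines(filePath, Encoding.UTF8))
            {
                string[] parts = line.Split('\t');
                if (parts.Length == 2)
                    _entries[Decode(parts[0])] = Decode(parts[1]);
            }
        }

        // Returns the cached translation if there is one; otherwise calls the supplied
        // translate function (stand-in for the plugin's own MT request) and caches it.
        public string GetOrTranslate(string sourceLang, string targetLang, string sourceText,
                                     Func<string, string> translate)
        {
            string key = sourceLang + "\u0001" + targetLang + "\u0001" + sourceText;
            string cached;
            if (_entries.TryGetValue(key, out cached))
                return cached; // Cache hit: nothing is sent, nothing is counted.

            WordsSent += CountWords(sourceText);
            string translation = translate(sourceText);
            _entries[key] = translation;
            return translation;
        }

        // Writes all entries to disk so they survive between work sessions.
        public void Save()
        {
            var lines = new List<string>();
            foreach (KeyValuePair<string, string> entry in _entries)
                lines.Add(Encode(entry.Key) + "\t" + Encode(entry.Value));
            File.WriteAllLines(_filePath, lines, Encoding.UTF8);
        }

        // Naive whitespace word count; Studio's own word count rules may differ.
        private static int CountWords(string text)
        {
            return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
        }

        private static string Encode(string s)
        {
            return Convert.ToBase64String(Encoding.UTF8.GetBytes(s));
        }

        private static string Decode(string s)
        {
            return Encoding.UTF8.GetString(Convert.FromBase64String(s));
        }
    }
    ```

    Because the key is the source text rather than the current target, a revisited segment gets back the original MT output even if the user has since edited the target, which addresses the limitation of reading the target text. Calling Save() when the plugin is disposed (or periodically) would give the persistence between sessions.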
  • Thanks Patricks P and H. I guess we can tell whether the segment has been edited based on its status, but if it has, we may just need to store the MT output for these segments.

    One quick follow-up question on Patrick H's response, about the DocumentSegmentPair property: we're using it with the Studio 2014 API but can't find it in 2011. Is there another way to get the same information in 2011?
  • Hi John,

    Just reading this now...

    From memory, I also had some problems with this because the ISegment isn't exposed, correct?

    I will check/test later on, but you could try initializing the translation unit with an additional SearchResult class and taking some of the properties from the ScoringResults (I think that is what it is called). There should be a few properties in there that partly resemble what is maintained in the TranslationOrigin.

    I will try to follow up later if I get a chance to look at some code and see whether what I am saying here makes sense or works.

    P.

  • I checked this this morning, and I can safely say that my suggestion (recovering this type of information by initializing an additional SearchResult with an existing TranslationUnit) can be ignored. It was worth a try, but no cigar :-)

    I also took the opportunity to review the code of one of my released plugins that supports Studio 2009/2011. For those versions, the only checks I had in place to confirm whether a segment had already been translated were against the available parameters (the ConfirmationLevel and the TargetSegment itself).
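    A check against those two parameters, deciding whether to send a segment to the MT server at all, might look like the sketch below. It is a hedged illustration only: the local ConfirmationLevel enum is a stand-in so the snippet compiles on its own (we believe it mirrors the values of Studio's Sdl.Core.Globalization.ConfirmationLevel, which a real plugin should use instead), and MtGate.ShouldQueryMt with its "Unspecified or empty target" policy is a hypothetical name and rule, not Studio behaviour.

    ```csharp
    using System;

    // Stand-in mirroring the values of Studio's ConfirmationLevel enum so this sketch
    // compiles on its own; in a real plugin, use the enum from the Studio API instead.
    public enum ConfirmationLevel
    {
        Unspecified, Draft, Translated, RejectedTranslation,
        ApprovedTranslation, RejectedSignOff, ApprovedSignOff
    }

    public static class MtGate
    {
        // Hypothetical policy: query the MT server only for segments that have not been
        // worked on yet, i.e. the status is Unspecified or the target is still empty.
        // Everything else is treated as "already translated" and skipped, so revisiting
        // such a segment does not resend it (or add to the word count).
        public static bool ShouldQueryMt(ConfirmationLevel status, string targetText)
        {
            return status == ConfirmationLevel.Unspecified
                || string.IsNullOrWhiteSpace(targetText);
        }
    }
    ```

    In a search method this would be evaluated per translation unit, with the target text taken from the TargetSegment; it is the coarse kind of check described above, since it cannot tell an MT-originated Draft from one the translator typed.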
  • Thanks so much Patrick, I guess for 2011 we'll just use the ConfirmationLevel as a more coarse solution than we can do in 2014.

    This raises another question, though: how do we maintain multiple versions of the same plugin for different versions of Studio? (I think we'll need this here, as we're using parameters in 2014 that don't exist in 2011.) Perhaps Paul can answer?

    (P.S. how do I tag someone in a post? like Paul in this case?)

  • The easy way (I think) is to search for the user in the search box at the top; when the person shows up, click on them and you get the URL to their profile. Paul Filkin or John Tinsley, for example.

    On your actual question I think you have two options.

    1. This is the preferred option, in my opinion. You create an installer that contains the appropriate code for each version. If the user has multiple versions installed, you either ask which one to install for or just install for all of them; otherwise, you detect which version is there and install the matching files.
    2. Create a new app altogether, so you have different versions on the OpenExchange.

    I think the second is less preferable because users could get confused if they don't notice the split, and your downloads get divided so they are not cumulative. Cumulative is better because you climb the download charts more efficiently, which matters because some people browse the apps by download count to see what's most popular.

    Paul Filkin | RWS Group

  • Thanks Paul. We'll have a look at how to implement the first option you've proposed.

    John

  • Folks, just a follow-up on this: from a UX perspective, what do you suggest is the best way to display this information to the end user?

    We can't edit the message that is displayed, so we must either leave it as "No matches found" (which doesn't really give the full picture) or return a result whose target segment is a custom message (even though that result would effectively be an actionable segment the user could apply).