Alignment of HTML with localized URLs results in TM segments losing the hyperlink tags in target segment

The title is a bit misleading, but I'm not able to express the weirdness in a more meaningful way :-\.
Either I'm missing some fundamental concept, or there is something weird happening in Studio Alignment (or in Studio in general)...

I need to align a MadCap Flare project containing ~1200 files in total (1100+ HTMLs and the rest MadCap internal files... all of which normally ARE localizable, but the Studio built-in file type ignores them... probably yet another piece of unfinished work :( ).
It's an online help, i.e. contains zillions of hyperlinks. The hyperlinks are localized, i.e. different in each language.

And I have found that TMs created by aligning such files are somehow 'losing' the hyperlinks in the TM lookup window.

The problem can be easily simulated on trivial HTML files:

<html>
<head></head>
<body>
Click <a href="http://www.example.com/en-us/">here</a> to continue.
</body>
</html>

<html>
<head></head>
<body>
Klicken Sie <a href="http://www.example.com/de-de/">hier</a> um fortzufahren.
</body>
</html>

Aligning these two files using an empty TM with default settings results in this (no surprise here, all as expected):

Saving the alignment as SDLXLIFF and examining the SDLXLIFF content shows that hyperlink tags are present in both source and target (again, no surprise here... the file was re-formatted and some tags folded for better visibility):
The only potential "issue" here are the different IDs of the <g> element, see below...

Importing this SDLXLIFF in the empty TM results in this content of the TM... again showing the different IDs:

And trying to translate the original source file using that TM (and Alignment penalty set to 0, for the information completeness) shows this - the hyperlink tags are completely missing in the TM lookup pane and are NOT inserted in the translation!

Now this is a BIG issue here, since the intention is to automatically pre-translate files using the TM (with 0 Alignment penalty, thus expecting e.g. ALL source files originally used for alignment to be fully translated from the TM).

What is wrong here?!
I kind of understand the reason for different IDs - the content of the tags is not identical, so they get different IDs - but how am I supposed to approach the task then?

If I have the "href" attribute content extracted for translation, I will get a huuuuge amount of extra wordcount... and no one is going to pay for that extra "translation"!
Acronyms auto-substitution does not work for URLs either...
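For anyone wanting to check in advance which inline tags differ between a source/target file pair, here is a minimal sketch using only the Python standard library. It says nothing about how Studio assigns tag IDs internally; it merely lists attributes whose values differ between the two files. The sample strings are simplified versions of the two test files above, and the names `TagCollector`, `inline_tags` and `differing_attributes` are made up for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect (tag name, attribute dict) for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def inline_tags(html_text):
    """Return start tags in document order, skipping the structural html/head/body tags."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return [t for t in collector.tags if t[0] not in ("html", "head", "body")]


def differing_attributes(source_html, target_html):
    """Pair tags by position and report (tag, attribute, source value, target value) mismatches."""
    diffs = []
    pairs = zip(inline_tags(source_html), inline_tags(target_html))
    for (s_tag, s_attrs), (_t_tag, t_attrs) in pairs:
        for name in sorted(set(s_attrs) | set(t_attrs)):
            if s_attrs.get(name) != t_attrs.get(name):
                diffs.append((s_tag, name, s_attrs.get(name), t_attrs.get(name)))
    return diffs


SOURCE = ('<html><head></head><body>Click '
          '<a href="http://www.example.com/en-us/">here</a> to continue.</body></html>')
TARGET = ('<html><head></head><body>Klicken Sie '
          '<a href="http://www.example.com/de-de/">hier</a> um fortzufahren.</body></html>')

for diff in differing_attributes(SOURCE, TARGET):
    print(diff)
```

Run over a whole project, a check like this would at least show how many segments are likely to be affected before any alignment is attempted. Pairing tags by position is a simplification and assumes both files have the same tag structure.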

  • No one?
    I don't really expect any translators to comment on it, but I would expect something clever from some SDL person...

    EDIT:

    Even extracting the "href" attribute doesn't help much... while this allows localizing the actual hyperlinks, the problem with translations of the segment text not being applied from the TM (and losing the hyperlink tags after applying them manually) still persists! :(

  • Looks like a bug Evzen.  I haven't seen this before so will report it to make sure it is known.  I can get a 100% match if I edit the tags in an exported TMX, but I don't know why it's happening.

  • The problem is that I need this bug fixed NOW. The client is expecting a solution NOW. My managers want a working solution NOW. Not "sometime, when product management condescends to have it fixed" :-\
  • Oh and BTW, there is another very similar bug.

    If source file contains the following element

    <foo attrib="bar" />

    and target file contains following element

    <foo attrib="bar"/>

    (i.e. identical, just without the space before slash), the elements get again different IDs during alignment.
    And the consequence is again the same - the TM match not being applied, the tag missing after applying the translation.

    Not to mention the problems with segmentation I described in another thread...
    That's exactly why I get mad about SDL implementing fancy bells and whistles with ridiculous marketing names instead of fixing serious problems in elementary functionality of "industry standard" tool :(

  • Both are caused because the attributes are different. If you translated the files then the tags would be exactly the same in source and target and everything would be fine. But because you are aligning tags that don't match it causes this problem. So if you had this in source for example:

    <body>Click <a href="http://www.example.com/en-us/">here</a> to continue.</body>

    And this in target:

    <body>Klicken Sie <a href="http://www.example.com/en-us/">hier</a> um fortzufahren.</body>

    Then it'll give you the desired result. Same goes for this example:

    <foo attrib="bar" />

    You don't even have a tricky workaround because if it worked and they were placeholders in the TM, then when you translated the source you would of course get the same attributes that were in the source and not the ones that were aligned from the target file. This is really not a simple thing to resolve as it touches on some of the fundamental benefits of using Studio which is the ability to use placeholder tags to improve leverage from the contents of your TM. Unfortunately in your case where you are aligning non-matching tags it's working against you.

    Having said all this I'm not even sure if this is a bug anymore. It's a real grey area for me because you are asking the software to behave in a completely different way to the way it's designed. I have left it with support/development but I'm afraid it won't get fixed NOW.

    I am interested in your thoughts on this Evzen, hopefully you'll see the problem.
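Since the explanation above is that alignment behaves when the tags are identical in source and target, one possible workaround (an untested assumption, not a documented Studio procedure) would be to rewrite the localized hrefs in the target files to their source-locale form before alignment, and to reverse the rewrite on the generated target files after pre-translation. A minimal sketch in Python follows. It assumes the locale appears as a path segment such as /en-us/ or /de-de/, as in the example URLs; real Flare projects with renamed or translated file names would need a proper mapping table instead. The function name `swap_locale_in_hrefs` is hypothetical.

```python
import re

# Matches double-quoted href attribute values only; other attributes and
# running text are left untouched.
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def swap_locale_in_hrefs(html_text, from_locale, to_locale):
    """Replace the path segment /from_locale/ with /to_locale/ inside href values."""

    def fix(match):
        url = match.group(2).replace(f"/{from_locale}/", f"/{to_locale}/")
        return match.group(1) + url + match.group(3)

    return HREF_RE.sub(fix, html_text)


target_line = 'Klicken Sie <a href="http://www.example.com/de-de/">hier</a> um fortzufahren.'

# Step 1 (before alignment): make the target tags identical to the source tags.
for_alignment = swap_locale_in_hrefs(target_line, "de-de", "en-us")
print(for_alignment)

# Step 2 (after pre-translation): restore the localized links in the generated file.
print(swap_locale_in_hrefs(for_alignment, "en-us", "de-de"))
```

The obvious cost is two extra batch steps around every alignment and pre-translation run, and the round trip is only safe if the locale segment identifies localized links unambiguously.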

  • Paul, my point is that it's 2017, so I'm surely not the first person on this planet encountering this problem... website localization has been around for quite a few decades and there are thousands of sites using localized links... so there simply MUST be a way to achieve such a trivial thing as aligning two HTMLs with localized links and getting TM content which can produce a translated file identical to the file used to create the TM.
    That's a very simple requirement, just following simple common sense.

    What I'm asking from software claiming to be the industry leader is to allow me, in the year 2017, to do this. Or, to be precise, it's actually the clients and my managers who expect this to be possible... because hey, why would such a simple thing not be possible, right?!
    And honestly, I'm sick and tired of trying to explain to all of them that such a simple and trivial thing is not possible using software they paid tens of thousands of Euros for... and I'm even more sick and tired of trying to explain to them that it's not my fault.

    What I'm asking from the software is to be smart enough in alignment process and produce exactly the same TM content which I get by manually translating the file using the same file type settings, like this:
    (I have translated the file using one Studio project and updated the TM... and then created new project and used the previously created TM in it)

    Generated target file:

    <html>
    <head><META http-equiv="content-type" content="text/html; charset=utf-8">
    </head>
    <body>Klicken Sie <a href="http://www.example.com/de-de/">hier</a> um fortzufahren.</body>
    </html>

    And TM content:

    All perfectly working as expected, still following "the way it's designed".

    What I'm expecting from a company claiming to be the industry leader is to get this elementary functionality of its expensive software into proper shape in the first place and ONLY THEN start introducing fancy half-useless bells and whistles.
    And I truly believe that it's a pretty sensible expectation.

     

    And regarding the <foo attrib="bar" />, you're not correct: the syntax with and without a space before the slash does NOT mean two different tags.
    The XML 1.0 specification defines empty-element tags here: https://www.w3.org/TR/REC-xml/#sec-starttags (scroll a few lines down to the "Tags for Empty Elements" part). Whitespace between the last attribute and "/>" is optional, i.e. the tag with or without the space is the identical tag.
    The same goes for the XML 1.1 specification: https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-STag
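The equivalence of the two spellings is easy to confirm with any conforming XML parser. Below is a small sketch using Python's standard library (`xml.etree.ElementTree.canonicalize` needs Python 3.8 or later); the helper name `same_element` is made up for this illustration. Canonicalizing both source and target files before alignment might also be a way to remove this kind of purely lexical difference, though that is an assumption that has not been tested against Studio.

```python
import xml.etree.ElementTree as ET

WITH_SPACE = '<foo attrib="bar" />'
WITHOUT_SPACE = '<foo attrib="bar"/>'


def same_element(xml_a, xml_b):
    """Compare two single-element documents by name, attributes and text content."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    return a.tag == b.tag and a.attrib == b.attrib and a.text == b.text


# Both spellings parse to the same element...
print(same_element(WITH_SPACE, WITHOUT_SPACE))

# ...and both serialize to the same canonical (C14N 2.0) form.
print(ET.canonicalize(WITH_SPACE) == ET.canonicalize(WITHOUT_SPACE))
```

Both checks print True: the space before the slash is not part of the parsed element, so it only exists at the level of the raw file text.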

  • Evzen, I'm not arguing with you here.  I'm only questioning the problem in Studio because of the way Studio works and wanted to hear your thoughts on this because it all helps to raise the profile of this issue in development.

    I tested it in WinAlign and get a 100% match (less alignment penalty here):

    But look at the tags.  These are incorrect since you end up with the placeholder problem I mentioned earlier and the target is translated with the contents of the source tag.  This I think is the correct behavior based on how Studio works.  You can of course resolve the content of the target file like this by changing the href attribute to translatable and then handle the files during translation:

    But I'm sure you were aware of this already and probably use this option for your website translation.  So perhaps WinAlign or LFAligner is the way to go for now until we do something about the way the Studio aligner handles these things.

    Neither I, nor anyone else would disagree that the Alignment tool needs addressing in many ways.  It's just not managed to get a high enough priority with other things going on.  So to help this we really need to see more constructive input from our users on the importance of this feature for more than a few.  If you look at the ideas for example.  There are half a dozen or so ideas raised on this feature and the highest number of votes for any one is 8.  And that's only for a feature in alignment... doesn't come close to addressing the more fundamental issues you and I both know are in this tool:

    https://community.sdl.com/solutions/language/translationproductivity/translation-productivity-ideas/i/trados-studio-ideas/diconnect-several-segments-at-the-same-time-during-alignment

    When we sit down and prioritise the things we have to do the Alignment feature always makes it on the list.  In fact when the list is prepared it's often right at the top.  But after going through the scoring we use to make sure we can address as many things as possible with the time/resources available it always drops down and doesn't get addressed.  8 votes from our users doesn't help!

    I know we get rational and often heated posts on the alignment tool, but again 8 votes doesn't help.  If it was the most voted for idea because people wanted an improved alignment editor then that would help us focus where you need it.  But it doesn't seem to be.  We'll continue to make sure the alignment editor is on the list but if it's really so important for our user base then it would be good to see the votes!

  • Paul Filkin said:
    I'm only questioning the problem in Studio because of the way Studio works and wanted to hear your thoughts on this because it all helps to raise the profile of this issue in development.

    As mentioned in the previous reply - I expect Studio to produce exactly the same SDLXLIFF (and TM) using alignment, which it will produce using standard translation process.

    Paul Filkin said:
    So perhaps WinAlign or LFAligner is the way to go for now until we do something about the way the Studio aligner handles these things.

    Not really... While WinAlign might give good results using this simple file, the situation is different with the real-life original MadCap Flare files. These are not really simple HTML, but in fact XML containing 'almost-HTML' content plus MadCap-specific extensions. And this, together with completely different segmentation used by Trados 2007 (i.e. also WinAlign), results in content segmented differently than in Studio, inline tags being internally represented differently than in Studio, etc. So the result is not usable anyway.

    Paul Filkin said:
    If you look at the ideas for example.  There are half a dozen or so ideas raised on this feature and the highest number of votes for any one is 8.  And that's only for a feature in alignment...

    Not true. Rework of the Alignment tool idea has 21 votes. Plus, you need to SUM ALL alignment ideas together OF COURSE!!! You can't be serious that you are looking at them separately one by one...
    The problem is that the ideas thing is again tragically unfinished - it doesn't have any categorization or usable filtering where one could group similar/same ideas together and see the bigger picture! There are a lot of duplicates in fact, because no user is going to spend hours reading through the looong list to see if the idea (s)he is going to enter is perhaps already there!
    So the person doing the ideas assessment has to see things in a wider context, not just export an "idea - number of votes" table, sort it by the second column and that's it...

    Paul Filkin said:
    When we sit down and prioritise the things we have to do the Alignment feature always makes it on the list.  In fact when the list is prepared it's often right at the top.  But after going through the scoring we use to make sure we can address as many things as possible with the time/resources available it always drops down and doesn't get addressed.  8 votes from our users doesn't help!

    doh... how many votes does an idea need then to be seriously considered?
    The following two identical ideas (see? didn't I mention it above?) have more votes than the highest-voted individual idea - does that mean that we will get 64-bit Studio in the next update?!

    https://community.sdl.com/solutions/language/translationproductivity/translation-productivity-ideas/i/trados-studio-ideas/64-bit-version-of-studio

    https://community.sdl.com/solutions/language/translationproductivity/translation-productivity-ideas/i/trados-studio-ideas/sdl-trados-studio-64-bit-release

    Besides, basing the priority on the absolute number of user votes is, well, pretty silly... simply because the spectrum of users coming to the forum is very narrow, not to mention further aspects like the kind of users actually making suggestions (and thus further 'distorting' the kind of suggestions).
    For example, I would bet that the ratio of "how do I start translation" noobs to users really 'fully' using Studio features is like 9:1 (where the "9" basically represents "lay translators" and only the "1" represents "power users" like engineers)... so the type of ideas you get is very biased (or what's the proper English expression).
    Plus, the number of skilled people like engineers in the forums is VERY limited by the fact that they don't have their own SDL account, since they work for agencies, where the account is owned by some manager and they never get the credentials. So again, by this pretty stupid limitation you lose opportunity to get important feedback from MANY people using the Pro features and/or using Studio for more advanced tasks than some trivial Word translation.

  • Evzen,

    I do wish you'd stop arguing about everything. It's not helpful at all and I don't bother reading your posts properly as they're too long and too irritating.

    Yes, that one is 21 votes... not sure how I missed that. Or maybe it just received a bunch of votes. The point I was making is that even with 20 or 30 votes out of our entire userbase this is a drop in the ocean. The ideas is a part of the overall planning and I'm just trying to point out that the more you, as users, make use of these tools the more likely we are to pay attention to it. The decision on what gets done is not mine. I have a say in it, along with many others. So I'm trying to tell you what would help the case. Don't waste your time or mine arguing about it.

    If I was the sole voice behind what was worked on next I would have the alignment tool right at the top of my list. But I'm not. Don't respond to this with another argument... go and find some users who feel like you and vote it up!

    Regards

    Paul
  • Paul Filkin said:
    I do wish you'd stop arguing about everything. It's not helpful at all and I don't bother reading your posts properly as they're too long and too irritating.

    And I wish you would realize that some things ARE more complex and cannot be discussed using a single 5-word sentence.

    Paul Filkin said:
    Yes, that one is 21 votes... not sure how I missed that. Or maybe it just received a bunch of votes. The point I was making is that even with 20 or 30 votes out of our entire userbase this is a drop in the ocean.

    Okay, then given the fact that the most-voted idea has 37 votes at this moment, it means that all ideas are about the same drop in the ocean, right?

    One cannot compare it to the entire userbase, of course... it should rather be compared to "active forum users" or something similar, where "active" would mean something like "coming to the forum at least twice a week in the last 30-day window" (since one-time visitors asking a single question, then reading the answers and never coming back are just distorting the figures). Or even look at which particular user either brought up the idea or voted/commented on it. That would IMO give a way more relevant picture.

    Paul Filkin said:
    If I was the sole voice behind what was worked on next I would have the alignment tool right at the top of my list.

    Come on, you should not take things too personally... I never said it's your fault or something... You are our, users', "door" to SDL, so we are here expressing our thoughts to you in a belief that you will pass this voice further, that's all there is. And when you (not me) brought up the thing with ideas and votes, I just told you my opinion, that's it.

  • Hello Paul,
    Hello Evzen

    I just would like to express my view on this topic.
    Yes alignment is an important issue, but not only for itself. In my mind it's a really big issue because retrofit isn't reliable at all.
    Retrofit doesn't work if
    - you merge segments
    - you split segments
    - you overrule paragraph endings
    - there are macro functions in your document (automatic tables, footnotes, numbers etc.)
    and so on.
    This means that if you want a clean master TM, you have to do a lot of alignment. The alignment function in Studio is anything but reliable and anything but user-friendly (you get seasick with all these waves ;-) )
    Therefore we ended up doing all alignments with AlignFactory from Terminotix, a reliable and user-friendly tool which does the trick quickly and without giving you nausea.
    Therefore I think you have to consider functions in Studio not only one by one, but by their importance in the process chain and their interaction in the CAT processes.
    Best regards
    Martha Ebermann
  • Thanks Marta, and Evzen.

    None of this is new information and I am just trying to suggest a way in which your voices can be heard more. It's up to you whether you use it or not.

    I'll make sure your posts are seen if they haven't been read already, but I'd love to see more of our users taking advantage of the tools they have. How much difference it would make to our case if even 1% of our users voted for this.
  • IMO it's kind of a catch-22... users would use the voting more if they actually saw their voices being heard, i.e. a) something actually being implemented (AND the fact that it was implemented based on the ideas voting being appropriately advertised!), and b) it being implemented in a sensible timeframe!

    Plus, it seems to me that the entire thing is just heavily overvalued (or overestimated? you know what I mean...) - only a fraction of users go to some forum (especially if it's SO difficult to get around as this one... sorry, but it IS true) and only a fraction of them bother to participate (especially if they see the catch-22 that nothing will be implemented anyway unless there are a couple of hundred (or thousand?) votes).

    IMO the voting is currently useful only for showing RELATIVE interest in various features/improvements... like e.g. that alignment rework is about as important (or even more if you count all the related/similar ideas) to users as having more comprehensive error messages (which is just being implemented, as opposed to alignment improvements) and that it's way more important to users than embedded content support in bilingual Excel (which has even been already implemented).

    EDIT:
    BTW, I see also some flaws in the voting like e.g. "Choose file order for Analyze Files batch task" - the idea is marked as delivered, but that's clearly not true as the person marking it did not get the point of the idea at all (and even after re-explanation and re-phrasing the idea did not bother to come back and change the status)... Now, what is an average user coming to forum supposed to think and do?

  • Maybe Evzen... but we can try it or just keep discussing it. I'd rather try it.
  • Paul Filkin said:
    Maybe Evzen... but we can try it or just keep discussing it. I'd rather try it.

    I did too - all ideas I see as useful for me got my vote (and I occasionally do get back to see what's new and vote further). I can't do more for it. I'm not going to run around the town and "recruit" more people to vote, sorry... that's not what I'm paid for and I won't do such a service to SDL for free. There are - or at least there should be - people in SDL paid for such things (or at least for having enough knowledge to interpret available data properly and bringing ideas on how to get more data telling relevant information).