We have encountered a problem in translating HTML files for a client.
Studio has apparently modified the structure of the HTML files meaning that the client cannot reintegrate them.
For example, one file that originally contained 60 lines in Notepad++ contained just 36 lines in the target file after translation. All of the code is there, but in some sections it has been grouped into blocks.
Unfortunately this is the case for several files, and we have translated over 200 in total.
Does anyone have an explanation or solution for this problem?
I can provide an example via PM if needed.
Many thanks for your help!
Hi Clémentine Guillot
If you could post a set of sample files here (source, sdlxliff and target file) then I am sure we could help you.
Many thanks for your email! Here is an example (source, sdlxliff and target).
Let me know if you need anything else.
Thanks for your help!
The structure of the target HTML is not altered, nor is the layout. There is a difference in where CR/LFs ("line breaks") are put, but that does not matter for HTML. Any browser, CMS etc. can use either.
Did your customer TRY to use the HTML you provided? What error exactly occurred when they did? I am asking because line breaks in HTML are really arbitrary and every piece of software that turns out HTML code will do it a bit differently, always (one hopes!) with the intent to make the code human-readable.
You can remove all indentation and line breaks and the code is still perfectly valid (and works):
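For anyone who wants to show this to a client, here is a minimal Python sketch that compares two HTML strings while ignoring indentation and line breaks. It only uses the standard library `html.parser` module. The names `TokenCollector`, `tokens` and `same_markup` and the two sample strings are made up for illustration and are not taken from the attached files.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects tags, attributes and text, skipping whitespace-only text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so their order does not affect the comparison.
        self.tokens.append(("start", tag, tuple(sorted(attrs, key=lambda kv: kv[0]))))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.tokens.append(("text", text))


def tokens(html):
    """Return the list of tokens found in an HTML string."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def same_markup(a, b):
    """True if both strings contain the same tags, attributes and text."""
    return tokens(a) == tokens(b)


# Hypothetical sample: the same markup, once indented and once on a single line.
pretty = """<html>
  <body>
    <p>Hello</p>
  </body>
</html>"""
compact = "<html><body><p>Hello</p></body></html>"

print(same_markup(pretty, compact))
```

This is a rough check, not a validator: it treats all whitespace as insignificant, which is not true inside elements such as `pre`.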
Your customer needs to understand that what you supplied is what they asked for - translations in valid formatting.
Thanks for your quick reply Daniel! I'll explain this to the client and ask them what type of error they had.
They also had another remark saying the code had been changed on line 4 in the attached example. Do you have an explanation for this?
I don't know which file is source and which is target...
I have a hard time believing that Studio changed that. Can you check in your files that this is really a change from source to target in the same file?
Apologies for not being precise enough. You'll find attached the source, target and xliff files. This is the FR>EN translation but we also translated this file into Dutch and the same thing occurred...
The explanation most probably is that from the HTML format standard perspective, the missing part is simply unneeded.
The "DOCTYPE html" definition clearly defines what kind of document it is and which HTML standards it follows, i.e. the "content-type: text/html" additional information is pretty much irrelevant and unneeded.
Target HTML is not generated by Studio itself, but by a certain sub-component which generates the HTML according to appropriate HTML standards... i.e. the generated syntax should be perfectly fine.
The problem I've seen quite often is that clients' systems do not follow international standards (e.g. because they were developed by inexperienced in-house programmers) and often rely on things inside the data files which are either unneeded or even plainly incorrect. So it could very well be that this is one such case.
NB: The meta tag is not closed in the sample text:
Okay, I was curious enough to run this myself - yes, Studio changes the meta tag attributes. As Evzen points out, it should not matter. If it does matter for your clients, the easiest fix is probably a batch search and replace, which you can do in Notepad++ for all files at once.
In HTML, the "meta" tag does not need to be closed... just like e.g. the "br" tag.
...and that search & replace should be done on the client's side... since it's their own internal problem.
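If Notepad++ is not at hand, the batch search and replace could also be scripted. Below is a minimal Python sketch, assuming all files sit under one folder and share one encoding. The function name `batch_replace` and the `OLD` and `NEW` strings are hypothetical: the thread does not show the actual content of line 4, so substitute the real tag text from your own source and target files.

```python
from pathlib import Path


def batch_replace(folder, old, new, pattern="*.html", encoding="utf-8"):
    """Replace `old` with `new` in every matching file; return the changed paths."""
    old_b, new_b = old.encode(encoding), new.encode(encoding)
    changed = []
    for path in sorted(Path(folder).rglob(pattern)):
        # Work on raw bytes so existing line endings (CR/LF) are left untouched.
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path)
    return changed


# Hypothetical strings for illustration only, not taken from the sample files.
OLD = '<meta charset="utf-8">'
NEW = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# Example call (run it on a copy of the delivery folder first):
# batch_replace("path/to/target_files", OLD, NEW)
```

The script overwrites files in place, so it is safer to try it on a copy of the folder first.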
Thank you both for looking into it!! I'll explain all of this to the client.
Ah, right. I always close br tags, but now I am wiser, HTML does not require it.
I looked it up and I think I found the answer for Clémentine Guillot:
Your client sent you HTML 4.01 files and seems to require HTML 4.01 files back. You used the HTML5 file type, which will - of course - return HTML5 files but is tolerant enough to read HTML 4.01 files.
I would call that a curveball. HTML5 has been the standard since 2014.
Oh brilliant, thanks Daniel! I've just checked and Studio 2019 offers a Html 4 184.108.40.206 file type. Do you think it'll solve the problem if we use it for the next projects? Apparently the client "fixed" everything on his side but wants to make sure we can avoid it next time...
Yes, the HTML 4 file type returns HTML 4 files, I tried it with your test file: