Hello all,
I want Trados Studio to split Excel cell contents into segments based on embedded HTML codes, e.g.:
Product Features
Throws are acrylic knitted
Product size : 130x170 cm
Product colour is beige.
Washing Recommendations
Washable at 30 degrees.
Do not bleach.
Do not iron.
--------
This is a sample: sample-br.xlsx
Thanks
The default rules will handle this and should give you an idea of how to improve it if you wish:
Or you can use a more distinctive approach by adding individual rules for the HTML elements.
This would be, for example:
<b> </b> as a tag pair with no extra segmentation hint, but with the feature "Tag acts as word end" selected (this is really VERY important when translating and should be selected by default, dear SDL)
<br> as a placeable with the segmentation hint "exclude"
and so on...
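To make the intended effect of such rules concrete, here is a minimal Python sketch. It is not how Studio implements its rules; it only assumes that structural tags such as <br> and <p> should end a segment while inline tags such as <b> stay inside it. The function name split_segments and the sample string are made up for illustration.

```python
import re

# Structural tags that should end a segment (assumption: only <br> and <p>,
# in any of the forms <br>, <br/>, <p>, </p>). Inline tags like <b> do not match.
STRUCTURAL = re.compile(r"</?(?:br|p)\b[^>]*>", re.IGNORECASE)


def split_segments(cell_text):
    """Split one cell's text at structural tags and drop empty pieces."""
    parts = STRUCTURAL.split(cell_text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "Washable at 30 degrees.<br>Do not <b>bleach</b>.<br/>Do not iron."
    for segment in split_segments(sample):
        print(segment)
```

Note that the inline <b> pair survives inside its segment, which is what a tag pair rule without a segmentation hint would be expected to give you.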
I find this regex-based approach rather amateurish, cumbersome and most importantly failing big time with just a little bit more complex HTML code, not mentioning anything more complicated (containing entities, comments, inline scripts, etc.)
Therefore I use a simple script which exports the HTML content to a very simple XML structure like this:
<string cell="A1">blablabla, some complicated HTML code</string>
This is then easily processed using the XML with HTML embedded content, with all comfort of embedded content parser.
And then it's again very easily injected back into the appropriate places into the original Excel sheet using another rather dumb script which only reads the cell location from the XML element attribute and puts the string in the cell.
Not a method for average Joe Translator, I know...
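For readers who want to try something similar, the export/import round trip could be sketched roughly as below. This is not the actual script described above, only an illustration under assumptions: the root element name "strings" is invented, and the cell contents are passed in as a plain dictionary. Reading and writing the real workbook would need a spreadsheet library and is left out.

```python
import xml.etree.ElementTree as ET


def export_cells(cells):
    """Serialise {cell_reference: html_text} into a flat XML document.

    The <string cell="..."> shape follows the post; the <strings> root
    element is an assumption.
    """
    root = ET.Element("strings")
    for ref, text in cells.items():
        element = ET.SubElement(root, "string", cell=ref)
        element.text = text  # ElementTree escapes <, > and & on output
    return ET.tostring(root, encoding="unicode")


def import_cells(xml_text):
    """Read the (translated) XML back into {cell_reference: html_text}."""
    root = ET.fromstring(xml_text)
    return {el.get("cell"): el.text or "" for el in root.iter("string")}


if __name__ == "__main__":
    cells = {"A1": "Do not bleach.<br>Do not iron.", "A2": "Tom &amp; Jerry"}
    xml_text = export_cells(cells)
    print(xml_text)
    # The cell attribute tells the import side where each string belongs.
    print(import_cells(xml_text))
```

Because the HTML is stored as escaped text inside each element, it survives the round trip unchanged, and the cell attribute is all the import step needs to put each string back in place.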
Why not then simply save the Excel file as XML and process that in Studio? Not as complicated as it seems...
Not usable with multilingual Excels - where target (or rather multiple targets... like 15 languages or so) is to be placed to other cells than the source.
Indeed. I rarely have to deal with such files, sorry - this is why I usually go the easiest way for a freelancer.
Thank you very much, great solution.
Thanks for this solution, I will try it.
I will try this also, thank you for sharing ideas.
Maybe this is an 'overkill' solution, but I can see that your sample also seems to contain non-HTML columns. And while the Embedded Content solution provided by Paul does the trick to identify HTML tags, it does not recognise HTML character codes (if you should have any on your files).
So, just my two cents:
For such files, we use the XML options in Excel's Developer tab:
1. Use Notepad++ to create a simple file with the names of the Excel columns, which look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<File xmlns:xsi="www.w3.org/.../XMLSchema-instance">
  <Element>
    <Column1>a</Column1>
    <Column2>a</Column2>
  </Element>
  <Element>
    <Column1>a</Column1>
    <Column2>a</Column2>
  </Element>
</File>
2. Use the Source button in Developer > XML in Excel to add this XML map to the Excel file for translation. Then drag and drop each of the XML elements from the XML map onto the respective Excel column heading (which will create a table).
3. Click Export in Developer > XML to export your table to an XML file.
4. In Studio, create specific XML file type settings. As such you can configure which elements / columns need to be translated (maybe not all Excel columns need to be translated) and you can add document structure information to elements. You can then use this document structure information to have HTML content processed using Studio's embedded "Html Embedded Content 5 220.127.116.11" processor, which will recognise both HTML tags and HTML character codes. And non-HTML columns will not be processed as containing HTML.
5. After translation of the XML file, just open the Excel file and click Developer > XML > Import to import the translated XML file.
Bit of a long process, I know, but once you know how it works, we have found the results to be worth the effort.
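If typing the map file from step 1 by hand gets tedious, it could also be generated. The following Python sketch is only an illustration: the function name build_xml_map is invented, the xsi namespace declaration from the example is left out because its URL is truncated in this post, and the two identical sample rows are kept as in the example (presumably so that Excel treats Element as a repeating row). Whether Excel accepts the map without the namespace declaration has not been verified here.

```python
import xml.etree.ElementTree as ET


def build_xml_map(column_names, sample_rows=2):
    """Return a skeleton XML file with one child element per Excel column."""
    root = ET.Element("File")
    for _ in range(sample_rows):
        element = ET.SubElement(root, "Element")
        for name in column_names:
            # Placeholder value "a", as in the hand-written example.
            ET.SubElement(element, name).text = "a"
    ET.indent(root)  # pretty-printing; needs Python 3.9 or later
    declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Column names must be valid XML element names (no spaces, no leading digit).
    print(build_xml_map(["Column1", "Column2"]))
```

Save the output as a .xml file and load it via Developer > XML > Source as described in step 2.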
Won't work either for Excel files with multiple languages, of course...
Yes, that is basically where my method with the export/import script originated from. It then just evolved into a smarter script with more functionality... it can e.g. skip cells already containing a translated text... or, as mentioned, handle multiple languages (export/import separate XMLs for each target language).
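As a rough idea of what such per-language handling might look like (again only a sketch, not the script described above), the Python function below builds one XML per target column and skips rows whose target cell already has text. The data layout (a dictionary of rows keyed by row number, with column letters as keys) and the names export_for_language and "strings" are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def export_for_language(rows, source_col, target_col):
    """Build the XML for one target language.

    rows maps row numbers to {column_letter: text}. The cell attribute
    points at the TARGET cell, so the import step knows where to write.
    """
    root = ET.Element("strings")
    for row_no in sorted(rows):
        columns = rows[row_no]
        source = columns.get(source_col, "")
        if not source or columns.get(target_col):
            continue  # nothing to translate, or already translated
        element = ET.SubElement(root, "string", cell=f"{target_col}{row_no}")
        element.text = source
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = {
        2: {"A": "Do not iron.", "B": "Nicht bügeln.", "C": ""},
        3: {"A": "Do not bleach.<br>", "B": "", "C": ""},
    }
    for target in ("B", "C"):
        print(target, export_for_language(rows, "A", target))
```

Calling it once per target column gives the separate XML files, one for each language, that can then be translated independently.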
Hm. Interesting idea.
Wow... I did say the default approach would help give the user the idea they needed to improve it for the content they have. So adding a few rules here and there for this specific content is trivial.
Evzen Polenka said: I find this regex-based approach rather amateurish, cumbersome and most importantly failing big time with just a little bit more complex HTML code, not mentioning anything more complicated (containing entities, comments, inline scripts, etc.)
Well... in this case the file is not more complex. I'm a firm believer in economy of accuracy and in this case your more professional approach is not needed. Interesting discussion though and for other files it is a sensible way to go given the lack of a better embedded content handler for Excel in Studio.
Paul said: in this case your more professional approach is not needed
This is questionable. I've seen such decisions based on a short sample (or on seeing just a few lines of one file, not bothering to look thoroughly through the WHOLE file), making people's lives hell because just a few pages down (or in another file) there was messy, complex HTML code.
So I prefer using robust solutions that work reliably in all cases, rather than solving endless issues with simple solutions.
So, as we all seem to agree, it all depends on the contents of the entire file.
And, in reply to aazzoma khateeb: if your file only contains BR and P tags, Paul's solution is definitely the best way to go as it will do the trick perfectly. If, however, the rest of the file also contains many other tags and tag pairs and HTML codes instead of characters, an Excel-to-XML solution may be the better option as you can then profit from Studio's integrated HTML processor.
In any event, no disrespect intended to anyone here from my side...