Studio 2017 SR1 Build 14.1.6413.8, Windows 10.
I am working on a Word file that has been converted from a pdf.
Repeated sentences are not being recognised. When I use the concordance I get the above - there appear to be hidden tags.
What are they, and is it possible to eliminate them? I very frequently work on converted files, but this is a new experience.
Hi Beverley Wahl
I can't read what these tags are because you have them set to 'Tag ID' on your View tab. That replaces the tag text with an ID number.
They are undoubtedly formatting tags created by the process of converting the PDF to Word and there is a simple solution.
Via File > Advanced Save > Save to Source, you can create a source Word document. Open that document in Word and smooth the text out. I find that if I highlight the whole document and change a common property such as language then change it back, unnecessary tags are removed.
Text inside text boxes will not be reached by highlighting the whole document, so you'll need to click into each box to smooth its text.
There may be a better way to do this but it's how I do it.
Then create a new project using that Word document. There should only be necessary tags visible now.
All the best,
Sorry, but I'm not sure what your starting point is for saving to source. Is it the pdf file or the converted Word file?
I am using Adobe Acrobat Pro, and convert using the option Export to > Microsoft Word > Word document, but I don't see any option 'Advanced Save > Save to Source' either in Adobe or in Word 10.
Afraid you will have to spell it out for me!
I think Ali may have missed that you are already working with a Word file converted from your PDF. But she is correct that these are almost certainly formatting tags that you are just not displaying in the editor as you work. If you press Ctrl+Shift+H then this will probably display the tags you are not seeing.
What you should have done is clean up the conversion a little more thoroughly to remove all the unnecessary tags that are controlling things like kerning, colour variations, size variation... all as a result of the PDF conversion process. Tools like TransTools can help a lot with this:
Technically they don't matter and you should be able to ignore them in the target translation. But they do make a mess of your TM as you'll be penalised if you ever compare to a cleaner text.
Thanks, Paul. I have now successfully downloaded TransTools and applied it to the problem document where it found thousands of tags!
Have a nice weekend!
Hi Beverley Wahl, hi Paul,
I had missed that it was a Word file already, I thought it was Studio that had converted a PDF and thus suggested creating a source Word file.
Sometimes when software converts a PDF, it reads the text as not quite the same font size throughout, so it automatically adds tags to mark each change in size or even font. If you tidy the unnecessary tags out of the Word file, the translation will be much easier.
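To illustrate the font-size effect described above, here is a minimal Python sketch. It does not touch a real Word file and is not how Studio or any cleanup tool actually works: the `Run` class, the 0.5 pt snapping step and the sample sizes are all assumptions made up for the example. It only shows why tiny size differences produce tags and why smoothing them out removes the tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a Word text run: text sharing one format."""
    text: str
    font: str
    size: float  # font size in points


def snap_size(size: float, step: float = 0.5) -> float:
    """Round a font size to the nearest `step` points (0.5 pt is an assumption)."""
    return round(size / step) * step


def merge_runs(runs: list[Run]) -> list[Run]:
    """Snap sizes, then join neighbouring runs whose formatting is now identical."""
    merged: list[Run] = []
    for run in runs:
        run = Run(run.text, run.font, snap_size(run.size))
        if merged and (merged[-1].font, merged[-1].size) == (run.font, run.size):
            last = merged.pop()
            merged.append(Run(last.text + run.text, last.font, last.size))
        else:
            merged.append(run)
    return merged


def format_changes(runs: list[Run]) -> int:
    """Count format changes between neighbouring runs; each one needs a tag."""
    return sum(
        (a.font, a.size) != (b.font, b.size) for a, b in zip(runs, runs[1:])
    )


# One sentence as a PDF converter might deliver it: four runs, near-identical sizes.
paragraph = [
    Run("The quick ", "Arial", 11.0),
    Run("brown fox ", "Arial", 10.98),
    Run("jumps over ", "Arial", 11.02),
    Run("the lazy dog.", "Arial", 11.0),
]

cleaned = merge_runs(paragraph)
print(format_changes(paragraph), "->", format_changes(cleaned))  # prints "3 -> 0"
```

The sentence looks identical on screen before and after, but only the cleaned version would reach the translation editor as plain text with no tags inside it.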
Years ago when working with Interleaf Quicksilver, I used to tidy the font tags up before converting for translation, using a similar Find & Replace method to that I described. I then found it could be done in Word too.
I'd never checked to see if there was now an app to do it, though it is logical that someone somewhere would have thought of it. I'm glad to find that there is. I will pass that info on to the agencies I work for; they'll be pleased!
Hi Ali and Paul,
I thought it worth mentioning that the biggest problem with my document proved to be not font size or type, but colour! I can only assume that because the document was a poor copy, it was all about the fluctuating shades of grey.
When I ran Document Cleaner the first time on the default setting, the program prompted me to check 'Normalize font colour in each paragraph', and that did the trick.
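For anyone curious what normalizing the colour within a paragraph could mean in principle, here is a rough Python sketch. I don't know how Document Cleaner implements it, so treat this as a guess at the idea only: the `normalize_colour` function, the (text, colour) pairs and the grey values are all hypothetical.

```python
from collections import Counter


def normalize_colour(runs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Give every run in one paragraph the colour that covers the most characters.

    Each run is a hypothetical (text, hex_colour) pair, not a real Word object.
    """
    if not runs:
        return []
    weight = Counter()
    for text, colour in runs:
        weight[colour] += len(text)  # weight each colour by how much text uses it
    dominant = weight.most_common(1)[0][0]
    return [(text, dominant) for text, _ in runs]


# A scanned sentence whose "black" text came through as three shades of grey.
paragraph = [
    ("Payment is ", "#3A3A3A"),
    ("due within ", "#404040"),
    ("30 days", "#3A3A3A"),
    (".", "#454545"),
]

cleaned = normalize_colour(paragraph)
print(len({colour for _, colour in paragraph}), "->",
      len({colour for _, colour in cleaned}))  # prints "3 -> 1"
```

Once every run in the paragraph shares one colour, there is no colour change left for a tag to mark, which would explain why that one checkbox cleared so many tags in a poor scan.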
So I've learned a lot! Thank you both.
Hi Beverley Wahl
Yes, that's logical. I'm so glad that the Document Cleaner did the trick - also that Paul alerted me to its existence. I was able to pass that info on to the agency I do most of my freelancing for and it's going to be useful to them.
Learning is pretty good fun, isn't it!
Always glad to help!