Ignoring tables of numbers in analysis

We use Trados Studio 2017. We have some larger PDFs that include a lot of part numbers, measurements, and tables of other numbers. How can we analyze these PDFs and have Trados exclude or ignore the numbers?

  • Hi Bobby,

    This depends on whether the PDFs contain fully accessible text to start with. As you may be aware, some content that looks like text can actually be graphics-based.

    When a PDF is added for translation in Studio, it is converted to a Word file and then to an SDLXLIFF. The finished translation will be a Word file, which can be saved as a PDF again if required.

    The result may not look like the original PDF. This is partly because Studio is first and foremost a translation tool, and it cannot always convert a PDF as completely as a dedicated OCR tool designed specifically to convert PDFs to text.

    Regarding numerical content, I cannot remember whether it is possible to customise the analysis to exclude numbers specifically; it's a long time since I learned about the complexities of the project management side of things. However, the repetitions count will include all numbers and alphanumeric combinations, such as part numbers, after their first instance, in the same way as it would any other identical translation units. So, if you wish, you can subtract all repetitions from the total word count.

    Hopefully someone from SDL can help you more precisely on this.

    All the best,

    Alison
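
The subtraction suggested in the reply above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Studio computes its analysis: the `analyze` function, the sample segments, and the whitespace-based word count are all assumptions made for the example, and a real analysis report would supply the total and repetition figures directly.

```python
def analyze(segments):
    """Return (total_words, repeated_words) for a list of segment strings.

    Words are counted by whitespace splitting, which is only a rough
    stand-in for how a CAT tool counts words (assumption for this sketch).
    """
    seen = set()
    total = 0
    repeated = 0
    for seg in segments:
        words = len(seg.split())
        total += words
        if seg in seen:
            # Second and later occurrences of an identical segment
            # are treated as repetitions.
            repeated += words
        else:
            seen.add(seg)
    return total, repeated


# Hypothetical segments: one sentence plus repeated part numbers and measurements.
segments = [
    "Tighten the bolt.",
    "PN-4471-A",
    "PN-4471-A",
    "12.5 mm",
    "12.5 mm",
    "PN-4471-A",
]

total, repeated = analyze(segments)
adjusted = total - repeated
print(f"total={total} repetitions={repeated} adjusted={adjusted}")
```

Note that this only discounts repeated numbers. The first instance of each part number or measurement still counts towards the adjusted total, just as it does in the approach described in the reply.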
