New features in TransTools 3.3
If you have to translate Word documents converted from PDFs or scanned images, you know all too well that this process is complicated. Documents produced by OCR and PDF conversion programs have many problems:
- when you import them into a CAT tool, you may often see excessive tags which slow you down and reduce the accuracy of TM matches (see below)
- such documents are difficult to format and translate because of complex formatting such as numerous headers and footers, section and column breaks, textboxes, invisible paragraphs, etc.
Document Cleaner is an advanced formatting tool that can help you get rid of excessive tags and format such documents with ease. In TransTools v. 3.3, Document Cleaner has been changed significantly to make it easier to use and help you handle many common issues with documents produced from PDFs or scanned images.
Tag cleaning functionality
The key strength of Document Cleaner is its ability to reformat Word documents in subtle ways in order to reduce the number of tags in CAT tools. In the latest version, all tag-cleaning commands are grouped under the first tab called “Tag Cleaner”.
Tag Cleaner offers a number of options which make various changes to the document. Most of these changes do not impact the appearance of the document – the document will look the same. However, when you import the modified document into your CAT tool, you will see far fewer tags.
The following tag-cleaning options were added in the latest version of TransTools:
- Remove excessive bookmarks – this option removes some document bookmarks which do not need to be preserved in the translated document. Inside CAT tools, bookmarks are shown as pairs of tags, so it is a good idea to remove bookmarks that you do not need.
- Accept tracked changes and switch off change tracking – this option accepts all tracked changes (revisions) in the current document and turns off change tracking. Some CAT tools may display numerous excessive tags if you import a document with tracked changes in it (memoQ, for example, does this for some documents).
Most of the time, the default options are sufficient. To clean up the current document, just click the [Clean tags] button. You can then import the document into your CAT tool and enjoy the translation process.
Besides the tag-cleaning functionality, Document Cleaner includes several tools to help you prepare poorly formatted documents. These tools are grouped under two tabs: Autoformat and Other Tools.
The Autoformat tool lets you format documents produced by OCR and PDF conversion tools, or identify potential problems in them. Just check off the appropriate options and click the button. Here are some of the things it can do:
- Highlight paragraphs that contain text formatted with several different font sizes or fonts. When OCR tools process scanned documents, you can often see “jumping” font sizes in the same paragraph, with 9 pt text followed by 9.5 pt text, etc. Font and font size variations cause tags in your CAT tool and may look unprofessional. Once the Autoformat tool highlights these paragraphs with a predefined color, it will take you little time to correct the problems.
- Highlight tab characters between words. If a PDF document contains justified text, the OCR / PDF conversion tool may recognize the longer spaces between words as tab characters, which will cause all sorts of problems during translation. Once all tab characters are highlighted, you can quickly replace the wrong ones with spaces.
- Apply variable row height to table rows. Most OCR and PDF conversion tools will format tables in such a way that some rows will not resize if the text expands or shrinks during translation. Most of the time, they apply a minimum-height setting, which means that the rows can become larger to accommodate more text, but will not become smaller if there is less text in the row. However, sometimes they may apply a fixed-height setting which will make some text invisible. This Autoformat option will fix this problem.
- Remove frames. Some OCR tools, such as ABBYY FineReader, often put images, tables and text inside a layout element called a “frame”. Frames cannot be broken across pages, so they may sometimes cause text to become invisible. This option removes all frames from the document, keeping their contents.
- and many other things
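To make the first two checks in the list above more concrete, here is a minimal conceptual sketch in Python. It is not how Document Cleaner is implemented, and it does not read Word files: the `Run` type, `has_mixed_formatting` and `tabs_between_words` are hypothetical names introduced only for this illustration, and a paragraph is assumed to be a simple list of text runs with a font name and a size.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a stretch of text with uniform formatting."""
    text: str
    font: str
    size_pt: float


def has_mixed_formatting(runs):
    """Return True if the non-blank runs use more than one (font, size) pair."""
    styles = {(r.font, r.size_pt) for r in runs if r.text.strip()}
    return len(styles) > 1


def tabs_between_words(text):
    """Return the positions of tabs that have non-whitespace on both sides."""
    return [m.start() for m in re.finditer(r"(?<=\S)\t(?=\S)", text)]


# A paragraph with a "jumping" font size: 9 pt followed by 9.5 pt.
paragraph = [Run("Total amount ", "Arial", 9.0), Run("due", "Arial", 9.5)]
print(has_mixed_formatting(paragraph))

# A tab that was probably a wide space in justified text.
print(tabs_between_words("justified\ttext here"))
```

A paragraph flagged by the first function is one you would review for stray font changes; the positions returned by the second are candidates for replacement with ordinary spaces.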
Other Tools is a collection of assorted tools for document formatting. Under this tab, you will see some tools which were included in the previous versions of Document Cleaner, as well as additional commands, for example Quick Actions. For a full list of available tools, go here.
In general, Document Cleaner is quite different from the previous version, although no functionality was removed. If you used Document Cleaner in the past, you can read a summary of the latest changes here.
To test the updated tool, download and install the new version of TransTools.
I hope you have found this information useful. See you in future newsletters. Don't forget to subscribe to TransTools on Facebook, Twitter, Google Plus, etc. using the links on the left.
October 16, 2014