Tag Cleaner

PowerPoint

This tool is included in TransTools Professional Edition.

Some PowerPoint presentations you receive for translation are not original documents but the result of conversion from PDF documents. When imported into your CAT tool, these presentations may contain a large number of formatting tags, which complicate translation considerably and can make it almost impossible. These tags are caused by hard-to-see formatting changes applied by the conversion software.

Screenshot: Tags in memoQ
Example: tags caused by font spacing changes in memoQ (original text: “Good planning”)

Tag Cleaner for PowerPoint is a special tool which re-formats your PowerPoint presentation in order to minimize tags displayed when you import it into your CAT tool.

What is the reason for tags?

When a PDF document is converted into a Microsoft PowerPoint presentation, the PDF conversion software may apply the following formatting in order to match the original format:

  • use wide or narrow text spacing to match the size of letters and inter-character spacing in the original document (see the sample screenshot above)
  • use subscript and superscript formatting (like “2” in H₂O or m²) if the text is above or below the text baseline, even when the original document does not have any subscript or superscript text
  • use different formatting for spaces or tabs
  • apply different, but often very similar text colors to various text in the same paragraph
  • use jumping font sizes, e.g. text formatted as ‘10 pt’ next to ‘10.5 pt’ (this is especially true for documents acquired from image-based PDFs)
  • use different fonts in the same paragraph or sentence, while source documents use only one font (this is especially true for documents acquired from image-based PDFs)
  • apply thick underline formatting instead of normal underline to text formatted in bold
  • apply different Complex Script fonts to different parts of the text in the same paragraph
  • apply different PowerPoint colors to different text even when the colors look absolutely the same (e.g., by applying black theme color to one part of the text and black RGB color to another part of the text)

The above formatting differences are the primary source of tags in CAT software and make it harder to format a PowerPoint presentation before and after translation.
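
To make the link between formatting differences and tags concrete, here is a minimal sketch in Python. It is a simplified model, not the way any particular CAT tool works: it assumes that consecutive characters with identical formatting form one run and that each boundary between runs produces roughly one tag. The helper names (`styled`, `split_into_runs`, `estimate_tags`) and the spacing values are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def styled(text, **fmt):
    """Attach the same formatting (as a hashable tuple) to every character."""
    key = tuple(sorted(fmt.items()))
    return [(ch, key) for ch in text]


def split_into_runs(chars):
    """Group consecutive characters that share identical formatting."""
    runs = []
    for fmt, group in groupby(chars, key=lambda pair: pair[1]):
        text = "".join(ch for ch, _ in group)
        runs.append((text, fmt))
    return runs


def estimate_tags(chars):
    """Count formatting boundaries inside the paragraph (assumed ~1 tag each)."""
    return max(len(split_into_runs(chars)) - 1, 0)


# "Good planning" as a converter might produce it: three spacing values
# (illustrative numbers, not taken from a real file).
converted = (
    styled("Goo", spacing=0.0)
    + styled("d pl", spacing=-0.3)
    + styled("anning", spacing=0.2)
)
# The same text after character spacing is reset to one default value.
cleaned = styled("Good planning", spacing=0.0)

print([text for text, _ in split_into_runs(converted)])
print(estimate_tags(converted), estimate_tags(cleaned))
```

In this model the converted text splits into three runs and therefore two boundaries, while the cleaned text is a single run with none. The same reasoning applies to any of the properties listed above (color, font size, font, underline style).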

How to use Tag Cleaner

To run Tag Cleaner, click the Tag Cleaner button on the Add-Ins ribbon:

Screenshot: Running Tag Cleaner from Ribbon

The following dialogue will appear:

Screenshot: Tag Cleaner dialogue

The Tag Cleaner dialogue provides the following options:

  • Set default text spacing – reset character spacing to default values.

    Character spacing properties include Spacing and Kerning properties available on the Character Spacing tab of the Font dialogue in PowerPoint. These properties are very rarely used during document preparation, but PDF conversion tools use them quite often to create a PowerPoint presentation which looks close to the original PDF.

    Text spacing properties in PowerPoint
  • Remove subscripts and superscripts.

    As you know, subscript and superscript formatting (like “2” in H₂O or m²) is used to shift text below or above the text baseline (i.e., the line on which the text stands). Sometimes a PDF conversion tool may decide that text is not positioned on the baseline and should therefore be formatted as superscript or subscript text.

    Superscript formatting in PowerPoint

    This option clears all subscript and superscript formatting from text, eliminating all tags caused by such formatting. Note that you may need to check the presentation and apply subscript and superscript formatting afterwards in accordance with the original document.

    This option is not selected by default.
  • Normalize font colors in each paragraph.

    Quite often, the PDF conversion tool decides that the text in a specific paragraph is formatted in different colors. This often occurs in scanned PDFs which naturally have several shades of black or some other color on a page, but can also occur in presentations authored by humans where text may be formatted with different but similar theme colors. Different text colors in the same paragraph cause tags in your CAT tool. If you select this option, Tag Cleaner will apply the color of the first word in the paragraph to the text in the same paragraph that has a different font color according to one of the following sub-options:
    • Apply first font color to other text if color is similar – If you choose this option, the font color at the beginning of the paragraph will be applied to the other text only if the color of the other text is similar to this color.
    • Apply first font color to all text within paragraph – If you choose this option, the font color at the beginning of the paragraph will be applied to the entire paragraph. This can be useful if you would like to minimize tags as much as possible and re-format the presentation after translation.

    This option is not selected by default.
  • Normalize font size in each paragraph.

    When you convert a PDF, especially an image-based PDF, the PDF conversion tool may decide that there are several different font sizes in a given paragraph, while in fact there is only one. It can format one range of text as ‘9 pt’, the second as ‘9.5 pt’, and the third as ‘9 pt’ again. Because human-authored documents rarely have more than one font size in a given paragraph, you can use this option to level font sizes across the paragraph using the first font size of the paragraph. To use this option, select the appropriate setting from the picklist:
    • Apply first font size to other text if size difference is 10% / 20% / 30% / 40% or smaller – select one of these settings if you want to apply the same font size only if the size of some text is different from the first font size of the paragraph by 10, 20, 30 or 40% or less.

      For example, if you choose the 10% option and the paragraph contains text formatted with 2 different font sizes – 10 and 11 points, Tag Cleaner will apply the font size of 10 pt to the text formatted in 11 pt, because the difference of 1 pt is exactly 10% of 10 pt. However, if the paragraph has text formatted as 10 pt and 12 pt, no changes will be made because 12 pt is different from 10 pt by 20%, which is more than 10%.
    • Apply first font size to all text within paragraph – select this setting in order to apply the first font size to the rest of the paragraph

    This option is not selected by default.
  • Normalize font in each paragraph.

    When you convert an image-based PDF, the OCR or PDF conversion tool may decide that there are several different fonts in a given paragraph, while in fact there is only one. For example, the OCR tool can recognize one range of text as ‘Cambria’ font and another one as ‘Calibri’ font. When you import the document into the CAT tool, this will result in several tags. Because most documents use the same font in a given paragraph (unless a symbol font is used), you can safely format the rest of the paragraph using the same font as used in the beginning of the paragraph. This option does just that.

    This option is not selected by default.
  • Use standard ComplexScript font in the entire presentation.

    Behind the scenes, PowerPoint applies several different fonts to any text to determine how to render characters from East Asian, Arabic and other non-Latin alphabets (which Microsoft calls Complex Scripts). In regular presentations which are prepared manually, these special font properties are not used, so they do not cause any tags in CAT tools. However, some PDF conversion tools do set these properties when generating PowerPoint presentations, and text within the same paragraph often has different settings for these properties.

    If you use this option (it is activated by default), Tag Cleaner will apply the same Complex Script font to all text in the presentation, thus eliminating tags caused by changes in Complex Script font settings.

    To use this option, you need to specify the name of the font to be applied for Complex Scripts. In most cases, select “Use PowerPoint default (recommended)” (default). You can also type the name of the font that supports the Complex Script that your presentation uses.
  • Use a single spellchecking language for the entire presentation.

    Some CAT tools such as SDL Trados Studio may display additional tags if text has spelling mistakes (i.e., it is marked with red squiggly lines).

    This option marks all text in the presentation with a specific spellchecking language (or specifies that spelling should not be proofed), which eliminates such tags.

    If the default parameter (“Do not proof spelling”) is selected for this option, PowerPoint will not proof spelling of any text, and you will avoid tags in your CAT tool.
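
The percentage rule described under “Normalize font size in each paragraph” can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the 10 pt / 11 pt / 12 pt example above, not Tag Cleaner's actual code; the function name `normalize_font_sizes` and its parameters are hypothetical. The difference is measured relative to the first font size of the paragraph, as in that example.

```python
def normalize_font_sizes(sizes, max_diff_percent=None):
    """Return the font sizes of a paragraph after normalization.

    sizes: font sizes in points for consecutive text ranges of one paragraph.
    max_diff_percent: 10, 20, 30 or 40 to change only sizes that differ from
        the first size by that percentage or less; None to apply the first
        size to all text in the paragraph.
    """
    if not sizes:
        return []
    first = sizes[0]
    result = []
    for size in sizes:
        if max_diff_percent is None:
            # "Apply first font size to all text within paragraph"
            result.append(first)
        elif abs(size - first) * 100 <= max_diff_percent * first:
            # Difference is within the chosen percentage of the first size.
            result.append(first)
        else:
            # Difference is too large: leave this text range unchanged.
            result.append(size)
    return result


print(normalize_font_sizes([10, 11], 10))     # 11 pt differs by exactly 10%
print(normalize_font_sizes([10, 12], 10))     # 12 pt differs by 20%
print(normalize_font_sizes([9, 9.5, 9], 10))  # the 9 / 9.5 / 9 pt case
```

With the 10% setting, the first call levels both ranges to 10 pt, the second leaves 12 pt untouched, and the third levels all three ranges to 9 pt. Passing `None` as the second argument models the “apply to all text” setting.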

Regardless of which options are selected, and even if all of them are deactivated, Tag Cleaner also fixes some other minor formatting issues which result in tags in CAT tools.

As you select an option, you will see its detailed description below the options area.

To remove a specific type of formatting from the entire active presentation, check the appropriate options and then click the Clean tags button.

Recommended settings

The following Tag Cleaner settings are recommended for processing all PowerPoint presentations:

  1. Set default text spacing
  2. Use standard ComplexScript font in the entire presentation + “Use PowerPoint default (recommended)” suboption
  3. Use a single spellchecking language for the entire presentation + “Do not proof spelling” suboption

Other settings can be activated if you still see excessive tags in your CAT tool and you are confident that no loss of original formatting will occur, or if you intend to restore this formatting afterwards. If you use such additional options, it is recommended to check the cleaned presentation against the original.

If you process a PowerPoint presentation produced by a human, I would advise against using the options which are unchecked by default. In fact, if the presentation was authored by a human from its creation and not joined together from various sources, the Tag Cleaner command should not be used at all as it may remove some intended formatting. Always err on the side of caution in such cases.

It is also recommended to use Unbreaker on the presentation to remove unnecessary paragraph and line breaks, as these often cause additional tags in many popular CAT tools.

Configuration profiles

Depending on the type of your presentation, you may need to use different sets of options. For example, to process documents produced by PDF conversion tools, you will have a set of preferred tag cleaning options; if the document was produced by a human, however, you may need to activate a smaller set of Tag Cleaner options.

Screenshot: Tag Cleaner profiles
The profile area is located at the bottom of the Tag Cleaner dialogue.

To save the current configuration settings to a new profile, select or deselect the desired options in the Tag Cleaner dialogue and choose [New profile...] from the list at the bottom of the dialogue. Assign a name to the new profile and click OK.

To save the current configuration settings to an existing profile, select or deselect the desired options in the Tag Cleaner dialogue, select the profile from the list at the bottom of the dialogue, and click Save.

To load a specific profile, select it from the list at the bottom of the dialogue and click Load.

The Default profile is loaded automatically when the Tag Cleaner dialogue is opened.

To remove a specific profile, select it from the list at the bottom of the dialogue and click Remove. The Default profile cannot be removed.
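
The profile behaviour described above (save, load, remove, and a Default profile that cannot be removed) can be modelled with a small Python class. This is a hypothetical sketch of the rules only: the class name `ProfileStore`, the setting keys, and the profile name “Converted PDFs” are assumptions made for illustration and say nothing about how Tag Cleaner actually stores profiles.

```python
class ProfileStore:
    """Toy model of named option profiles with a protected Default profile."""

    DEFAULT = "Default"

    def __init__(self, default_settings):
        # The Default profile always exists.
        self._profiles = {self.DEFAULT: dict(default_settings)}

    def save(self, name, settings):
        """Create a new profile or overwrite an existing one."""
        self._profiles[name] = dict(settings)

    def load(self, name):
        """Return a copy of the settings stored under the given name."""
        return dict(self._profiles[name])

    def remove(self, name):
        """Delete a profile; the Default profile cannot be removed."""
        if name == self.DEFAULT:
            raise ValueError("The Default profile cannot be removed")
        del self._profiles[name]

    def names(self):
        return sorted(self._profiles)


# Setting keys are hypothetical; the values mirror the recommended settings.
recommended = {
    "set_default_text_spacing": True,
    "complex_script_font": "Use PowerPoint default (recommended)",
    "spellchecking_language": "Do not proof spelling",
}

store = ProfileStore(recommended)
store.save("Converted PDFs", {**recommended, "normalize_font_size": "10%"})
print(store.names())
print(store.load("Converted PDFs")["normalize_font_size"])
```

Keeping one profile for converted PDFs and leaving the Default profile conservative matches the advice above to use fewer options on human-authored presentations.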

Developed by Stanislav Okhvat, 2007–2016

Microsoft Word®, Excel®, PowerPoint® and Visio® are registered trademarks of Microsoft Corporation.
Autocad© is copyright of Autodesk, Inc.
SDL Trados® (including SDL Trados Studio, Trados Workbench, TagEditor and Microsoft Word Addin) is a registered trademark of SDL plc.
memoQ is copyright of Kilgray Translation Technologies.
Wordfast© is copyright of Yves Champollion.