Best Practices - Writing DITA with Localization in Mind – Tips for using Index Terms

In the past, it has been common practice for writers using desktop publishing tools to insert index tags anywhere in the content, by selecting a word and then applying an index marker on that word.

DITA handles index terms differently from traditional desktop publishing applications. When a word or phrase is enclosed in the <indexterm> element, it is no longer treated as part of the content and is used only in the index at the back of the book. In other words, if you wrap an existing word or phrase inside a sentence in <indexterm>, that word no longer appears in the published text of the block. To index a word in DITA, leave it in the running text and repeat it inside the index term, as in the following example:

<title>
  <indexterm>document<indexterm>printing</indexterm></indexterm>To print a document
</title>
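
To make the pitfall concrete, the following sketch contrasts the two forms in a paragraph. The sentence is hypothetical and not taken from a real topic, and the rendering noted in the comments assumes a processor that follows the standard behavior of suppressing <indexterm> content from the body text.

```xml
<!-- Problem: the existing word "printer" is wrapped in the index term,
     so it drops out of the published sentence.
     Assumed rendering: "Select the you want to use." -->
<p>Select the <indexterm>printer</indexterm> you want to use.</p>

<!-- Fix: the word stays in the sentence and is repeated inside the
     index term, which is placed in front of the text it applies to.
     Assumed rendering: "Select the printer you want to use." -->
<p><indexterm>printer</indexterm>Select the printer you want to use.</p>
```

The second form keeps the sentence intact for both readers and translators, at the cost of writing the indexed word twice.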

Tip: Insert index entries that refer to entire topics in the prolog element

If topics are well designed, only the topic itself needs to be indexed. Put topic-level index entries inside the prolog of the topic, which is the area for storing metadata related to a topic. In this way, the index terms are separated from the content of the topic and will not break the grammatical structure of the sentence. Written this way, the index terms are easier for translators to find.

<task>
<title>To print a document</title>
<prolog>
<metadata>
<keywords>
<indexterm>document<indexterm>printing</indexterm></indexterm>
<indexterm>printing document</indexterm>
</keywords>
</metadata>
</prolog>

</task> 

Tip: Try to avoid block-level, sentence-level, word-level or phrase-level index elements

If you absolutely need to index a specific block, sentence, word or phrase, insert the index term immediately in front of the text to which it applies. An index term placed in the middle of a sentence can disrupt its meaning and make the sentence difficult for translators to understand. Remember that index terms do not appear in the published text of the block.

Instead of writing:

<p>Hello <indexterm>Examples</indexterm>World</p>

Write:

<p><indexterm>Examples</indexterm>Hello World</p>

Although this approach is slightly better than the previous one, it is still not ideal for translators, who have to read and translate the sentence around the index term. The best practice is to put index terms in the prolog whenever possible.

Tip: Use the <index-sort-as> element

To influence how software sorts your back-of-book index, use the <index-sort-as> element. This element specifies the text by which an index entry is sorted in a particular language. In English, it is mainly used to disregard insignificant leading characters such as punctuation (., <, >, etc.) or to ignore words like “the” or “a.” For example, to sort the term “.NET” under the letter N rather than under the dot (.), use the following DITA syntax:

<indexterm>.NET
  <index-sort-as>NET</index-sort-as>
</indexterm>
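
The same pattern covers the leading articles mentioned above. As a minimal sketch, assuming a hypothetical entry named “The Print dialog box” (the entry text is an illustration, not part of the original guidance), the sort key simply omits the article:

```xml
<!-- Hypothetical entry: displayed as "The Print dialog box",
     but intended to sort under P rather than T. -->
<indexterm>The Print dialog box
  <index-sort-as>Print dialog box</index-sort-as>
</indexterm>
```

The entry still appears in the index with its full text; only its position in the sorted list is meant to change.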

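For some languages, such as Japanese, the index cannot easily be sorted automatically, and human intervention is needed. Japanese index entries are sorted by pronunciation, but the approximately 2000 Kanji characters in common use can be read in more than one way, so software cannot reliably derive the sort order from the written term. A phonetic reading, for example in Katakana, has to be supplied as the sort key. This is not unique to Japanese: the sort order also needs to be adjusted for Azerbaijani, Albanian, Caucasian, Croatian, Georgian, Mongolian, Serbian and several other languages. The following Japanese entries supply Katakana readings as sort keys: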

<indexterm>林檎
<index-sort-as>リンゴ</index-sort-as>
</indexterm>
<indexterm>葡萄
<index-sort-as>ブドウ</index-sort-as>
</indexterm>
<indexterm>梨
<index-sort-as>ナシ</index-sort-as>
</indexterm>

The difficulty is that the source content usually does not contain <index-sort-as> elements, because the source language does not need them. Since translators are not usually experts in DITA, it is not reasonable to expect them to insert additional tags in the target files. For this reason, when a file is going to be translated into a language that presents this challenge, add an <index-sort-as> element to every index term in the source before sending it for translation, as in the following example:

<indexterm>Apple
<index-sort-as>Apple</index-sort-as>
</indexterm>
<indexterm>Grape
<index-sort-as>Grape</index-sort-as>
</indexterm>
<indexterm>Pear
<index-sort-as>Pear</index-sort-as>
</indexterm>

The translator can then provide the correct value for each <index-sort-as> element when translating the topic.

Advanced translation management or component content management systems such as SDL Knowledge Center and SDL TMS can automate this process for you.