As translators, we deal with written text all the time. In the digital world, text is represented using the Unicode standard, which assigns a unique code to each character of most of the world's writing systems. The Unicode standard can be considered the Bible for anyone dealing with electronic text. The latest version of the standard at the time of writing, version 7, includes over 113,000 characters (below, I will use the word “symbol” to mean any Unicode character). And since the job of the translator is to transfer text from one language to another, we must use the proper Unicode symbols to do that.
In this article, you will learn how to identify any Unicode symbol that you encounter in electronic documents in order to decide whether it should be retained, removed or replaced in the translation. This can help you in a number of situations:
- You encounter a strange-looking symbol and you are wondering whether it should be removed from translated text, replaced with another symbol or retained without changes.
- The spell-checker underlines a particular word that looks correct – this can mean that the word includes an incorrect Unicode symbol that looks similar to the intended symbol, but has a different meaning, such as Cyrillic “р” and Latin “p”.
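If you are comfortable running a short script, the look-alike problem described above can be checked directly. The following is a minimal sketch in Python using the standard `unicodedata` module; the helper name `describe` is my own choice for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"


cyrillic_er = "\u0440"  # Cyrillic "р"
latin_p = "p"           # Latin "p"

print(describe(cyrillic_er))   # U+0440 CYRILLIC SMALL LETTER ER
print(describe(latin_p))       # U+0070 LATIN SMALL LETTER P
print(cyrillic_er == latin_p)  # False: the two look alike but are different symbols
```

This is why a spell-checker can reject a word that looks perfectly correct on screen.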
In this article, we will use an example of some text I recently received from a user of TransTools. Here it is:
Is this a hyphen or something else? There are several ways to find this out, and they are detailed below.
Solution 1 – Use a web service
Andrew West, the author of BabelMap, the software which will be covered below, created a convenient web service that allows anyone to determine which Unicode symbols they are dealing with. This web service is located at the following address: http://www.babelstone.co.uk/Unicode/whatisit.html
Pasting our text into the input box on this web page, we produce the following results:
From these results, we can see that the strange symbol is a Not Sign (U+00AC).
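A similar character-by-character breakdown can be approximated offline. The sketch below is not how the web service itself works; it simply lists each symbol of a string with its code point and name using Python's `unicodedata` module. The function name `identify` and the sample text are hypothetical, since the original example text is not reproduced here.

```python
import unicodedata


def identify(text: str) -> list:
    """List (symbol, code point, Unicode name) for every symbol in the text."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<no name>")  # control characters have no name
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


# Hypothetical sample: a word with a Not Sign where a hyphen might be expected.
sample = "trans\u00aclation"

for symbol, code_point, name in identify(sample):
    print(f"{code_point}  {name}")
```

Any symbol whose name does not match what you expected to see in that position deserves a closer look.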
Solution 2 – Use Microsoft Word
If you use Microsoft Word, you can use a well-hidden trick: paste the text into Microsoft Word (if it comes from another application), select the symbol you want to identify, and open the Insert Symbol dialogue. To do this, switch to the Insert tab, click the Symbol dropdown menu, and choose More Symbols... as shown below:
Note: if you use Microsoft Word 2003, use the Insert -> Symbol menu command.
When you do this, you will see the Insert Symbol dialogue, which will tell you what the symbol is:
Again, you can see that you are dealing with a Not Sign.
Solution 3 – Use BabelMap, a free advanced character map utility
BabelMap is a free character map utility for Microsoft Windows created by Andrew West. It can be downloaded from this page: http://www.babelstone.co.uk/Software/BabelMap.html (look for the BabelMap.zip link). This is a very powerful tool that has a lot of uses, one of which is symbol identification. After you download and unpack BabelMap.zip to a folder of your choice, do the following:
- Open BabelMap.
- Copy and paste the text that includes the strange symbol into the Edit Buffer box at the bottom of the dialogue.
- Click to the left of the symbol you want to identify so that the insertion point (caret) is immediately before it.
- Press F2 and the symbol will be highlighted in the character grid, with information about the symbol shown below the grid.
BabelMap has a lot of other applications, for example:
- you can use it to find a symbol you want to insert into another program (click on the symbol in the grid, select it in the Edit Buffer box and copy it to the clipboard by pressing Ctrl+C or right-clicking and choosing Copy)
- you can bookmark commonly used symbols using the Bookmarks -> Add bookmark menu command, and quickly find them later using the Bookmarks menu
- you can learn a lot of details about the symbol by clicking the ? button below and to the right of the grid
Solution 4 – Use the What Is This Symbol tool included in TransTools for Word
The What Is This Symbol tool is a special tool for Microsoft Word that helps you obtain information about the Unicode symbols used in the selected text. It provides this information in several formats: as a Unicode code point, as a hexadecimal code, and as links to online resources and Word's internal utilities. It can also help you identify a symbol that belongs to one of the so-called Symbol fonts, i.e. non-Unicode symbols that are defined in very specialized fonts.
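To make the different notations more concrete, here is a small sketch in Python that writes one and the same symbol in several common forms. It does not reproduce the output of the What Is This Symbol tool; the function name `notations` and the choice of formats are my own assumptions for illustration.

```python
def notations(ch: str) -> dict:
    """Return several common ways of writing the code of a single symbol."""
    cp = ord(ch)  # the numeric code point of the symbol
    return {
        "code point": f"U+{cp:04X}",  # standard Unicode notation
        "hex": f"0x{cp:X}",           # hexadecimal number
        "decimal": str(cp),           # decimal number
        "html": f"&#{cp};",           # numeric character reference in HTML
    }


for label, value in notations("\u00ac").items():  # the Not Sign
    print(f"{label}: {value}")
```

All four values refer to the same symbol, so whichever notation a tool shows you, you can search for it in a Unicode reference.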
As you can see, there is a variety of solutions that help you identify a text symbol. Coming back to our example, all the solutions above gave us the same answer: the strange symbol is not a hyphen but a Not Sign, a mathematical symbol that looks similar to a hyphen but does not belong in the middle of a word. In our case, before we start translating the text, the best solution is to remove this symbol from the document by replacing it with an empty string (this can be done automatically using the Find & Replace command in Word).
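For plain-text files outside Word, the same clean-up can be sketched in a few lines of Python. This is only an illustration of replacing the symbol with an empty string; the function name `remove_symbol` and the sample text are hypothetical.

```python
NOT_SIGN = "\u00ac"  # the Not Sign identified above


def remove_symbol(text: str, symbol: str = NOT_SIGN) -> str:
    """Delete every occurrence of the symbol by replacing it with an empty string."""
    return text.replace(symbol, "")


# Hypothetical sample text containing a stray Not Sign.
sample = "trans\u00aclation"

print(sample.count(NOT_SIGN))  # 1
print(remove_symbol(sample))   # translation
```

Counting the occurrences first is a cheap way to confirm that you are removing only what you expect.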
In conclusion, I recommend that you bookmark and review the list of commonly used Unicode symbols available on Wikibooks: http://en.wikibooks.org/wiki/Unicode/List_of_useful_symbols . This list only includes symbols proper and excludes letters, digits, some punctuation, etc. Of course, there is no need to memorize all these symbols, but you need to be aware of them as you translate text into another language.