What is the reason for the existence of multiple possible translations in the example above?
Before we analyze this question, we can first analyze the way translation in fixed-length encoding works. We can describe the path of the translation with a tree, as shown in →Fig. 1.76. If the receiving end receives 00011011, then it can start from the root, and retrieve encodings one by one from the text to be translated. If the encoding is “0,” then it goes left; if the encoding is “1,” it goes right. Whenever it reaches a leaf, it translates to a character. It repeats the above process until the translation ends. It thus obtains ABCD in the end.
Translation tree for fixed-length encoding.
The problem of code translation is a search problem on string, that is, multipattern matching on string.
Let us see the translation scenario for variable-length encoding plan 1 in →Fig. 1.77. The encoding of A is 0 and the encoding of B is 00. The beginning parts of encodings for A and B are identical. Therefore, when B is encountered, the receiver cannot distinguish whether its first bit represents A or is a prefix for B.
Translation tree for variable-length encoding plan 1.
After comparing →Figs. 1.76 and →1.77, we can see that, in order for the translation to be unique, then there cannot be total overlap between each path that arrives at a character node, that is, a character node cannot appear at a branching node.
Therefore, when designing a variable-length encoding, we must do it so that the encoding for any character cannot be the prefix for the encoding of another character, that is, prefixes must differ. Only in this way can we ensure the uniqueness of translations. Such encoding is called differentprefix encoding, and also called different beginning code.