Characterizing Messages

Compression Example

To the left, we have constructed a model of communication. The information source outputs a peculiar sequence of A's and B's: a concatenation of three-symbol blocks, each chosen independently and with equal probability from { AAA, AAB, ABA, BAA }.
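
For concreteness, here is a minimal Python sketch of such a source (our own code, not the applet's; the names BLOCKS and source are invented for this illustration):

    import random

    BLOCKS = ["AAA", "AAB", "ABA", "BAA"]

    def source(n_blocks):
        # Each block is drawn independently and uniformly from BLOCKS.
        return "".join(random.choice(BLOCKS) for _ in range(n_blocks))

    print(source(5))   # e.g. AABAAABAABAAAAA (varies from run to run)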

To compute the entropy of this source, we could model it as a finite-state automaton. Each state would correspond to an ordered pair: the current position within the block of three, combined with the identity of the block being produced. However, we'll take a shortcut and suppose that the source machine is stateless and outputs three symbols at a time. We then compute the entropy per three symbols to be:

H = -4 × (1/4) log2(1/4) = log2 4 = 2 bits
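
We can verify this number directly (a small sketch; probs is simply the uniform distribution over the four blocks):

    import math

    probs = [1/4] * 4   # the four blocks are equally likely

    # Entropy per block: H = -sum(p * log2(p)) = log2(4) = 2 bits.
    H_block = -sum(p * math.log2(p) for p in probs)

    print(H_block)       # 2.0 bits per three-symbol block
    print(H_block / 3)   # ~0.667 bits per source symbol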

Because of this, if we were to re-encode a typical message for a binary channel, the number of bits needed would be only 2/3 of the number of symbols the source outputs. We can accomplish this by encoding the position of the B in each three-symbol block in binary (using 00 if there is no B). In this case, perfect compression is attainable. The communications system to the left demonstrates this method. When you press "Go" repeatedly, the message from the source is compressed by the transmitter and then expanded back into the original message by the receiver. Each press of "Go" processes another block of three.
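
A sketch of this codec follows (our own code, not the applet's). Each two-bit code is literally the binary position of the B, with 00 reserved for the all-A block:

    # Two bits per three-symbol block: 00 marks AAA, and 01, 10, 11
    # give the position (1, 2, 3) of the lone B, written in binary.
    ENCODE = {"AAA": "00", "BAA": "01", "ABA": "10", "AAB": "11"}
    DECODE = {code: block for block, code in ENCODE.items()}

    def compress(msg):
        return "".join(ENCODE[msg[i:i+3]] for i in range(0, len(msg), 3))

    def expand(bits):
        return "".join(DECODE[bits[i:i+2]] for i in range(0, len(bits), 2))

    msg = "AABBAAAAA"
    bits = compress(msg)        # "110100": 6 bits for 9 source symbols
    assert expand(bits) == msg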

As in previous simulations, the size of each agent is proportional to the number of bits contained in the corresponding array. Note that the source agent and its array grow faster than the transmitter agent and its array. This reflects the compression of the source's symbol sequences into the more compact sequences transferred across the channel.

Another example of compression, provided by Shannon, is an information source in which each symbol is chosen independently, but a B occurs only with low probability p. In this case, perfect compression is not attainable, but we can approach it in the limit p -> 0 by using a special sequence (say 0000) for each rare B and encoding the number of A's in between in binary. To avoid confusion, we must skip binary numbers containing 0000. As messages grow long and p shrinks, we approach ideal compression in this way.
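
Below is a sketch of this scheme (our own code). The marker 0000 and the rule of skipping binary numbers containing 0000 come from the description above; to keep the boundary between a count and the marker unambiguous, the sketch also requires nonempty codewords to end in 1, a detail the original leaves implicit:

    import itertools

    MARKER = "0000"   # the special sequence standing for a B

    def valid(word):
        # Counts are written in codewords that never contain the marker;
        # requiring nonempty codewords to end in 1 keeps the boundary
        # between a count and the following marker unambiguous.
        return MARKER not in word and (word == "" or word.endswith("1"))

    def codewords():
        # Valid codewords in order of length: "", "1", "01", "11", "001", ...
        yield ""
        for n in itertools.count(1):
            for bits in itertools.product("01", repeat=n):
                word = "".join(bits)
                if valid(word):
                    yield word

    # Lookup tables for run lengths up to 63 A's (enough for a demo).
    TO_CODE = dict(enumerate(itertools.islice(codewords(), 64)))
    TO_RUN = {w: n for n, w in TO_CODE.items()}

    def compress(msg):
        # Each group of A's ending in a B becomes codeword(run) + MARKER.
        # For simplicity this sketch assumes msg ends with a B.
        return "".join(TO_CODE[len(run)] + MARKER for run in msg.split("B")[:-1])

    def expand(bits):
        out = []
        while bits:
            cut = bits.index(MARKER)          # first occurrence is the marker
            out.append("A" * TO_RUN[bits[:cut]] + "B")
            bits = bits[cut + len(MARKER):]
        return "".join(out)

    msg = "AAAAAB" + "AAB" + "AAAAAAAB"
    assert expand(compress(msg)) == msg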
