Base64 and Uuencoding fundamentals
Uuencoding and Base64 encoding are the two most common ways of converting binary files
(such as executables, word processor documents, multimedia files, etc.) into
a format that can be sent safely via email and other transmission mechanisms
(e.g. Usenet news, UUCP).
Both Uuencoding and Base64 encoding
use a similar technique: binary values in the range 0-255, many of which
are not printable, are converted into a set of 64 "safe" ASCII characters that
can be easily handled by most computer systems.
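For a quick side-by-side comparison, Python's standard library (used here only for illustration) can produce both encodings: `binascii.b2a_uu` returns one uuencoded line (a length character, the encoded data and a newline) and `base64.b64encode` returns Base64. This is a minimal sketch using the three-byte sample from the example that follows; both turn three bytes into four printable characters, just from different 64-character alphabets.

```python
import base64
import binascii

# The three-byte sample: 157, 97 and 226.
data = bytes([157, 97, 226])

# One uuencoded line: length character ('#' = 3 + 32), then four data characters.
uu_line = binascii.b2a_uu(data)

# Base64 splits the same bits the same way but uses A-Z, a-z, 0-9, '+' and '/'.
b64 = base64.b64encode(data)

print(uu_line)  # b"#G6'B\n"
print(b64)      # b'nWHi'
```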
This example shows how a small binary
data stream is converted into printable characters by the Uuencoding mechanism.
The data stream consists of three bytes
containing the values 157, 97 and 226 (i.e. 10011101, 01100001 and 11100010).
Only one of these bytes (97) represents
a printable ASCII character (lowercase "a"). The other two are greater than 127,
so they need all eight bits and would not survive a 7-bit link in any case.
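A short sketch that prints each byte of the sample in binary and flags whether it falls in the printable 7-bit ASCII range (32 to 126). The helper name `describe` is hypothetical.

```python
def describe(data: bytes) -> list[tuple[int, str, bool]]:
    """Return (value, 8-bit binary string, is printable 7-bit ASCII) for each byte."""
    return [(b, format(b, "08b"), 32 <= b <= 126) for b in data]

for value, bits, printable in describe(bytes([157, 97, 226])):
    print(value, bits, "printable" if printable else "not printable")
```

Only 97 (`01100001`) is reported as printable; 157 and 226 both have the top bit set.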
The first step is to split the three
8-bit bytes into four 6-bit words (i.e. 100111, 010110, 000111 and 100010,
or 39, 22, 7 and 34 in decimal). Each 6-bit word holds a value in the range 0 to 63.
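The split can be sketched by packing the three bytes into one 24-bit integer and reading it back six bits at a time, most significant bits first. The function name `split_into_sextets` is hypothetical.

```python
def split_into_sextets(group: bytes) -> list[int]:
    """Split exactly three 8-bit bytes into four 6-bit values (0-63)."""
    if len(group) != 3:
        raise ValueError("expected exactly three bytes")
    # Pack the bytes into a single 24-bit number, first byte in the high bits.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Extract four 6-bit fields, starting from the most significant end.
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]

words = split_into_sextets(bytes([157, 97, 226]))
print(words)                              # [39, 22, 7, 34]
print([format(w, "06b") for w in words])  # ['100111', '010110', '000111', '100010']
```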
In order to map the data to the printable
ASCII characters, 32 is added to the value to give a value in the range
32 to 95 (i.e. ASCII characters from SPACE to UNDERSCORE).
In the example above this gives us 71,
54, 39 and 66 (01000111, 00110110, 00100111 and 01000010), which map to the
characters G, 6, ' (apostrophe) and B. These are all "safe" to include in the body text of an email.
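A minimal sketch of the mapping step: add 32 to each 6-bit word and take the ASCII character with that code. The helper name `uu_char` is hypothetical.

```python
def uu_char(value: int) -> str:
    """Map a 6-bit value (0-63) onto the printable range SPACE..UNDERSCORE."""
    if not 0 <= value <= 63:
        raise ValueError("value must fit in 6 bits")
    return chr(value + 32)

words = [39, 22, 7, 34]                   # the four 6-bit words from the example
codes = [w + 32 for w in words]
print(codes)                              # [71, 54, 39, 66]
print("".join(uu_char(w) for w in words)) # G6'B
```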
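Finally, a "begin" line (which also records the file's permission mode and name)
and an "end" line are added, and each encoded line is prefixed with a character
giving its length (the number of bytes it encodes, plus 32). Some implementations
also add data checksums.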
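Putting the steps together, here is a sketch of a complete encoder, assuming the common layout of 45 input bytes (60 output characters) per line and a zero-length line before "end". The function name, the default file name `sample.bin` and the default mode 644 are hypothetical, and the sketch omits checksums, which not every implementation emits.

```python
def uuencode(data: bytes, name: str = "sample.bin", mode: int = 0o644) -> str:
    """Uuencode data into text framed by "begin" and "end" lines."""
    lines = [f"begin {mode:o} {name}"]            # e.g. "begin 644 sample.bin"
    for start in range(0, len(data), 45):          # 45 bytes -> 60 characters
        chunk = data[start:start + 45]
        line = chr(len(chunk) + 32)                # length character
        padded = chunk + bytes(-len(chunk) % 3)    # zero-pad to whole 3-byte groups
        for i in range(0, len(padded), 3):
            n = (padded[i] << 16) | (padded[i + 1] << 8) | padded[i + 2]
            line += "".join(chr(((n >> s) & 0x3F) + 32) for s in (18, 12, 6, 0))
        lines.append(line)
    lines.append(chr(32))                          # zero-length line: no more data
    lines.append("end")
    return "\n".join(lines) + "\n"

print(uuencode(bytes([157, 97, 226])), end="")
```

For the sample this prints a "begin 644 sample.bin" line, the data line `#G6'B`, a line holding a single SPACE, and "end". Many implementations write a backtick (`` ` ``) instead of SPACE for the value 0, because trailing spaces can be stripped in transit.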
Why is it necessary?
Historically speaking, different computer
systems have often had different and incompatible ways of transmitting text
and binary files. Not all computer systems communicate with all 8 bits
of the byte: some use only seven bits and reserve the most significant bit for
parity checking, which reduces the range to 128 possible values.
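A minimal sketch of why such a link cannot carry 8-bit data, assuming even parity (links could also use odd parity or none): the sender overwrites the top bit with a parity bit, so the receiver can only keep the low seven bits. Both function names are hypothetical.

```python
def add_even_parity(ch: int) -> int:
    """Store an even-parity bit in bit 7 of a 7-bit character code."""
    if not 0 <= ch <= 127:
        raise ValueError("only 7-bit values leave room for a parity bit")
    parity = bin(ch).count("1") % 2    # 1 if one more 1-bit is needed for an even count
    return ch | (parity << 7)

def strip_parity(byte: int) -> int:
    """Keep only the low seven bits, as a 7-bit receiver would."""
    return byte & 0x7F

print(add_even_parity(97))                          # 225 ("a" has three 1-bits)
print([strip_parity(b) for b in (157, 97, 226)])    # [29, 97, 98]
```

The byte 97 survives, but 157 and 226 arrive as 29 (a control character) and 98 ("b"), so the original data is lost.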
The first 32 values are not printable;
indeed some, such as XOFF, would have catastrophic effects during data transfer
because they are used to control the flow of data from the host (XOFF turns
the remote transmission off). That leaves 96 characters, but DEL (127) is not
printable either, leaving 95. Worse still, some computer systems do not
support lowercase characters, so Uuencoding concentrates on just 64 characters
(SPACE through UNDERSCORE) that are widely compatible.
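The character counting above can be checked with a few lines: 95 printable 7-bit codes remain once the control characters and DEL are removed, and the 64-character Uuencoding set (6-bit values shifted up by 32) contains no lowercase letters.

```python
# 7-bit ASCII has 128 codes: 0-31 are control characters and 127 is DEL.
printable = [c for c in range(128) if 32 <= c <= 126]

# The Uuencoding set: 6-bit values 0-63 shifted up by 32 (SPACE .. UNDERSCORE).
uu_alphabet = "".join(chr(v + 32) for v in range(64))

print(len(printable))                             # 95
print(len(uu_alphabet))                           # 64
print(uu_alphabet)
print(any(ch.islower() for ch in uu_alphabet))    # False
```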
Even then, some computer systems
(such as those using EBCDIC instead of ASCII) cannot reliably translate
some of the characters, so a variation of Uuencoding, called XXencoding,
exists for this purpose.
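XXencoding keeps the same 6-bit split and changes only the lookup table. This sketch assumes the commonly documented XXencode table of `+`, `-`, the digits, the uppercase letters and the lowercase letters, which avoids most of the punctuation that EBCDIC translation can mangle; the names `XX_ALPHABET` and `xx_encode_group` are hypothetical, and line framing is left out.

```python
# Assumed XXencode table: '+', '-', digits, uppercase, lowercase (64 symbols).
XX_ALPHABET = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def xx_encode_group(group: bytes) -> str:
    """Encode exactly three bytes as four XXencode characters."""
    if len(group) != 3:
        raise ValueError("expected exactly three bytes")
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Same 6-bit split as Uuencoding; only the table lookup differs.
    return "".join(XX_ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0))

print(xx_encode_group(bytes([157, 97, 226])))   # bK5W
```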