Base64 encoding is a binary-to-text encoding/decoding scheme.
Base64 encoding is used when binary data needs to be transmitted over a medium that is designed to handle only textual data. Many communication protocols, such as SMTP and NNTP, were traditionally designed to work with plain text represented by the 7-bit US-ASCII character set. To transfer non-ASCII or binary data over such channels, the binary data is encoded into ASCII characters using the Base64 encoding scheme.
The encoding process converts binary data to a printable ASCII string format. The decoding process converts the encoded string back to binary data.
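Most languages ship a Base64 codec, so you rarely need to write one yourself. As an illustration, Python's standard `base64` module provides `b64encode` and `b64decode`, which perform exactly this round trip. The byte values below are arbitrary sample data chosen to include non-printable bytes:

```python
import base64

# Arbitrary binary data, including bytes outside the printable ASCII range.
data = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

# Encoding: binary data -> printable ASCII text (returned as bytes).
encoded = base64.b64encode(data)

# Decoding: encoded text -> the original binary data.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"))  # AP8QgH8=
assert decoded == data
```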
Base64 encoding uses a subset of 65 characters from the US-ASCII character set: `A`-`Z`, `a`-`z`, `0`-`9`, `+`, `/` and `=`. The first 64 of these form the Base64 alphabet, and each one is identified by a 6-bit value, since $2^6 = 64$. The extra 65th character (`=`) is used to pad the Base64-encoded output.
The following table lists the Base64 alphabet. Each character is represented by a 6-bit value from 0 to 63:
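The Base64 Alphabet

| Value | Encoding | Value | Encoding | Value | Encoding | Value | Encoding |
|---|---|---|---|---|---|---|---|
| 0 | A | 17 | R | 34 | i | 51 | z |
| 1 | B | 18 | S | 35 | j | 52 | 0 |
| 2 | C | 19 | T | 36 | k | 53 | 1 |
| 3 | D | 20 | U | 37 | l | 54 | 2 |
| 4 | E | 21 | V | 38 | m | 55 | 3 |
| 5 | F | 22 | W | 39 | n | 56 | 4 |
| 6 | G | 23 | X | 40 | o | 57 | 5 |
| 7 | H | 24 | Y | 41 | p | 58 | 6 |
| 8 | I | 25 | Z | 42 | q | 59 | 7 |
| 9 | J | 26 | a | 43 | r | 60 | 8 |
| 10 | K | 27 | b | 44 | s | 61 | 9 |
| 11 | L | 28 | c | 45 | t | 62 | + |
| 12 | M | 29 | d | 46 | u | 63 | / |
| 13 | N | 30 | e | 47 | v | | |
| 14 | O | 31 | f | 48 | w | (pad) | = |
| 15 | P | 32 | g | 49 | x | | |
| 16 | Q | 33 | h | 50 | y | | |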
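Because the alphabet is just four contiguous runs of characters, it can be built as a single 64-character string whose index is the 6-bit value from the table. The name `BASE64_ALPHABET` is a hypothetical constant used only in these sketches:

```python
import string

# Indexes 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9, 62 is '+', 63 is '/'.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # the 65th character, used only for padding

assert len(BASE64_ALPHABET) == 64

# Look up a few entries from the table above.
print(BASE64_ALPHABET[0], BASE64_ALPHABET[26], BASE64_ALPHABET[52], BASE64_ALPHABET[63])  # A a 0 /
```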
The Base64 encoding process receives input in the form of 8-bit bytes. It processes the input from left to right and organizes it into 24-bit groups by concatenating three 8-bit bytes. Each 24-bit group is then treated as four concatenated 6-bit groups. Finally, each 6-bit group is converted to a single character of the Base64 alphabet by looking it up in the table above.
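The grouping step above can be sketched with integer shifts: three bytes are packed into one 24-bit integer, and four 6-bit values are extracted by shifting right and masking with `0x3F` (binary `111111`). This hypothetical `encode_full_groups` helper deliberately handles only input whose length is a multiple of 3, so no padding is involved yet:

```python
import string

BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_full_groups(data: bytes) -> str:
    """Encode input whose length is a multiple of 3 (no padding needed)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles complete 24-bit groups")
    out = []
    for i in range(0, len(data), 3):
        # Concatenate three 8-bit bytes into one 24-bit integer.
        group = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        # Split it into four 6-bit values, most significant first.
        for shift in (18, 12, 6, 0):
            out.append(BASE64_ALPHABET[(group >> shift) & 0x3F])
    return "".join(out)

print(encode_full_groups(b"a@b"))  # YUBi
```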
When fewer than 24 bits remain at the end of the input, zero bits are added on the right to form a whole number of 6-bit groups. Then one or two pad (`=`) characters are appended, depending on how many bits remained:
- 8 bits remaining: four zero bits are added to form two 6-bit groups. Each 6-bit group is converted to its Base64 character using the alphabet table, and then two pad (`=`) characters are appended to the output.
- 16 bits remaining: two zero bits are added to form three 6-bit groups. Each of the three 6-bit groups is converted to its Base64 character, and then a single pad (`=`) character is appended to the output.
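Putting the grouping and padding rules together, a minimal encoder might look like the following. `b64_encode` is a hypothetical helper for illustration, not a standard API. Instead of adding exactly four or two zero bits, it pads the final chunk with zero bytes and keeps only the 6-bit groups that contain input bits, which produces the same characters:

```python
import string

BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = len(chunk)  # 3, or 1-2 for the final, incomplete group
        # Right-pad with zero bytes so the chunk fills a 24-bit integer.
        group = int.from_bytes(chunk + b"\x00" * (3 - n), "big")
        # 1 byte -> 2 sextets (8 data + 4 zero bits),
        # 2 bytes -> 3 sextets (16 data + 2 zero bits), 3 bytes -> 4 sextets.
        for shift in (18, 12, 6, 0)[: n + 1]:
            out.append(BASE64_ALPHABET[(group >> shift) & 0x3F])
        # Two '=' after 8 leftover bits, one after 16, none for a full group.
        out.append("=" * (3 - n))
    return "".join(out)

print(b64_encode(b"a@bc"))  # YUBiYw==
```

For real code, a standard library implementation (such as Python's `base64.b64encode`) is preferable; the sketch only mirrors the steps described here.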
The decoding process does the opposite of the above encoding process. Let’s look at an example to understand how Base64 encoding works:
Binary representation of the input, the ASCII string `a@bc` (8-bit bytes):
01100001 01000000 01100010 01100011
Step 1: Organize the input into 24-bit groups (four 6-bit groups each). Pad with zero bits at the end to form an integral number of 6-bit groups.
011000 010100 000001 100010 011000 110000 # (padded with four zeros at the end)
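Step 1 can be reproduced with a short, string-based sketch. Working with a string of `'0'`/`'1'` characters is purely for readability; real implementations use integer shifts instead. It concatenates the bits of the four example bytes, right-pads with zeros to a multiple of 6, and slices the result into 6-bit groups:

```python
data = bytes([0b01100001, 0b01000000, 0b01100010, 0b01100011])

# Concatenate all bits, then right-pad with zeros to a multiple of 6.
bits = "".join(f"{byte:08b}" for byte in data)
bits += "0" * (-len(bits) % 6)  # 32 bits -> 4 zero bits added

groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(" ".join(groups))             # 011000 010100 000001 100010 011000 110000
print([int(g, 2) for g in groups])  # [24, 20, 1, 34, 24, 48]
```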
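Step 2: Convert the 6-bit groups to Base64 characters by indexing into the Base64 alphabet table. Then append pad characters according to how many bits remained at the end of the input: here the last group held only 8 input bits, so two `=` characters are added.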
The above 6-bit groups equate to the following indexes:
24 20 1 34 24 48
Indexing into the Base64 alphabet table gives the following output:
YUBiYw== # (padded with two `=` characters)
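Decoding reverses these steps: each character is mapped back to its 6-bit value, four values are rejoined into a 24-bit group, and the pad characters indicate how many of the resulting bytes are real data. The hypothetical `b64_decode` below is a sketch that assumes well-formed, padded input with no whitespace; it does not check that `=` appears only at the end or that the discarded padding bits are zero:

```python
import string

BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_decode(text: str) -> bytes:
    if len(text) % 4 != 0:
        raise ValueError("encoded length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = quad.count("=")  # 0, 1 or 2 pad characters in this group
        # Rebuild the 24-bit group from four 6-bit values; '=' contributes zeros.
        group = 0
        for ch in quad:
            group = (group << 6) | (0 if ch == "=" else BASE64_ALPHABET.index(ch))
        # Two pads -> 1 data byte, one pad -> 2 data bytes, none -> 3.
        out += group.to_bytes(3, "big")[: 3 - pad]
    return bytes(out)

print(b64_decode("YUBiYw=="))  # b'a@bc'
```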