Introduction

What is Base64 encoding? To answer that, we first need to look at how computers classify files. Broadly, files fall into two categories: text files and binary files.

The content of a binary file is raw binary data that is not directly readable by humans. If you open a binary file in a text editor, you will probably see garbled characters. This is because a binary file is not encoded the way a text file is, so when the editor tries to interpret its bytes as text, the result is garbage.

Text files also come in many encodings, from the early ASCII encoding to the encodings commonly used today, such as UTF-8 and UTF-16. Even a text file shows garbled characters if you open it with the wrong encoding.

Therefore, whether a file is text or binary, the encoding has to be agreed on: data must be read with the same encoding that was used to write it.

Base64 is an encoding that represents binary data as printable ASCII characters.
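As a small illustration of that idea, the sketch below encodes three arbitrary bytes, two of which are not printable, using the JDK's java.util.Base64 class that is covered later in this article. The class name BinaryToAscii, the helper encode and the sample bytes are made up for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class BinaryToAscii {
    // Hypothetical helper: encode raw bytes with the standard Base64 alphabet.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Sample bytes chosen for illustration; 0x00 and 0xFF are not printable ASCII.
        byte[] raw = {0x00, (byte) 0xFF, 0x10};
        String text = encode(raw);
        System.out.println(text); // AP8Q

        // Decoding restores the original bytes exactly.
        byte[] back = Base64.getDecoder().decode(text);
        System.out.println(Arrays.equals(raw, back)); // true
    }
}
```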

Why is this needed?

The computer world was not built overnight; it grew slowly. For character encoding, only ASCII was supported at first, and support for Unicode and other encodings came later. As a result, many applications handle nothing but ASCII. How, then, can non-ASCII data be carried through such systems?

The solution is to map the non-ASCII data onto ASCII characters, and Base64 is one such encoding.

A common place to use Base64 is in web pages. When we want to embed an image directly in a page, we can Base64-encode the image and place the result in the HTML.
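One common way to do this is a data: URI in the src attribute of an img tag. The sketch below shows the idea; the class DataUriSketch, the helper toImgTag and the placeholder bytes are assumptions made for illustration, and a real page would read the bytes from an actual image file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUriSketch {
    // Hypothetical helper: wrap raw image bytes in a data: URI inside an <img> tag.
    static String toImgTag(byte[] imageBytes, String mimeType) {
        String b64 = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + b64 + "\">";
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for the contents of a real image file.
        byte[] fakeImage = "not a real image".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toImgTag(fakeImage, "image/png"));
    }
}
```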

Another application is email: a file is Base64-encoded and then sent as an attachment.

Java support for Base64

Since Base64 is so useful, let's take a look at its implementation in Java.

Java ships a Base64 implementation in the class java.util.Base64. It is a utility class that was introduced in JDK 1.8.

Base64 provides three pairs of getEncoder and getDecoder methods. Once you have obtained an Encoder or a Decoder, you call its encode or decode method to encode or decode data, which is very convenient.

Let's first look at the basic usage example of Base64:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // Encode with the Encoder
    String encodedString = Base64.getEncoder()
            .encodeToString("what is your name baby?".getBytes(StandardCharsets.UTF_8));
    System.out.println("Base64-encoded string: " + encodedString);

    // Decode with the Decoder
    byte[] decodedBytes = Base64.getDecoder().decode(encodedString);
    System.out.println("Decoded string: " + new String(decodedBytes, StandardCharsets.UTF_8));

The Base64 utility class provided by the JDK is very handy.

I will not explain its use in detail here. This article mainly analyzes how Base64 is implemented in the JDK.

Classification and Implementation of Base64 in JDK

The Base64 class in JDK provides three encoder methods, namely getEncoder, getUrlEncoder and getMimeEncoder:

    public static Encoder getEncoder() {
        return Encoder.RFC4648;
    }

    public static Encoder getUrlEncoder() {
        return Encoder.RFC4648_URLSAFE;
    }

    public static Encoder getMimeEncoder() {
        return Encoder.RFC2045;
    }

Similarly, it also provides three corresponding decoders, namely getDecoder, getUrlDecoder, and getMimeDecoder:

    public static Decoder getDecoder() {
        return Decoder.RFC4648;
    }

    public static Decoder getUrlDecoder() {
        return Decoder.RFC4648_URLSAFE;
    }

    public static Decoder getMimeDecoder() {
        return Decoder.RFC2045;
    }

As can be seen from the code, these three encodings correspond to RFC4648, RFC4648_URLSAFE and RFC2045 respectively.

These three are variants of Base64 encoding. Let's see how they differ:

| Encoding | 62nd character | 63rd character | Padding character |
| --- | --- | --- | --- |
| RFC 2045: Base64 transfer encoding for MIME | + | / | = mandatory |
| RFC 4648: base64 (standard) | + | / | = optional |
| RFC 4648: base64url (URL- and filename-safe standard) | - | _ | = optional |

As the table shows, base64 and base64url differ only in the 62nd and 63rd characters of the alphabet, while base64 for MIME differs from base64 in whether the padding character is mandatory.
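To make the alphabet and padding differences concrete, here is a minimal sketch that encodes the same two bytes with the Basic and URL-safe encoders, and once more with padding switched off through Encoder.withoutPadding(). The class AlphabetDiff, its helper methods and the sample bytes are invented for this example; the bytes were picked so that the 62nd and 63rd characters both appear in the output.

```java
import java.util.Base64;

class AlphabetDiff {
    // Hypothetical helpers, one per encoder configuration.
    static String basic(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }

    static String urlSafe(byte[] b) {
        return Base64.getUrlEncoder().encodeToString(b);
    }

    static String basicNoPadding(byte[] b) {
        return Base64.getEncoder().withoutPadding().encodeToString(b);
    }

    public static void main(String[] args) {
        // 0xFB 0xFF splits into the 6-bit values 62, 63 and 60.
        byte[] sample = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(basic(sample));          // +/8=
        System.out.println(urlSafe(sample));        // -_8=
        System.out.println(basicNoPadding(sample)); // +/8
    }
}
```

Note that the JDK encoders add the padding character by default; it is only omitted when withoutPadding() is called.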

In addition, base64 and base64url never add line separators, while base64 for MIME breaks its output into lines of at most 76 characters, separated by '\r' followed by '\n'.
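The line-wrapping behaviour can be observed with a short sketch. The class MimeLines, the helper mimeLines and the 100-byte input are assumptions chosen for illustration: 100 bytes encode to 136 Base64 characters, which is enough to force one line break.

```java
import java.util.Base64;

class MimeLines {
    // Hypothetical helper: MIME-encode the data and split the result on CRLF.
    static String[] mimeLines(byte[] data) {
        String encoded = Base64.getMimeEncoder().encodeToString(data);
        return encoded.split("\r\n");
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 zero bytes, 136 characters once encoded

        for (String line : mimeLines(data)) {
            System.out.println(line.length()); // 76, then 60
        }

        // The Basic encoder produces the same characters on a single line.
        String basic = Base64.getEncoder().encodeToString(data);
        System.out.println(basic.length());         // 136
        System.out.println(basic.contains("\r\n")); // false
    }
}
```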

Finally, the variants differ in how they handle characters that are not in the Base64 mapping table during decoding: base64 and base64url reject the input outright, while base64 for MIME simply ignores such characters.
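The sketch below feeds the same input, which contains a CRLF in the middle, to the Basic and MIME decoders. In the JDK the rejection surfaces as an IllegalArgumentException. The class StrictVsLenient and its helpers are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StrictVsLenient {
    // Hypothetical helper: true if the Basic decoder refuses the input.
    static boolean basicRejects(String s) {
        try {
            Base64.getDecoder().decode(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Hypothetical helper: decode with the MIME decoder, which skips unknown characters.
    static String mimeDecode(String s) {
        return new String(Base64.getMimeDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hello" encoded as aGVsbG8=, with a CRLF inserted after the first four characters.
        String withLineBreak = "aGVs\r\nbG8=";
        System.out.println(basicRejects(withLineBreak)); // true
        System.out.println(mimeDecode(withLineBreak));   // hello
    }
}
```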

The difference between base64 and base64url can be seen in the following two lookup tables:

        private static final char[] toBase64 = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
        };

        private static final char[] toBase64URL = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
        };
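To show how such a table is used, here is a simplified sketch, not the JDK's actual encoding loop, that turns one group of three bytes into four characters by splitting the 24 bits into four 6-bit indices. The class TableLookupSketch, the TABLE constant and the method encodeGroup are assumptions for illustration, and padding for incomplete groups is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

class TableLookupSketch {
    // Same 64 characters as the standard alphabet, written as a string for brevity.
    private static final char[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Simplified sketch: encodes exactly one 3-byte group into 4 characters.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Pack the three bytes into one 24-bit integer.
        int bits = (b0 & 0xff) << 16 | (b1 & 0xff) << 8 | (b2 & 0xff);
        // Each 6-bit slice is an index into the table.
        return new String(new char[] {
            TABLE[(bits >>> 18) & 0x3f],
            TABLE[(bits >>> 12) & 0x3f],
            TABLE[(bits >>> 6) & 0x3f],
            TABLE[bits & 0x3f]
        });
    }

    public static void main(String[] args) {
        byte[] man = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeGroup(man[0], man[1], man[2])); // TWFu
    }
}
```

Swapping the last two characters of TABLE for '-' and '_' would give the URL-safe variant, which is the only difference between the two arrays.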

For MIME, the maximum number of characters per line and the line separator are defined as constants:

        private static final int MIMELINEMAX = 76;
        private static final byte[] CRLF = new byte[] {'\r', '\n'};

Advanced usage of Base64

Under normal circumstances, the data to be Base64-encoded has a known, fixed length. We only need to convert the input into a byte array and call the encode or decode method.

But in some cases we need to convert stream data. For this, we can use the two stream-wrapping methods that Base64 provides:

        public OutputStream wrap(OutputStream os) {
            Objects.requireNonNull(os);
            return new EncOutputStream(os, isURL ? toBase64URL : toBase64,
                                       newline, linemax, doPadding);
        }

        public InputStream wrap(InputStream is) {
            Objects.requireNonNull(is);
            return new DecInputStream(is, isURL ? fromBase64URL : fromBase64, isMIME);
        }

The first method belongs to the Encoder and the second to the Decoder.
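A minimal sketch of how the two wrap methods might be used follows. In-memory streams stand in for real files or sockets, and the class StreamWrapSketch and its two helper methods are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StreamWrapSketch {
    // Hypothetical helper: Base64-encode data by writing it through Encoder.wrap.
    static byte[] encodeViaStream(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapping stream writes the final group and any padding.
        try (OutputStream out = Base64.getEncoder().wrap(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: decode by reading through Decoder.wrap.
    static byte[] decodeViaStream(byte[] encoded) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = Base64.getDecoder().wrap(new ByteArrayInputStream(encoded))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] encoded = encodeViaStream(original);
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));               // aGVsbG8=
        System.out.println(new String(decodeViaStream(encoded), StandardCharsets.UTF_8)); // hello
    }
}
```

The encoding stream has to be closed before the result is read, otherwise the last characters and the padding may be missing from the output.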

Summary

The above is the implementation and use of Base64 in the JDK. Although Base64 has many variants, the JDK implements only the three most widely used ones. When you use it, make sure you know which variant you are dealing with, and encode and decode with the same one, to avoid problems.

This article has been included in http://www.flydean.com/14-1-1-java-base64/

flydean