1. Introduction
Everyone is familiar with I/O. I/O stands for Input/Output and usually refers to the transfer of data between internal memory and external storage or other peripheral devices.
For example, when we plug a common storage device such as an SD card, USB flash drive, or portable hard disk into a computer's USB port, we can read information from the device or write information to it. This process involves I/O operations.
Of course, I/O is not limited to reading and writing hardware devices; it also covers the transmission of data over a network. For example, using a browser to search for information on the Internet also involves I/O operations.
Whether reading and writing files on disk or transferring data over a network, I/O is essentially the way humans and machines, or machines and machines, obtain and exchange information.
The Java I/O system has nearly 80 classes in the java.io package, which can feel very complicated, but these classes can be roughly divided into four groups:
- I/O interface based on byte operation: InputStream and OutputStream
- I/O interface based on character operation: Writer and Reader
- I/O interface based on disk operation: File
- I/O interface based on network operation: Socket
The first two groups are distinguished by the format of the data being transferred; the last two are distinguished by the way the data is transferred.
Although the Socket class is not in the java.io package (it lives in java.net), we still group it with the others, because the core question of I/O is how either the data format or the transmission mode affects the operation. In other words: what kind of data is written, and where is it written to? I/O is just a means of interaction between humans and machines or between machines. Beyond making that interaction possible, we care about how to make it efficient, and data format and transmission mode are the most critical factors affecting efficiency.
The in-depth analysis later in this article is organized around these two points.
2. Byte-based interface
The byte-based input and output operation interfaces are: InputStream and OutputStream.
2.1 Byte input stream
The class inheritance hierarchy of InputStream input stream is shown in the following figure:
The input stream can be divided into several sub-categories according to the data node type and processing method, as shown in the following figure:
The class hierarchy of OutputStream output stream is similar.
2.2 Byte output stream
The class inheritance hierarchy of OutputStream output stream is shown in the following figure:
The output stream can also be divided into several sub-categories according to the data node type and processing method, as shown in the following figure:
I will not introduce the usage of each subclass in detail here. If you are interested, you can check the JDK API documentation; I will also cover them in detail in a later article. The point I want to make is that, whether for input or output, the ways of operating on data can be combined: a processing stream class is not tied to one fixed node stream. For example, the following output stream:
// Wrap the file output stream in an object (serialization) output stream, then wrap that in a buffered output stream
OutputStream out = new BufferedOutputStream(new ObjectOutputStream(new FileOutputStream(new File("fileName"))));
In addition, an output stream must ultimately specify where the data is written: either to the hard disk or to the network. As the figure shows, writing to the network is really the same as writing to a file, except that the data is sent to another computer through the underlying operating system instead of being written to the local hard disk.
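The idea of combining processing streams with an arbitrary node stream can be sketched as follows. This is a minimal illustration rather than code from the JDK or from this article: the class name `StreamChainDemo` and the helper `roundTrip` are hypothetical, and an in-memory `ByteArrayOutputStream` stands in for the file or network node stream so that the example runs anywhere.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class StreamChainDemo {

    // Serializes a value through ObjectOutputStream -> BufferedOutputStream -> node stream,
    // then reads it back through the mirrored input chain.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        // Node stream: decides where the bytes end up (memory here; could be a file or a socket).
        ByteArrayOutputStream node = new ByteArrayOutputStream();

        // Processing streams: add serialization and buffering on top of any node stream.
        try (ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(node))) {
            out.writeObject(value);
        } // closing the outermost stream flushes the whole chain

        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new ByteArrayInputStream(node.toByteArray())))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

Swapping the `ByteArrayOutputStream` for a `FileOutputStream` or for a socket's output stream would change the destination without touching the processing streams. Here the buffer sits directly on the node stream, which is a common arrangement; other nesting orders also compile.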
3. Interface based on character operation
Whether data is stored on disk or sent over the network, the smallest storage unit is the byte, not the character, so I/O always operates on bytes. Why, then, is there an I/O interface for operating on characters?
This is because the data our programs manipulate is usually in character form. The character I/O interface is provided purely for the convenience of programs, nothing more.
The character-based input and output operation interfaces are: Reader and Writer respectively. The following figure is the class structure diagram involved in the character I/O operation interface.
3.1 Character input stream
The class inheritance hierarchy of Reader input stream is shown in the figure below:
Similarly, the input stream can be divided into several sub-categories according to the data node type and processing method, as shown in the following figure:
3.2 Character output stream
The class inheritance hierarchy of Writer output stream is shown in the following figure:
Similarly, the output stream can be divided into several sub-categories according to the data node type and processing method, as shown in the following figure:
Both Reader and Writer only define how character data is read or written; they do not specify where the data goes. Where the data is written is the subject of the disk-based and network-based mechanisms discussed later.
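To illustrate that a Writer only describes how characters are written and not where they go, here is a small sketch. The class `WriterDestinationDemo` and its methods are hypothetical names chosen for this example; the same `writeGreeting` method is pointed first at an in-memory string and then at a byte sink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterDestinationDemo {

    // Knows only how to write characters; it has no idea where they end up.
    static void writeGreeting(Writer writer) throws IOException {
        writer.write("hello");
        writer.flush();
    }

    // Destination 1: an in-memory character buffer.
    static String toMemoryString() throws IOException {
        StringWriter sw = new StringWriter();
        writeGreeting(sw);
        return sw.toString();
    }

    // Destination 2: a byte sink (a file or socket output stream would work the same way).
    static byte[] toByteSink() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeGreeting(new OutputStreamWriter(bytes, StandardCharsets.UTF_8));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toMemoryString());
        System.out.println(toByteSink().length + " bytes written");
    }
}
```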
4. Conversion between bytes and characters
As we just said, whether data is stored on disk or sent over the network, the smallest storage unit is the byte, not the character. Characters exist to make programs more convenient to write. How, then, are characters converted into bytes, and bytes into characters?
InputStreamReader and OutputStreamWriter are the conversion bridges.
4.1 Input stream conversion process
The class structure involved in decoding an input stream into characters is shown in the following figure:
As the figure shows, the InputStreamReader class is the byte-to-character conversion bridge, StreamDecoder is the class that performs the decoding, and Charset represents a character set.
Converting an InputStream to a Reader requires a character set to be specified; otherwise the platform default character set is used, which may produce garbled text. StreamDecoder is the implementation class that performs the byte-to-character decoding.
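The effect of the character set on decoding can be sketched with a small example. The class `DecodeDemo` and the helper `decode` are hypothetical names; the sketch decodes the same UTF-8 bytes once with the matching charset and once with a mismatched one (ISO-8859-1) to show how garbled text arises.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    // Decodes all bytes into a String using the given charset.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buf = new char[64];
            int n;
            // read(char[]) returns the number of chars decoded, or -1 at end of stream
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u4f60\u597d"; // two Chinese characters, written as escapes
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text survives the round trip.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(original));
        // Mismatched charset: every byte becomes its own character, so the text is garbled.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

Passing the charset explicitly, as in the two-argument `InputStreamReader` constructor used here, avoids depending on whatever default the platform happens to have.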
The relevant part of the JDK source code for the InputStream-to-Reader conversion:
public class InputStreamReader extends Reader {

    private final StreamDecoder sd;

    /**
     * Creates an InputStreamReader that uses the default charset.
     *
     * @param in An InputStream
     */
    public InputStreamReader(InputStream in) {
        super(in);
        try {
            sd = StreamDecoder.forInputStreamReader(in, this, (String)null); // ## check lock object
        } catch (UnsupportedEncodingException e) {
            // The default encoding should always be available
            throw new Error(e);
        }
    }

    // ... (remaining constructors and methods omitted)
}
4.2 Output stream conversion process
The output stream conversion process is similar, as shown in the following figure:
The OutputStreamWriter class converts characters to bytes; the encoding itself is performed by StreamEncoder.
The relevant part of the JDK source code for the Writer-to-OutputStream conversion:
public class OutputStreamWriter extends Writer {

    private final StreamEncoder se;

    public OutputStreamWriter(OutputStream out) {
        super(out);
        try {
            se = StreamEncoder.forOutputStreamWriter(out, this, (String)null);
        } catch (UnsupportedEncodingException e) {
            throw new Error(e);
        }
    }

    // ... (remaining constructors and methods omitted)
}
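As a rough illustration of the encoding direction, the sketch below pushes the same characters through an `OutputStreamWriter` with two different charsets and compares the resulting byte counts. The class `EncodeDemo` and the helper `encode` are hypothetical names; UTF-16BE is chosen instead of UTF-16 so that no byte order mark is added to the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {

    // Encodes the text into bytes by writing it through an OutputStreamWriter.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } // close() flushes the encoder's remaining bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4f60\u597d"; // two Chinese characters, written as escapes
        // The same two characters occupy a different number of bytes per charset.
        System.out.println("UTF-8:    " + encode(text, StandardCharsets.UTF_8).length + " bytes");
        System.out.println("UTF-16BE: " + encode(text, StandardCharsets.UTF_16BE).length + " bytes");
    }
}
```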
5. Interface based on disk operation
The sections above introduced the Java I/O operation interfaces. These interfaces mainly define how data is manipulated, in two data formats: byte streams and character streams.
The next question is where to write the data. One of the main options is to persist the data to a physical disk.
On disk, the file is the only and smallest unit that describes data, which means that upper-level applications can only manipulate data on the disk through files. The file is also the smallest unit of interaction between the operating system and the disk driver.
In the Java I/O system, the File class is the only object that represents a disk file itself.
The File class defines a number of platform-independent methods for manipulating files, including checking whether a file exists, creating, deleting, and renaming files, checking read and write permissions, and setting and querying a file's last modification time.
It is worth noting that a File in Java does not necessarily represent a real file. When you specify a path, it returns a virtual object representing that path, which may be a real file or a directory containing multiple files.
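A short sketch can make the "virtual object" point concrete. The class `FilePathDemo` and the helper `describe` are hypothetical names; the example works in a temporary directory and shows that constructing a File creates nothing on disk.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class FilePathDemo {

    // Describes what a File object's path currently points to.
    static String describe(File f) {
        if (!f.exists()) {
            return "nothing";
        }
        return f.isDirectory() ? "directory" : "file";
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("file-demo").toFile();
        File missing = new File(dir, "missing.txt"); // constructing a File creates nothing on disk
        File real = new File(dir, "real.txt");
        real.createNewFile();                        // now the path refers to an actual file

        System.out.println(dir.getName() + " -> " + describe(dir));
        System.out.println(missing.getName() + " -> " + describe(missing));
        System.out.println(real.getName() + " -> " + describe(real));

        // Clean up the temporary file and directory.
        real.delete();
        dir.delete();
    }
}
```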
For example, a program that reads the contents of a file looks like this:
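public static void main(String[] args) throws IOException {
    StringBuffer sb = new StringBuffer();
    char[] chars = new char[1024];
    // try-with-resources closes the reader when we are done
    try (FileReader f = new FileReader("fileName")) {
        int n;
        // read(char[]) fills the buffer and returns the number of chars read, or -1 at end of file
        while ((n = f.read(chars)) != -1) {
            // append only the chars actually read in this pass
            sb.append(chars, 0, n);
        }
    }
    System.out.println(sb.toString());
}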
Taking the program above as an example, the flow of reading text characters from the hard disk is as follows:
Let's take a look at how the source code executes.
When we pass in a file name and read the file content through FileReader, a FileInputStream object is automatically created to read the file. This is the byte stream mentioned above.
public class FileReader extends InputStreamReader {

    /**
     * Creates a new <tt>FileReader</tt>, given the name of the
     * file to read from.
     *
     * @param fileName the name of the file to read from
     * @exception FileNotFoundException if the named file does not exist,
     *                   is a directory rather than a regular file,
     *                   or for some other reason cannot be opened for
     *                   reading.
     */
    public FileReader(String fileName) throws FileNotFoundException {
        super(new FileInputStream(fileName));
    }

    // ... (remaining constructors omitted)
}
When the FileInputStream is created, a FileDescriptor object is created as well; it represents a handle to an existing file. You can call getFD() on the FileInputStream object to obtain the file descriptor that is actually associated with the underlying operating system.
public
class FileInputStream extends InputStream
{
    /* File descriptor */
    private final FileDescriptor fd;

    /* File path */
    private final String path;

    public FileInputStream(File file) throws FileNotFoundException {
        String name = (file != null ? file.getPath() : null);
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkRead(name);
        }
        if (name == null) {
            throw new NullPointerException();
        }
        if (file.isInvalid()) {
            throw new FileNotFoundException("Invalid file path");
        }
        fd = new FileDescriptor();
        fd.attach(this);
        path = name;
        open(name);
    }

    // ... (remaining members omitted)
}
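As a small illustration of getFD(), the sketch below opens a temporary file and asks the stream for its descriptor. The class `FileDescriptorDemo` and the helper `descriptorValidWhileOpen` are hypothetical names; the only JDK calls involved are `FileInputStream.getFD()` and `FileDescriptor.valid()`.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;

class FileDescriptorDemo {

    // Opens the file and reports whether its descriptor is valid while the stream is open.
    static boolean descriptorValidWhileOpen(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            // The descriptor is the stream's link to the operating system's open file.
            FileDescriptor fd = in.getFD();
            return fd.valid();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("fd-demo", ".txt");
        tmp.deleteOnExit();
        System.out.println("valid while open: " + descriptorValidWhileOpen(tmp));
    }
}
```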
Because we want to read characters, the StreamDecoder class must decode the bytes into chars. As for how a piece of data is actually read from the disk drive, the operating system takes care of that for us.
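The chain that FileReader builds internally can also be written out by hand, which makes each layer visible and lets the character set be chosen explicitly. The following is a sketch under that assumption, with hypothetical names (`ExplicitChainDemo`, `readAll`); it is not the JDK's own implementation.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ExplicitChainDemo {

    // FileInputStream supplies the bytes, InputStreamReader decodes them as UTF-8,
    // and BufferedReader adds buffering on top.
    static String readAll(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("chain-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp));
    }
}
```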
6. Interface based on network operation
Let us continue with the other destination for written data: writing the data to the network so that other computers can access it.
6.1 Introduction to Socket
In reality, the concept of Socket does not have a concrete entity. It is an abstract definition that describes how computers communicate with each other.
For example, you can think of a Socket as a means of transportation between two cities: with it, you can travel back and forth between them. There are many types of transportation, and each has its own traffic rules. Sockets are the same: there are many kinds. In most cases we use stream sockets based on TCP/IP, which is a reliable communication protocol.
A typical Socket-based communication scenario is shown below:
If an application on host A wants to communicate with an application on host B, a connection must be established through a Socket, and establishing a Socket connection requires the underlying TCP/IP protocol to establish a TCP connection.
6.2 Establishing a communication link
We know that the IP protocol at the network layer can find the target host from its IP address. But several applications may be running on one host, so how do we communicate with a specific one? This is done through the TCP or UDP address, that is, the port number. In this way, a Socket instance uniquely represents the communication link of one application on one host.
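The role of port numbers can be observed with a few lines of Java. This is a minimal sketch with hypothetical names (`PortDemo`, `connectOnce`); it listens on the loopback address, lets the operating system pick the port numbers, and reports which port each end of the connection sees.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class PortDemo {

    // Connects a client to a listening socket on the loopback interface and returns
    // {server listening port, client's remote port, client's local port}.
    static int[] connectOnce() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port for the server.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            return new int[] { server.getLocalPort(), client.getPort(), client.getLocalPort() };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = connectOnce();
        System.out.println("server listens on port " + ports[0]);
        System.out.println("client connected to port " + ports[1]);
        System.out.println("client's own local port is " + ports[2]);
    }
}
```

The client's remote port matches the server's listening port, while the client's own local port is a separate number assigned by the operating system.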
To deliver data to the target reliably, TCP uses a three-way handshake, as shown in the following figure:
SYN stands for Synchronize Sequence Numbers; it is the handshake signal used when TCP/IP establishes a connection.
ACK stands for acknowledgment; it indicates that the data sent has been received correctly.
When a normal TCP connection is established between a client and a server, the client first sends a SYN message, the server replies with SYN + ACK to indicate that it has received the message, and finally the client responds with an ACK.
In this way, a reliable TCP connection can be established between the client and the server, and data can be transferred between the client and the server.
The simple process is as follows:
- Sender -> Receiver: sends a packet with the SYN flag (first handshake);
- Receiver -> Sender: sends a packet with the SYN + ACK flags (second handshake);
- Sender -> Receiver: sends a packet with the ACK flag (third handshake);
After completing the three-way handshake, the client application and the server application can begin to transmit data.
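The exchange of the three segments can be traced with a toy model. The sketch below is a simulation only: it does not open any network connection, the class and field names (`HandshakeSketch`, `Segment`) are hypothetical, and the initial sequence numbers 100 and 300 are arbitrary values chosen for readability. It follows the standard TCP rule that each acknowledgment number is the peer's sequence number plus one.

```java
import java.util.ArrayList;
import java.util.List;

class HandshakeSketch {

    // A toy segment: only the fields needed to follow the handshake.
    static final class Segment {
        final boolean syn;
        final boolean ack;
        final long seq;
        final long ackNo;

        Segment(boolean syn, boolean ack, long seq, long ackNo) {
            this.syn = syn;
            this.ack = ack;
            this.seq = seq;
            this.ackNo = ackNo;
        }

        @Override
        public String toString() {
            String flags = syn && ack ? "SYN+ACK" : syn ? "SYN" : "ACK";
            return flags + " seq=" + seq + (ack ? " ack=" + ackNo : "");
        }
    }

    // Produces the three segments exchanged for the given initial sequence numbers.
    static List<Segment> handshake(long clientIsn, long serverIsn) {
        List<Segment> trace = new ArrayList<>();
        // 1. Client -> server: SYN carrying the client's initial sequence number.
        trace.add(new Segment(true, false, clientIsn, 0));
        // 2. Server -> client: SYN+ACK, acknowledging clientIsn + 1 and announcing its own number.
        trace.add(new Segment(true, true, serverIsn, clientIsn + 1));
        // 3. Client -> server: ACK, acknowledging serverIsn + 1.
        trace.add(new Segment(false, true, clientIsn + 1, serverIsn + 1));
        return trace;
    }

    public static void main(String[] args) {
        for (Segment s : handshake(100, 300)) {
            System.out.println(s);
        }
    }
}
```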
Transmitting data is the main purpose of establishing a connection. So how is data transmitted through a Socket?
6.3 Transferring data
When the client wants to communicate with the server, it must first create a Socket instance. By default, the operating system assigns an unused local port number to this Socket instance and creates a socket data structure containing the local and remote addresses and port numbers. This data structure is kept in the system until the connection is closed.
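import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.Socket;

/**
 * Client
 */
public class Client {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the socket, and with it both of its streams
        try (Socket socket = new Socket("127.0.0.1", 9090)) {
            // Stream for sending data to the server
            PrintStream ps = new PrintStream(new BufferedOutputStream(socket.getOutputStream()));
            // Reader for the data returned by the server
            BufferedReader br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            ps.println("hello word!!");
            ps.flush();
            // Returns null if the server closes the connection without replying
            String info = br.readLine();
            System.out.println(info);
        }
    }
}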
The server, correspondingly, creates a ServerSocket instance. Creating a ServerSocket is relatively simple: as long as the specified port number is not already in use, creation generally succeeds. At the same time, the operating system creates an underlying data structure for the ServerSocket instance, containing the port number to listen on and a wildcard for the listening address, usually *, meaning that all addresses are listened on.
Later, when the accept() method is called, it will enter the blocking state and wait for the client's request.
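import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Server
 */
public class ServerTest {
    public static void main(String[] args) throws IOException {
        // Listen on server port 9090
        ServerSocket serverSocket = new ServerSocket(9090);
        System.out.println("Server started on port 9090...");
        // Accept connections in a loop
        while (true) {
            // Block until a client connects; close the connection when this pass is done
            try (Socket accept = serverSocket.accept()) {
                // Convert the byte stream to a character stream to read the client's data
                BufferedReader br = new BufferedReader(new InputStreamReader(accept.getInputStream()));
                // Read one line of data from the client
                String s = br.readLine();
                System.out.println("Server received from client: " + s);
                // Reply with one line so that the client's readLine() returns
                PrintStream ps = new PrintStream(accept.getOutputStream());
                ps.println("received: " + s);
                ps.flush();
            }
        }
    }
}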
We start the server program first, and then run the client. The server receives the information sent by the client, and the server prints the results as follows:
Note that the client sends data only after it has completed the three-way handshake with the server. The underlying operating system implements the TCP/IP handshake for us!
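For experimenting without starting two separate programs, the request and reply can be exercised inside a single process. The sketch below is an assumption-laden variant of the client and server shown earlier, not a replacement for them: the class `EchoRoundTrip` and its helpers are hypothetical names, the server handles exactly one connection on a background thread, the port is chosen by the operating system, and the reply format "echo: ..." is invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class EchoRoundTrip {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true: println() pushes the line out immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts a one-shot echo server on a free port, sends one line, and returns the reply.
    static String sendAndReceive(String message) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                // accept() blocks until the client below has connected
                try (Socket conn = server.accept()) {
                    String line = reader(conn).readLine();
                    writer(conn).println("echo: " + line);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                writer(client).println(message);
                String reply = reader(client).readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndReceive("ping"));
    }
}
```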
Once the connection has been established, both the server and the client hold a Socket instance, and each Socket instance has an InputStream and an OutputStream. As mentioned earlier, network I/O is transmitted as byte streams, and a Socket exchanges data through these two objects.
When the Socket object is created, the operating system allocates a buffer of a certain size for the InputStream and for the OutputStream, and all reading and writing of data goes through these buffers.
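The buffer sizes the operating system has allocated for a connection can be queried from Java. This is a minimal sketch with hypothetical names (`SocketBufferDemo`, `bufferSizes`); it uses the standard `getSendBufferSize()` and `getReceiveBufferSize()` methods of `java.net.Socket`, and the actual values depend on the operating system and its configuration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketBufferDemo {

    // Returns {send buffer size, receive buffer size} in bytes for a loopback connection.
    static int[] bufferSizes() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system choose a free port for the listener.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            return new int[] { client.getSendBufferSize(), client.getReceiveBufferSize() };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bufferSizes();
        System.out.println("send buffer:    " + sizes[0] + " bytes");
        System.out.println("receive buffer: " + sizes[1] + " bytes");
    }
}
```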
The writer writes data to the SendQ queue that corresponds to the OutputStream, and the underlying TCP implementation transfers it to the RecvQ queue of the InputStream at the other end. If RecvQ is full, the data can no longer be moved out of SendQ; once SendQ fills up as well, the write method of the OutputStream blocks until RecvQ has enough space to accept the data from SendQ.
It is worth noting that the size of the buffers and the relative speeds of the writing and reading ends greatly affect the data transmission efficiency of the connection. Because blocking may occur, network I/O, unlike disk I/O, also requires writing and reading to be coordinated. If both sides send data at the same time, a deadlock may occur.
How to improve the efficiency of network I/O while keeping data transmission reliable has become an urgent problem for engineers.
6.4 I/O working modes
In Java, there are three working modes for transmitting data via I/O: BIO, NIO, and AIO.
In the next issue, we will analyze the characteristics and principles of these three IOs one by one.
7. Summary
This article has covered a lot of ground. Starting from the structure of the basic Java I/O class library, it introduced the data formats and transmission methods of I/O, as well as the basic working mechanisms of disk I/O and network I/O.