
Typical Case

1. Writing to a disconnected socket raises SIGPIPE; if the signal is not handled, the app crashes

A colleague told me he had received a crash report, so I went to the mPaaS platform and found the following crash information:

2021-04-06-NetworkFatlError.png

The report points to line 313 of a certain file. The code is as follows:

2021-04-06-NetworkFatlError.png

Sockets are the lowest-level building block of networking. In everyday development we rarely need to use them directly, but when we do (for example, when hooking the network layer or maintaining long-lived connections) we need to be careful. The official documentation gives some guidance.

When you use a socket for a network connection and write to it after the connection has been broken, the process receives a SIGPIPE signal by default. If you do not handle this signal, the app crashes.

Mach already provides low-level trap handling through its exception mechanism, and BSD builds its signal handling on top of that mechanism. Signals caused by hardware are caught by the Mach layer and then converted into the corresponding UNIX signals. To keep the mechanism uniform, signals generated by the operating system and by users are also converted to Mach exceptions first, and then to signals.

Mach exceptions are converted into the corresponding UNIX signals by ux_exception at the host level, and the signal is delivered to the faulting thread through threadsignal.

Flow of Mach exception handling and conversion into Unix signals

There are 2 solutions:

  • Ignore the signal globally with the following line of code. The downside is that every SIGPIPE signal in the process is ignored.

    signal(SIGPIPE, SIG_IGN);
  • Tell the socket not to raise the signal in the first place by setting SO_NOSIGPIPE with the following lines of code (substitute the variable containing your socket for sock).

    int value = 1;
    setsockopt(sock, SOL_SOCKET, SO_NOSIGPIPE, &value, sizeof(value));

SO_NOSIGPIPE is a macro. Jumping to its definition shows:

#define SO_NOSIGPIPE  0x1022     /* APPLE: No SIGPIPE on EPIPE */

What does the comment mean? With this option set, no SIGPIPE is raised when EPIPE occurs. So what is EPIPE?

EPIPE is one of the error codes that the socket send function can return. If the connection has already been torn down (for example, after the client's FIN_WAIT_2 state has timed out), sending data elicits an RST from the client side; the send operation then returns the EPIPE error (errno 32) and raises the SIGPIPE signal, whose default action is to terminate the process.

What happens if the client ignores the error return from readline and writes more data to the server? This can happen, for example, if the client needs to perform two writes to the server before reading anything back, with the first write eliciting the RST.

The rule that applies is: When a process writes to a socket that has received an RST, the SIGPIPE signal is sent to the process. The default action of this signal is to terminate the process, so the process must catch the signal to avoid being involuntarily terminated.

If the process either catches the signal and returns from the signal handler, or ignores the signal, the write operation returns EPIPE.

UNP (Unix Network Programming) recommends that applications handle the SIGPIPE signal as appropriate, or at least not leave it to the system's default action. The default action terminates the process, which makes it hard to find out why the process exited. Those interested in UNP can check: http://www.unpbook.com/unpv13e.tar.gz .

Here are 2 official Apple documents describing socket and SIGPIPE signals, and best practices:

Avoiding Common Networking Mistakes

Using Sockets and Socket Streams

However, the code in production still crashed. I checked the code and found that the crash stack is in sendPingWithData. In other words, although the SIGPIPE signal is ignored in AppDelegate, that setting can still be "reset" in some functions.

- (void)sendPingWithData:(NSData *)data {
    int                     err;
    NSData *                payload;
    NSData *                packet;
    ssize_t                 bytesSent;
    id<PingFoundationDelegate>  strongDelegate;
    // ...
    // Send the packet.
    if (self.socket == NULL) {
        bytesSent = -1;
        err = EBADF;
    } else if (!CFSocketIsValid(self.socket)) {
        //Returns a Boolean value that indicates whether a CFSocket object is valid and able to send or receive messages.
        bytesSent = -1;
        err = EPIPE;
    } else {
        [self ignoreSIGPIPE];
        bytesSent = sendto(
                           CFSocketGetNative(self.socket),
                           packet.bytes,
                           packet.length,
                           0, // flags; SO_NOSIGPIPE is a socket option (set in -ignoreSIGPIPE), not a sendto() flag
                           self.hostAddress.bytes,
                           (socklen_t) self.hostAddress.length
                           );
        err = 0;
        if (bytesSent < 0) {
            err = errno;
        }
    }
    // ...
}

- (void)ignoreSIGPIPE {
    int value = 1;
    setsockopt(CFSocketGetNative(self.socket), SOL_SOCKET, SO_NOSIGPIPE, &value, sizeof(value));
}

- (void)dealloc {
    [self stop];
}

- (void)stop {
    [self stopHostResolution];
    [self stopSocket];

    // Junk the host address on stop.  If the client calls -start again, we'll 
    // re-resolve the host name.
    self.hostAddress = NULL;
}

That is, before calling sendto() you need to call CFSocketIsValid to check the state of the current channel. Apple's documentation describes the function as follows:

Returns a Boolean value that indicates whether a CFSocket object is valid and able to send or receive messages.

The previous code only checked that self.socket was not NULL and then sent the message directly. But the socket object can be non-NULL while the channel is unusable, and in that case the app crashes. The previous code was:
if (self.socket == NULL) {
    bytesSent = -1;
    err = EBADF;
} else {
    [self ignoreSIGPIPE];
    bytesSent = sendto(
                        CFSocketGetNative(self.socket),
                        packet.bytes,
                        packet.length,
                        SO_NOSIGPIPE,
                        self.hostAddress.bytes,
                        (socklen_t) self.hostAddress.length
                        );
    err = 0;
    if (bytesSent < 0) {
        err = errno;
    }
}   

2. The "No space left on device" problem

"No space left on device" problem
The first time I ran into this problem, my intuition was that the server hosting some API had run out of storage, because the code in question runs only when the network callback reports an error. The problem did not reproduce reliably, so I did not pay much attention to it. Only later did I learn that online merchants had recently been reporting it frequently. Troubleshooting showed that the error, Error Domain=NSPOSIXErrorDomain Code=28 "No space left on device", was reported by the system itself. The Instruments Network panel showed far too many sessions. To reproduce the problem, I used a timer to trigger the "switch shop" logic; switching shops fires all the network requests the home page needs, and the problem reproduced.

I then searched the project for code that creates NSURLSession objects and narrowed the problem down to a few underlying libraries that hook the network for monitoring. One is the APM network monitor, where I confirmed that session creation is bounded. The other is a dynamic domain-name replacement library, which had caused online failures before. So, after some thought, I temporarily shipped hotfix code for this library.

The library had previously followed a "pessimistic strategy": although there was a 99% probability that no failure would occur, every online network request paid a performance cost for an extra step, and the implementation of that step had problems of its own. The new approach is an optimistic strategy: assume that failures online are unlikely, and keep two methods in reserve. If a failure does occur online, a hotfix is released immediately and the following method is called.

+ (BOOL)canInitWithRequest:(NSURLRequest *)request {
    return NO;
}

// The code below is kept in reserve for use by a hotfix
+ (BOOL)open_canInitWithRequest:(NSURLRequest *)request {
    // Proxy the network request
} 

With the problem temporarily resolved, the dynamic domain-name replacement library can later be reworked along the lines of WeexSDK; see WXResourceRequestHandlerDefaultImpl.m. WeexSDK's implementation accounts for both multiple network monitoring objects and the creation of multiple sessions, which makes it a sound model to follow.

- (void)sendRequest:(WXResourceRequest *)request withDelegate:(id<WXResourceRequestDelegate>)delegate
{
    if (!_session) {
        NSURLSessionConfiguration *urlSessionConfig = [NSURLSessionConfiguration defaultSessionConfiguration];
        if ([WXAppConfiguration customizeProtocolClasses].count > 0) {
            NSArray *defaultProtocols = urlSessionConfig.protocolClasses;
            urlSessionConfig.protocolClasses = [[WXAppConfiguration customizeProtocolClasses] arrayByAddingObjectsFromArray:defaultProtocols];
        }
        _session = [NSURLSession sessionWithConfiguration:urlSessionConfig
                                                 delegate:self
                                            delegateQueue:[NSOperationQueue mainQueue]];
        _delegates = [WXThreadSafeMutableDictionary new];
    }
    
    NSURLSessionDataTask *task = [_session dataTaskWithRequest:request];
    request.taskIdentifier = task;
    [_delegates setObject:delegate forKey:task];
    [task resume];
}

杭城小刘