1. What is an HTTP client
HTTP is the common language of the Internet, and an HTTP client is the most basic way to get data from the Internet world: in essence, it turns a URL into the content of a page. With basic HTTP client functionality, coupled with the rules and strategies we want, everything from content retrieval to data analysis can be achieved.
The last introduction showed that Workflow can implement a high-performance HTTP server in 10 lines of C++ code. Today we continue with something just as simple: a high-performance HTTP client in C++!
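// [http_client.cc]
#include <stdio.h>
#include "workflow/HttpMessage.h"
#include "workflow/WFTaskFactory.h"

int main(int argc, char *argv[])
{
    const char *url = "https://github.com/sogou/workflow";

    // Allow at most 2 redirects and 3 retries; the lambda is the callback.
    WFHttpTask *task = WFTaskFactory::create_http_task(url, 2, 3,
        [](WFHttpTask *task) {
            // The response is only meaningful if the task succeeded.
            if (task->get_state() != WFT_STATE_SUCCESS)
            {
                fprintf(stderr, "request failed, error = %d\r\n", task->get_error());
                return;
            }

            protocol::HttpResponse *resp = task->get_resp();
            fprintf(stderr, "%s %s %s\r\n",
                    resp->get_http_version(),
                    resp->get_status_code(),
                    resp->get_reason_phrase());
        });

    task->start();   // asynchronous: returns immediately
    getchar();       // press "Enter" to end.
    return 0;
}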
As long as Workflow is installed, the above code can be compiled into a simple http_client with the following command:
g++ -o http_client http_client.cc --std=c++11 -lworkflow -lssl -lcrypto -lpthread
If we run the resulting executable, ./http_client, it prints the status line of the HTTP response:
HTTP/1.1 200 OK
In the same way, the other HTTP headers and the HTTP body can be read through other APIs; everything is available from this WFHttpTask. And because Workflow is an asynchronous scheduling framework, starting the task does not block the current thread. Together with the internal connection reuse, this is what fundamentally guarantees a high-performance HTTP client.
Next, I will explain the principle in detail~
2. The request process
1. Create Http task
As you can see from the above demo, the request is made by starting a Workflow HTTP task. The interface for creating a task is as follows:
WFHttpTask *create_http_task(const std::string& url,
                             int redirect_max, int retry_max,
                             http_callback_t callback);
The first parameter is the URL we want to request. In the initial example, the maximum number of redirects, redirect_max, is 2, and the maximum number of retries, retry_max, is 3. The fourth parameter is a callback function; in the example we used a lambda. Since Workflow tasks are asynchronous, we are notified of the result passively: the callback function is called when the result comes back. Its type is as follows:
using http_callback_t = std::function<void (WFHttpTask *)>;
2. Fill in the header and send it out
Network interaction is nothing more than request and reply, and an HTTP client is no exception. After we create the task, we have an opportunity to work on the request, which for an HTTP client mainly means filling in protocol-related fields in the request header. For example, we can use the Connection header to ask for a long-lived HTTP connection, which saves the time of establishing a connection next time; to do so, we set Connection to Keep-Alive. An example is as follows:
protocol::HttpRequest *req = task->get_req();
req->add_header_pair("Connection", "Keep-Alive");
task->start();
Finally, once the request is set up, we send the task out with task->start();
In the first http_client.cc example, there is a getchar(); statement. Because an asynchronous task does not block after it is sent, the current thread would otherwise run on and exit without pausing. We want to wait until the callback function has been called, so we need some way of pausing the thread, and there are many to choose from.
3. Processing the returned result
A returned result, according to the HTTP protocol, contains three parts: a status line, message headers, and a message body. If we want to get the body, we can do this:
const void *body;
size_t body_len;
task->get_resp()->get_parsed_body(&body, &body_len);
3. The basic guarantee of high performance
The best thing about writing an HTTP client in C++ is that we can take advantage of its high performance. How does Workflow guarantee high concurrency? In fact, there are two points:
- Purely asynchronous scheduling
- Connection reuse
The former concerns thread resources, and the latter concerns connection resources. Both are managed at the framework level on the user's behalf, which greatly reduces the mental burden on developers.
1. Asynchronous scheduling mode
Whether the mode is synchronous or asynchronous directly determines how concurrent an HTTP client can be. Why? The following figure first shows what the thread model looks like when a synchronous framework initiates three HTTP tasks:
Network latency is often very large. If we wait synchronously for each task to come back, the thread stays occupied the whole time. Now let's see how an asynchronous framework handles this:
As shown in the figure, as soon as a task is sent, the thread can do other things. We pass in a callback function for asynchronous notification, so when the network reply for the task is received, a thread executes this callback function and we get the result of the HTTP request. Because threads can be reused while many tasks are in flight, it is easy to reach hundreds of thousands of QPS.
2. Connection reuse
As we mentioned earlier, establishing a long-lived connection improves efficiency. Why? Because the framework reuses connections. Let's first take a look at what happens if a new connection is established for every request:
Obviously, occupying a large number of connections wastes system resources, and doing connect and close every time is very time-consuming. In addition to the TCP handshake, the process of establishing a connection is also relatively complicated for many application-layer protocols. With Workflow there is no such trouble: when a task is sent out, Workflow automatically looks for a connection that can be reused and creates a new one only if none is found. Developers do not need to care about the details of how connections are reused.
3. Unlock other functions
Of course, beyond raw performance, a high-performance HTTP client often has many other requirements. Here are a few from real-world situations:
- Combine Workflow's series and parallel task flows to implement very large-scale parallel fetching;
- Request the content of a certain site in order or at a specified rate, to avoid being blocked for sending too many requests;
- When the HTTP client encounters a redirect, have it follow the jump automatically and get the final result in one step;
- Access HTTP and HTTPS resources through a proxy.
These requirements call for very flexible scheduling of HTTP tasks from the framework, as well as down-to-earth support for practical needs (such as redirects, SSL proxies and other functions). Workflow has already implemented all of these.
project address
https://github.com/sogou/workflow
You are welcome to use Workflow and support it with a star!