The story takes place in the summer of 2022. During a test with production traffic, God (a pseudonym) found that requests which had returned HTTP 200 before Istio was introduced started returning HTTP 400 once they passed through the Istio Gateway. The affected traffic carries HTTP headers that do not conform to the HTTP specification. For example, there is an extra space before the colon:

GET /headers HTTP/1.1\r\n
Host: httpbin.org\r\n
User-Agent: curl/7.68.0\r\n
Accept: */*\r\n
SpaceSuffixHeader : normalVal\r\n
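Why does this end in a 400? RFC 7230 section 3.2.4 forbids whitespace between a header field name and the colon, and requires a server to reject such a request with 400 (Bad Request). As a rough illustration of the check involved, here is a minimal sketch. `hasSpaceBeforeColon` is a hypothetical helper written for this article, not code from Envoy or http-parser:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true when a single header line (without the
// trailing CRLF) has a space or tab directly before the first colon,
// e.g. "SpaceSuffixHeader : normalVal".
bool hasSpaceBeforeColon(std::string_view line) {
  const std::size_t colon = line.find(':');
  if (colon == std::string_view::npos || colon == 0) {
    return false;  // not a "name: value" line
  }
  const char before = line[colon - 1];
  return before == ' ' || before == '\t';
}
```

The check is only meant for header lines. The request line can legitimately contain a colon (for example in an absolute URL), so it should not be passed to this function.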

After begging God to fix the problem at the source, the "innocent" programmer prepared for the worst and set out to build a Noah's Ark (Hebrew: תיבת נח).

Plan - Two Noah's Arks

When people talk about Istio, they are mostly talking about Envoy. The HTTP/1.1 parser used by Envoy is nodejs/http-parser, a library written in C that has not been updated for 2 years. The most direct idea is to make the parser tolerate the problem HTTP header. So the programmer opened a search engine.

Ark 1 - Make the Parser Compatible

If choosing a search engine is a matter of circumstances, then choosing search keywords is a matter of skill plus experience. I won't go into detail about how the programmer searched. In short, the engine turned up this issue: White spaces in header fields will cause parser failed #297

And then, of course, a comment read with mixed feelings:

Set the HTTP_PARSER_STRICT=0 solved my issue, thanks.

That is, this setting has to be added when compiling istio-proxy / Envoy / http-parser in order to tolerate header names followed by spaces.
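To make the idea of a compile-time switch concrete: in http_parser.h, HTTP_PARSER_STRICT defaults to 1 unless the build overrides it, for example with a compiler flag such as -DHTTP_PARSER_STRICT=0. The sketch below only imitates that mechanism. `isTokenChar` and `acceptsHeaderName` are hypothetical stand-ins written for this article, and the real parser's checks are more involved:

```cpp
#include <cctype>
#include <string_view>

// Same default as http_parser.h: strict unless the build says otherwise.
#ifndef HTTP_PARSER_STRICT
#define HTTP_PARSER_STRICT 1
#endif

// RFC 7230 "tchar": the characters allowed in a header field name.
inline bool isTokenChar(char c) {
  constexpr std::string_view symbols = "!#$%&'*+-.^_`|~";
  return std::isalnum(static_cast<unsigned char>(c)) != 0 ||
         symbols.find(c) != std::string_view::npos;
}

// Hypothetical stand-in for the parser's header-name check. The default for
// `strict` comes from the compile-time macro; lenient mode also accepts spaces.
inline bool acceptsHeaderName(std::string_view name,
                              bool strict = (HTTP_PARSER_STRICT != 0)) {
  if (name.empty()) return false;
  for (char c : name) {
    if (isTokenChar(c)) continue;
    if (!strict && c == ' ') continue;
    return false;
  }
  return true;
}
```

The `strict` parameter exists only so that both behaviours can be tried in one build. In http-parser itself the choice is fixed when the library is compiled, which is exactly why this ark requires recompiling istio-proxy.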

The programmer works at a large company with its own infrastructure department, and large companies generally build customized versions of open source projects instead of using the binary releases directly. So the programmer spent a few days customizing and building the infrastructure department's istio-proxy with HTTP_PARSER_STRICT=0 added. Testing confirmed that this did solve the compatibility problem.

But this workaround has several problems:

  • A custom recompilation gives the infrastructure department a reason to refuse support for other problems later. It is easy to end up taking the blame, and it introduces more unknown risks.
  • A basic principle of problem solving is to contain both the impact of the problem itself and the risk of the solution itself. Avoid introducing n bugs to fix one bug.

    • If the Istio Gateway passes the problem header through unchanged, then the sidecar proxies and application services in the layers behind it must also tolerate the problem header and pass it through. The risk is unknown.

Ark 2 - Fix the Problem Header

Envoy claims to be a programmable proxy. Many people know that all kinds of functionality can be added to it through custom-developed HTTP Filters, which of course includes customizing and rewriting HTTP headers.

But think carefully first. If you have read "Reverse Engineering and Cloud Native Field Analysis Part2 - eBPF Tracking Istio/Envoy Startup, Monitoring and Thread Load Balancing", or watched [Envoy Internals Deep Dive - Matt Klein, Lyft (Advanced Skill Level)] by Matt Klein, the original author of Envoy, you will recognize this picture:

image.png

It shows that the error occurs in the HTTP Codec, which runs before any HTTP Filter! So an HTTP Filter cannot be used.

To verify this, I set a breakpoint on the http_parser_execute function of http-parser with gdb and inspected the call stack. For how to use gdb here, see "gdb debugging istio proxy (envoy)".

An HTTP Filter does not work, so what about a TCP Filter? In theory, a TCP Filter can correct the problem header before the byte buffer is handed to the HTTP Codec. Note that this is not a simple in-place overwrite of bytes: bytes may have to be deleted, so the length of the buffer can change...
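To show what "deleting bytes" means in practice, here is a minimal sketch that repairs the head of one complete request held in a single buffer. It is not the filter's real code: `repairHeaderLine` and `repairRequestHead` are hypothetical names, and the sketch uses plain string scanning instead of a real HTTP parser.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: copy of one header line (without CRLF) with any spaces
// or tabs between the field name and the first colon removed.
std::string repairHeaderLine(std::string_view line) {
  const std::size_t colon = line.find(':');
  if (colon == std::string_view::npos) return std::string(line);
  std::size_t nameEnd = colon;
  while (nameEnd > 0 && (line[nameEnd - 1] == ' ' || line[nameEnd - 1] == '\t')) {
    --nameEnd;
  }
  std::string out(line.substr(0, nameEnd));
  out.append(line.substr(colon));
  return out;
}

// Hypothetical helper: repairs every header line of a raw request. The request
// line and everything after the blank line (the body) are copied untouched.
std::string repairRequestHead(std::string_view raw) {
  std::string out;
  std::size_t pos = 0;
  bool firstLine = true;
  while (pos < raw.size()) {
    const std::size_t eol = raw.find("\r\n", pos);
    if (eol == std::string_view::npos) break;  // incomplete line: copy as-is below
    const std::string_view line = raw.substr(pos, eol - pos);
    pos = eol + 2;
    if (line.empty()) {  // blank line marks the end of the headers
      out.append("\r\n");
      break;
    }
    if (firstLine) {
      out.append(line);  // request line may contain ':' in the URL
    } else {
      out.append(repairHeaderLine(line));
    }
    out.append("\r\n");
    firstLine = false;
  }
  out.append(raw.substr(pos));  // body or incomplete tail
  return out;
}
```

The repaired output is shorter than the input whenever a space is removed, which is why the filter has to replace the buffer contents rather than patch bytes in place.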

That leads to another choice. There are two ways to implement a TCP Filter (called a Network Filter from here on):

  • Native C++ Filter

    • Relatively good performance, since no buffer copy is required. But Envoy has to be recompiled.
  • WASM Filter

    • Because of the sandbox VM, the buffer has to be copied between the VM and the native program, which adds CPU/memory usage and latency.

As mentioned above, Envoy cannot be recompiled, so the poor programmer can only choose the WASM Filter.

If the "innocent" programmer were a pure architect, he could figure out the approach, draw an architecture diagram in PPT, and call it a day. That would be a happy ending. Unfortunately, the "innocent" programmer is destined to spend several sleepless days building "Ark No. 2". Every plank and every nail has to be made by hand...

First Steps with a WASM Network Filter

Choice of WASM language

There are several languages to choose from for writing WASM Filters: the trendy Rust, the Go that never has to worry about finding a job, and C++, whose best days are behind it. Whether you consider automatic memory management and safety, or just padding your resume, C++ is the last choice. However, the "innocent" programmer chose C++. Apart from sentiment, which is worth nothing, there is another reason that came out of careful consideration:

-- Reuse the same http-parser that Envoy uses, built with the compile-time configuration HTTP_PARSER_STRICT=0 so that its compatibility mode is turned on.

To fix a problematic HTTP header, you first have to locate it in the byte buffer, which means parsing up to it. Of course a fancier parser could be used, and all of the languages above have their own HTTP parsers. But who guarantees that the results of those parsers are compatible with Envoy's? Will new problems be introduced? Using the very same parser as Envoy is therefore a good choice. If that parser has a problem, Envoy itself has the problem even without this Filter. In other words, it is basically guaranteed that no new parser-level problems are introduced.

The niche WASM Network Filter

The luckiest programmers can always find template code to copy and paste, or a heaven-sent issue workaround, on a search engine, Stack Overflow, or GitHub, and easily hit their performance targets. "Unlucky" programmers often have to solve problems that have no standard answer (although I prefer those), and in the end wear themselves out without necessarily getting any credit for it.

Plenty of WASM HTTP Filter material and reference implementations can be found on the Internet, but there is very little on WASM Network Filters. The few that exist only read the buffer bytes and compute simple statistics. None of them modify the byte stream at layer 3/4, let alone parse HTTP on top of the byte stream.

Proxy WASM C++ SDK

Open source opens up not only code, but also the opportunity to find out how things really work. The "unlucky" programmer remembers the pain of learning Visual C++ MFC in 2002, with only the MSDN documentation to read and no way of knowing why things behaved as they did.

However niche the WASM Network Filter is, it is still open source. Not only the SDK is open source, but also the ABI spec that defines the interface. Here are the important references I had at hand:

WASM Network Filter Design

Sticking to my usual style: fewer words, more pictures.

image.png

Figure: WASM Network Filter design diagram

Not much to say, here's the implementation.

WASM Network Filter implementation

For various reasons I am not going to reproduce all of the code. What follows is pseudo-code rewritten specifically for this article, for illustration only.

The filter uses the source code of https://github.com/nodejs/http-parser, which consists of just two files: http_parser.h and http_parser.c . First download them and save them into a new project directory, say $REPAIRER_FILTER_HOME . The biggest advantage of http-parser is that it has no dependencies and is easy to integrate.

Now start writing the core code, which I assume is called: $repairer_fitler.cc

#include ...
#include "proxy_wasm_intrinsics.h"
#include "http_parser.h" // from https://github.com/nodejs/http-parser

/**
 * One instance of this object exists per Filter configuration.
 */
class ExampleRootContext : public RootContext
{
public:
  explicit ExampleRootContext(uint32_t id, std::string_view root_id) : RootContext(id, root_id) {}

  // Filter start event
  bool onStart(size_t) override
  {
    LOG_DEBUG("ready to process streams");
    return true;
  }
};

And then the core class:

/**
 * One instance of this object exists per downstream connection.
 */
class MainContext : public Context
{
public:
  http_parser_settings settings_;
  http_parser parser_;
  ...

  // Constructor, called when a new downstream connection becomes available,
  // e.g. after the TLS handshake, or after the TCP connect for plain text.
  // Note that HTTP 1.1 supports persistent connections, so this object has
  // to handle multiple requests.
  explicit MainContext(uint32_t id, RootContext *root) : Context(id, root)
  {
    logInfo(std::string("new MainContext"));

    http_parser_settings_init(&settings_); // clear all callbacks
    http_parser_init(&parser_, HTTP_REQUEST);
    parser_.data = this;
    // Register the HTTP parser callbacks by member name. A positional
    // initializer is fragile here: in http_parser_settings, on_url and
    // on_status sit between on_message_begin and on_header_field.
    settings_.on_message_begin = [](http_parser *parser) -> int
    {
      MainContext *hpContext = static_cast<MainContext *>(parser->data);
      return hpContext->on_message_begin();
    };
    settings_.on_header_field = [](http_parser *parser, const char *at, size_t length) -> int
    {
      MainContext *hpContext = static_cast<MainContext *>(parser->data);
      return hpContext->on_header_field(at, length);
    };
    settings_.on_header_value = [](http_parser *parser, const char *at, size_t length) -> int
    {
      MainContext *hpContext = static_cast<MainContext *>(parser->data);
      return hpContext->on_header_value(at, length);
    };
    settings_.on_headers_complete = [](http_parser *parser) -> int
    {
      MainContext *hpContext = static_cast<MainContext *>(parser->data);
      return hpContext->on_headers_complete();
    };
    ...
  }

  // New buffer event. Note that, for network reasons, one HTTP request can be
  // split across several buffers, so this callback may fire several times.
  FilterStatus onDownstreamData(size_t length, bool end_of_stream) override
  {
    logInfo(std::string("onDownstreamData START"));
    ...

    WasmDataPtr wasmDataPtr = getBufferBytes(WasmBufferType::NetworkDownstreamData, 0, length);

    {
      std::ostringstream out;
      out << "onDownstreamData length:" << length << ",end_of_stream:" << end_of_stream;
      logInfo(out.str());
      logInfo(std::string("onDownstreamData Buf:\n") + wasmDataPtr->toString());
    }

    // Run the HTTP parser, which invokes the callbacks registered above.
    // Those callbacks record the position of the problem header and repair it.
    size_t parsedBytes = http_parser_execute(&parser_, &settings_, wasmDataPtr->data(), length); // callbacks
    ...

    // Envoy drains `length` bytes of the buffer, which requires start=0:
    // see proxy-wasm-cpp-sdk proxy_wasm_api.h setBuffer()
    // see proxy-wasm-cpp-host src/exports.cc set_buffer_bytes()
    // see Envoy source/extensions/common/wasm/context.cc Buffer::copyFrom()
    size_t start = 0;

    // WasmResult setBuffer(WasmBufferType type, size_t start, size_t length, std::string_view data,
    //                           size_t *new_size = nullptr)
    // Ref. https://github.com/proxy-wasm/spec/tree/master/abi-versions/vNEXT#proxy_set_buffer
    // Set content of the buffer buffer_type to the bytes (buffer_data, buffer_size), replacing size bytes, starting at offset in the existing buffer.
    // setBuffer(WasmBufferType::NetworkDownstreamData, start, length, data);
    setBuffer(WasmBufferType::NetworkDownstreamData, start, length, outputBuffer);
    return FilterStatus::Continue;
  }

  /**
   * on HTTP Stream(Connection) closed
   */
  void onDone() override { logInfo("onDone " + std::to_string(id())); }
};

Finally, the registration:

static RegisterContextFactory register_ExampleContext(CONTEXT_FACTORY(MainContext),
                                                      ROOT_FACTORY(ExampleRootContext),
                                                      "my_root_id");

Because the filter parses raw buffers, it has to handle cases such as an HTTP request or a single header spanning several buffers. It also has to support HTTP 1.1 keepalive persistent connections. Add the fact that it had been 17 years since this programmer last worked on a C++ project, and it took a week (with overtime) to get a working prototype. It has not been optimized, and its performance impact has not been tested. A sandbox VM implementation is bound to affect service latency. You can see my previous analysis:

"Remember an Istio sprint tuning":

image.png

Figure: WASM in Flame Graph
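The cross-buffer case mentioned above is the easiest one to get wrong, so here is a minimal sketch of the idea: hold back an incomplete trailing line until the rest arrives, and reset the state after each request head so that a keepalive connection can carry the next request. `StreamingHeaderRepairer` is a hypothetical class written for this article. It uses plain line scanning rather than http-parser, and it assumes requests without a body (such as GET); a real filter would rely on the parser's callbacks to know where a message ends.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: repairs "Name : value" header lines in a byte stream
// that arrives in arbitrary chunks. Assumes requests carry no body.
class StreamingHeaderRepairer {
 public:
  // Feed one network buffer; returns the bytes that are ready to forward.
  std::string feed(std::string_view chunk) {
    pending_.append(chunk);
    std::string out;
    std::size_t pos = 0;
    for (;;) {
      const std::size_t eol = pending_.find("\r\n", pos);
      if (eol == std::string::npos) break;  // partial line: wait for more data
      std::string line = pending_.substr(pos, eol - pos);
      pos = eol + 2;
      if (line.empty()) {
        firstLine_ = true;   // end of this head; next line starts a new request
      } else if (firstLine_) {
        firstLine_ = false;  // request line is forwarded untouched
      } else {
        stripSpaceBeforeColon(line);
      }
      out.append(line).append("\r\n");
    }
    pending_.erase(0, pos);  // keep only the incomplete tail
    return out;
  }

 private:
  // Removes spaces and tabs between the field name and the first colon.
  static void stripSpaceBeforeColon(std::string& line) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) return;
    std::size_t nameEnd = colon;
    while (nameEnd > 0 && (line[nameEnd - 1] == ' ' || line[nameEnd - 1] == '\t')) {
      --nameEnd;
    }
    line.erase(nameEnd, colon - nameEnd);
  }

  std::string pending_;    // bytes received but not yet forwarded
  bool firstLine_ = true;  // true while the next line is a request line
};
```

One instance would live per downstream connection, matching the lifetime of MainContext. Holding bytes back also means the filter forwards less than it received for some buffers, which adds a little latency on top of the VM copy cost.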

Reflections

This is the best of times: architects have a wide range of open source components that can simply be glued together to implement requirements.

This is the worst of times: out-of-the-box components spoil architects. We fly high and confident on other people's work, thinking the magic is our own. But when one of us unluckily steps into a pit and falls, he gets badly hurt precisely because he does not know how things really work underneath.

My all-time hero, Brendan Gregg, once said:

You never know a company (or person) until you see them on their worst day

The real test of a programmer or architect is not drawing a grand blueprint (PPT) for a new project, nor how many new concepts and new technologies he knows. It is how, when the existing architecture runs into a problem and there is no prior experience to draw on, he explores a solution under all the technical and non-technical constraints, and how well he prepares for the new problems that the solution itself may cause.

image.png


MarkZhu