As distributed applications become more common, they increasingly rely on powerful observability facilities for monitoring, and those facilities in turn rely on high-quality telemetry data. Many open source projects and commercial vendors already provide solutions for collecting and monitoring telemetry data. However, in the absence of a unified standard, the collected telemetry data is poorly compatible across solutions, and maintaining the monitoring clients places a heavy burden on users.
OpenTelemetry provides developers with a unified, vendor-neutral telemetry data collection solution that addresses these problems.
Origin
OpenTelemetry originated from the merger of two open source communities, OpenTracing and OpenCensus. OpenTracing was initiated by Ben Sigelman in 2016 to address three problems with open source Tracing clients: duplicated development effort, inconsistent data models, and the high cost of keeping Tracing back ends compatible. OpenCensus derives from Google's internal practice and is an open source toolkit that combines Tracing and Metrics monitoring clients.
Both communities had considerable influence, and the existence of two or more Tracing standards contradicted the very purpose for which they were formed. The two communities therefore agreed to merge and established OpenTelemetry.
Why it is needed
From the user's point of view, integrating a Tracing monitoring client is somewhat intrusive to the business code. Once an application is connected to one supplier's monitoring client, it is difficult to switch to a client provided by another supplier. From the perspective of a Tracing server supplier, the server must process data from its own Tracing client and also stay compatible with the data generated by other vendors' Tracing clients, so the maintenance cost keeps rising. As distributed applications become more popular, as mentioned at the beginning of the article, the value of OpenTelemetry becomes more obvious.
OpenTelemetry project composition
The OpenTelemetry project is mainly divided into four parts:
Cross-language specification
Collector, a tool for collecting, converting, and forwarding telemetry data
Monitoring client API & SDK for each language
Instrumentation & Contrib: automatic monitoring for clients and third-party libraries
Cross-language specification
The specification covers a wide range of topics. Some parts govern the internal implementation of the telemetry client, and others define the protocol the telemetry client uses to communicate with external systems.
Code repository: opentelemetry-specification
The specifications for the internal implementation of the telemetry client cover the basic architecture and design principles of the monitoring client, the concepts and model definitions of the telemetry signals (Traces/Metrics/Logs) and auxiliary objects (Baggage/Context/Propagator), and the design of the classes and functions a telemetry client needs to implement. This article does not cover that material in detail. You can find it in specification/overview.md and in the datamodel.md/api.md/sdk.md files under the folder of the corresponding object.
The protocol specification for communication between the telemetry client and external systems mainly refers to the OpenTelemetry Protocol (OTLP for short). OTLP is OpenTelemetry's native protocol for transmitting telemetry signals. Although components in the OpenTelemetry project also support the Zipkin v2 and Jaeger Thrift protocol formats, those implementations are provided as third-party contributed libraries. OTLP is the only format that OpenTelemetry supports natively and officially.
OTLP's data model is defined in ProtoBuf. If you need to implement a back-end service that can collect OTLP telemetry data, you need to understand these definitions. They are in the following code repository:
Code repository: opentelemetry-proto
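To give a feel for what a back-end service receives, here is a minimal sketch of a trace payload shaped like the JSON encoding of OTLP. The field names (`resourceSpans`, `scopeSpans`, and so on) reflect my understanding of recent versions of the ProtoBuf definitions and are an assumption to verify against opentelemetry-proto; versions current when this article was written used a different name for the `scopeSpans` level. The function name, the scope name, and the 5 ms duration are made up for illustration.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a dict shaped like an OTLP JSON trace export request (illustrative only)."""
    end_ns = time.time_ns()
    start_ns = end_ns - 5_000_000  # pretend the span lasted 5 ms
    return {
        "resourceSpans": [{
            # The resource describes the entity that produced the telemetry.
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "example-instrumentation"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 random bytes, hex encoded
                    "spanId": secrets.token_hex(8),    # 8 random bytes, hex encoded
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER in the SpanKind enum
                    # 64-bit integers are written as strings in ProtoBuf JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_otlp_trace_payload("checkout", "GET /cart"), indent=2))
```

The nesting mirrors the data model: one resource owns several instrumentation scopes, and each scope owns its spans, so shared attributes such as the service name are sent once rather than per span.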
Collector, a tool for collecting, converting, and forwarding telemetry data
In Tracing practice, there is a principle that telemetry data collection should be orthogonal to business logic processing. That is, collecting telemetry data and transmitting it to the telemetry back-end service should not occupy the channels or threads of the business logic, and the monitoring client's impact on the original business logic should be kept as small as possible. Collector is a product of applying this principle.
Code repository: opentelemetry-collector
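The orthogonality principle can be illustrated inside a single process with a small sketch. This is not how Collector itself is implemented; it only shows the idea of handing telemetry to a separate worker so that business threads never wait on the back end. The class name `BackgroundExporter` and the `send` callable are hypothetical.

```python
import queue
import threading


class BackgroundExporter:
    """Hands spans to a worker thread so the caller never waits on the network."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # hypothetical callable that ships one span to a back end
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def export(self, span) -> bool:
        """Called from business code: enqueue without blocking, drop when full."""
        try:
            self._pending.put_nowait(span)
            return True
        except queue.Full:
            self.dropped += 1  # losing telemetry is preferred over stalling the app
            return False

    def _run(self):
        while True:
            span = self._pending.get()
            if span is None:  # sentinel: stop the worker
                break
            try:
                self._send(span)
            except Exception:
                pass  # a failing back end must not affect the application

    def shutdown(self):
        """Let the worker drain everything queued so far, then stop it."""
        self._pending.put(None)
        self._worker.join()


if __name__ == "__main__":
    sent = []
    exporter = BackgroundExporter(sent.append)
    exporter.export({"name": "GET /cart"})
    exporter.shutdown()
    print(sent)
```

Moving this work out of the application process entirely, into a nearby Collector, takes the same idea one step further.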
From an architectural perspective, Collector has two deployment modes. In the first, Collector is deployed on the same host as the application (for example, as a K8S DaemonSet) or in the same Pod as the application (for example, as a Sidecar in K8S), and the telemetry data collected by the application is transmitted directly to Collector over the loopback network. These deployments are collectively referred to as Agent mode.
In the other mode, Collector runs as independent middleware, and the application sends its collected telemetry data to it. This mode is called Gateway mode.
The two modes can be used alone or in combination, as long as the protocol format of the data leaving one stage matches the protocol format expected at the entry of the next.
OpenTelemetry Architecture
Inside Collector, one complete flow of data in, processing, and data out is called a pipeline. A pipeline is composed of three kinds of components: receiver, processor, and exporter.
receiver
Responsible for listening for and receiving telemetry data in the corresponding protocol format, and passing the data to one or more processors
processor
Responsible for processing telemetry data, such as discarding data, adding information, batch processing, etc., and passing the data to the next processor or to one or more exporters
exporter
Responsible for sending the data to the next receiving end (usually the telemetry backend). An exporter can be configured to receive telemetry data from multiple processors at the same time
Collector Pipeline
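The receiver/processor/exporter flow described above can be sketched in a few lines. This is a toy model, not the Collector's actual Go implementation; `Pipeline`, `drop_health_checks`, `add_environment`, and the `env` field are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List

Span = dict
Processor = Callable[[List[Span]], List[Span]]
Exporter = Callable[[List[Span]], None]


def drop_health_checks(spans: List[Span]) -> List[Span]:
    """Processor that discards data: remove spans for a health-check endpoint."""
    return [s for s in spans if s.get("name") != "GET /healthz"]


def add_environment(spans: List[Span]) -> List[Span]:
    """Processor that adds information: tag every span with an environment."""
    return [{**s, "env": "staging"} for s in spans]


class Pipeline:
    """Data flows in through receive(), through each processor, out to each exporter."""

    def __init__(self, processors: Iterable[Processor], exporters: Iterable[Exporter]):
        self.processors = list(processors)
        self.exporters = list(exporters)

    def receive(self, spans: List[Span]) -> None:
        for process in self.processors:  # processors run in order
            spans = process(spans)
        for export in self.exporters:    # every exporter gets the processed batch
            export(spans)


if __name__ == "__main__":
    backend_a, backend_b = [], []
    pipeline = Pipeline([drop_health_checks, add_environment],
                        [backend_a.extend, backend_b.extend])
    pipeline.receive([{"name": "GET /healthz"}, {"name": "GET /cart"}])
    print(backend_a)
```

Because each stage only sees a list of spans, swapping a processor or adding another exporter does not require touching the other stages.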
As the design above shows, Collector not only keeps telemetry data collection orthogonal to business logic processing, but also acts as an adapter between telemetry data and the telemetry backend. Collector can receive data in formats such as otlp, zipkin, and jaeger, and forward it in formats such as otlp, zipkin, and jaeger. Whether a given input or output format is supported depends on whether a corresponding receiver or exporter implementation exists. The otlp-related implementations are all in the opentelemetry-collector repository. For implementations of protocols other than otlp, refer to the code repository below.
Code repository: opentelemetry-collector-contrib
Monitoring client API & SDK for each language
OpenTelemetry provides basic monitoring client API & SDK packages for each language. These packages generally follow the suggestions and definitions in opentelemetry-specification, combined with the characteristics of the language itself, to implement the basic ability to collect telemetry data on the client: transferring metadata between services and processes, adding Trace monitoring and exporting its data, and creating, using, and exporting Metrics indicators. The following table lists the code repository for each language's monitoring client API & SDK package.
According to the OpenTelemetry project plan, most components were expected to support Tracing in the first half of 2021. As of this writing (December 2021), Tracing support in the C++/.NET/Golang/Java/Javascript/Python/Ruby monitoring clients has reached the stable state, and Tracing support in the Erlang/Rust/Swift monitoring clients has entered the Beta testing phase.
The OpenTelemetry project plans to support Metrics afterwards. The hope was that most components would complete Metrics support in the second half of 2021. Judging from the current situation, Metrics support in each language's client package is still in the Alpha testing stage. Support for Logs is planned to start in 2022.
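To make the "metadata between services and processes" part of these packages concrete, here is a minimal sketch of packing and unpacking trace metadata in HTTP headers. It assumes the W3C Trace Context `traceparent` header layout (version, trace ID, parent span ID, flags), which the article itself does not name. The `inject` and `extract` functions are simplified stand-ins, not the SDK's real propagator API, and they skip some validation the standard requires (for example, rejecting all-zero IDs).

```python
import re
from typing import Optional, Tuple

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: pack the current trace metadata into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict) -> Optional[Tuple[str, str, bool]]:
    """Callee side: unpack (trace_id, parent_span_id, sampled), or None if absent/invalid."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)


if __name__ == "__main__":
    outgoing = {}
    inject(outgoing, "0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331")
    print(outgoing["traceparent"])
    print(extract(outgoing))
```

The receiving service starts its own span with the extracted trace ID and uses the extracted span ID as the parent, which is what links spans from different processes into one trace.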
Instrumentation & Contrib
If you use only the monitoring client API & SDK packages, many operations require modifying application code, such as adding Tracing monitoring points, recording field information, and packing and unpacking the metadata passed between processes and services. This approach is intrusive to the code, hard to decouple, and costly to operate, which raises the barrier for users. In such cases, the design patterns of common components or the features of the language can be used to lower that barrier.
Using the design patterns of common components: for example, the Gin component in Golang implements the Middleware chain-of-responsibility design pattern. Based on the github.com/gin-gonic/gin library, we can create an otelgin.Middleware and manually add it to the Middleware chain to achieve fast monitoring of Gin.
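The middleware idea can be sketched independently of Gin. The following is a Python analogy of the chain-of-responsibility pattern, not the otelgin implementation; `tracing_middleware`, `build_chain`, `get_cart`, and the dict-based request and response are hypothetical. The point is that the handler itself contains no monitoring code.

```python
import time
from typing import Callable, List

Handler = Callable[[dict], dict]
Middleware = Callable[[Handler], Handler]


def tracing_middleware(recorded: list) -> Middleware:
    """Return a middleware that records one span-like dict per handled request."""
    def middleware(next_handler: Handler) -> Handler:
        def traced(request: dict) -> dict:
            start = time.perf_counter()
            status = 500  # reported if the handler raises
            try:
                response = next_handler(request)
                status = response.get("status", 200)
                return response
            finally:
                recorded.append({
                    "name": f"{request['method']} {request['path']}",
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return traced
    return middleware


def build_chain(handler: Handler, middlewares: List[Middleware]) -> Handler:
    """Wrap the handler so that the first middleware in the list runs outermost."""
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


def get_cart(request: dict) -> dict:
    """Business handler: note that it knows nothing about tracing."""
    return {"status": 200, "body": "3 items"}


if __name__ == "__main__":
    spans = []
    app = build_chain(get_cart, [tracing_middleware(spans)])
    print(app({"method": "GET", "path": "/cart"}))
    print(spans)
```

The only change to the application is the one line that adds the middleware to the chain, which is why this style is called fast monitoring rather than automatic monitoring.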
Using language features: for example, Java can use Java Agent and bytebuddy bytecode weaving to locate the corresponding classes and methods before the Java application starts, modify their bytecode to inject monitoring, and thereby achieve automatic monitoring of the specified classes.
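Java bytecode weaving is hard to show briefly, but the underlying idea of wrapping a method of an existing class at load time, without editing its source, has a rough Python analogy in runtime patching. The sketch below is that analogy only, not how the Java Agent works internally; `instrument` and `OrderRepository` are hypothetical names.

```python
import functools
import time


def instrument(cls, method_name: str, recorded: list) -> None:
    """Replace cls.method_name with a wrapper that records a span-like dict per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            recorded.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


class OrderRepository:
    """Stands in for a third-party library class whose source we do not edit."""

    def find(self, order_id: int) -> dict:
        return {"id": order_id}


if __name__ == "__main__":
    spans = []
    instrument(OrderRepository, "find", spans)  # done once, at startup
    print(OrderRepository().find(7))
    print(spans)
```

Every caller of `OrderRepository.find` is now monitored without any change to the calling code or to the class itself, which is the property that makes this kind of monitoring automatic.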
In theory, fast monitoring depends on the client API & SDK, and automatic monitoring depends on fast monitoring. In practice this layering is not always followed. For example, the Java implementation uses Java Agent and bytebuddy to achieve fully automatic monitoring of specified open source components, so it has no separate fast-monitoring layer (in OpenTracing the two were separate).
Summary
The mission of OpenTelemetry is to collect high-quality, large-scale, and portable telemetry data so that effective observability facilities become possible. It does not itself provide a complete observability solution; it provides a unified telemetry data collection solution. To build a complete set of observability facilities, you also need to pair it with monitoring back ends for data persistence and querying, such as the Tracing back ends zipkin/jaeger/tempo, the metrics back end prometheus, and the logs back end loki.