Principle and Practice of Ant Group SOFATracer

Background

Microservice architecture brings many benefits, but it also makes the system more complex. A traditional monolithic application is split along various dimensions into distributed microservices, which may even be written in different languages. Deployment is usually distributed as well: there may be thousands of servers spread across multiple data centers in different cities. The figure below shows a typical microservice architecture, and the number of nodes in it is still relatively small. At Alipay, the call chain of a single offline payment transaction involves hundreds of nodes.

Image source: https://www.splunk.com/en_us/data-insider/what-is-distributed-tracing.html#benefits-of-distributed-tracing

Microservices introduce the following typical problems:

Faults are hard to locate: a single request often involves multiple services, and troubleshooting may even require several teams
The complete call chain is hard to reconstruct, which makes it hard to analyze the call relationships between nodes
Performance is hard to analyze, in particular finding the node that is the performance bottleneck

These are all observability problems. Application observability rests on three kinds of data:

Logs
Traces
Metrics

This article focuses on traces, or Distributed Tracing in full. In 2010, Google published the Dapper paper describing its solution, which can be regarded as the industry's earliest distributed tracing system. Major Internet companies then followed Dapper's ideas and launched their own tracing systems, including Twitter's Zipkin, Alibaba's Hawkeye (EagleEye), PinPoint, Apache HTrace, and Uber's Jaeger, as well as the protagonist of this article, SOFATracer. Because there were so many implementations, a specification for distributed tracing emerged: OpenTracing. In 2019, OpenTracing and OpenCensus merged to form OpenTelemetry.

OpenTracing

Before going deeper into SOFATracer, here is a brief introduction to OpenTracing, because SOFATracer is built on the OpenTracing specification (version 0.22.0; the API in newer versions of the specification differs). A Trace consists of the Spans generated by service calls and the references between them. A Span represents a span of time. Each service call creates a Span, which is either a calling Span (on the client side) or a called Span (on the server side). Each Span contains:

TraceId and SpanId
Operation name
Elapsed time
Service call result

A Trace usually contains multiple service calls and therefore multiple Spans. The relationships between Spans are declared through references, which point from the caller to the service provider. OpenTracing specifies two reference types:

ChildOf: a synchronous service call; the client needs the server's result before it can continue processing.
FollowsFrom: an asynchronous service call; the client does not wait for the server's result.

A Trace is a directed acyclic graph, and the topology of one call can be shown as follows:

The SpanContext in the figure is the data shared within a request, hence the name Span context. Data that a service node places in the context is visible to all subsequent nodes, so it can be used to pass information along.

SOFATracer

TraceId generation

A TraceId ties together all service nodes involved in a request. The generation rule must avoid collisions between different TraceIds and must be cheap, because generating trace data is overhead on top of the business logic. In SOFATracer, a TraceId is built from: server IP + time at which the ID was generated + auto-incrementing sequence + current process ID, for example:

0ad1348f1403169275002100356696

The first 8 characters, 0ad1348f, are the IP address of the machine that generated the TraceId, written in hexadecimal with two digits per segment of the address. Converting each pair of digits to decimal gives the familiar form 10.209.52.143, so this rule also tells you which server the request reached first. The next 13 digits, 1403169275002, are the time at which the TraceId was generated, in milliseconds. The next 4 digits, 1003, are an auto-incrementing sequence that rises from 1000 to 9000 and then wraps back to 1000. The last 5 digits, 56696, are the current process ID, appended to prevent TraceId collisions between multiple processes on a single machine.
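To make the layout concrete, here is a minimal Java sketch that splits the example TraceId back into its fields. The class `TraceIdDecoder` and its method names are hypothetical helpers written for this article, not part of SOFATracer, and the sketch assumes the 8 + 13 + 4 character layout described above, with the process ID taking whatever remains.

```java
import java.time.Instant;

/** Hypothetical helper (not part of SOFATracer) that splits a TraceId into its fields. */
final class TraceIdDecoder {

    /** Characters 1-8: the IPv4 address, two hex digits per segment. */
    static String ip(String traceId) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i += 2) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(Integer.parseInt(traceId.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    /** Characters 9-21: creation time in milliseconds since the epoch. */
    static long timestampMillis(String traceId) {
        return Long.parseLong(traceId.substring(8, 21));
    }

    /** Characters 22-25: the rolling sequence number. */
    static int sequence(String traceId) {
        return Integer.parseInt(traceId.substring(21, 25));
    }

    /** Everything after the sequence: the process ID. */
    static String pid(String traceId) {
        return traceId.substring(25);
    }

    public static void main(String[] args) {
        String id = "0ad1348f1403169275002100356696";
        System.out.println(ip(id));
        System.out.println(Instant.ofEpochMilli(timestampMillis(id)));
        System.out.println(sequence(id));
        System.out.println(pid(id));
    }
}
```

For the example ID, `ip` yields 10.209.52.143, `sequence` yields 1003, and `pid` yields 56696, matching the breakdown in the text.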

The pseudo code for generating a TraceId is as follows:

// ip is the 8-digit hex IP; getNextId() returns the 1000-9000 sequence
TraceIdStr.append(ip).append(System.currentTimeMillis())
          .append(getNextId()).append(getPID());
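A runnable version of this rule might look like the sketch below. It is an illustration under stated assumptions, not SOFATracer's actual source: the class `TraceIdSketch` and its methods are made up for this example, the IP in `main` is the fixed address from the example above rather than a real local-address lookup, and `ProcessHandle` requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of the TraceId rule; names are hypothetical, not SOFATracer's. */
final class TraceIdSketch {

    private static final AtomicInteger SEQ = new AtomicInteger(1000);

    /** Encodes an IPv4 address as 8 lowercase hex digits, two per segment. */
    static String hexIp(String ipv4) {
        StringBuilder sb = new StringBuilder(8);
        for (String part : ipv4.split("\\.")) {
            sb.append(String.format("%02x", Integer.parseInt(part)));
        }
        return sb.toString();
    }

    /** Returns the next value in 1000..9000, wrapping back to 1000 after 9000. */
    static int nextSequence() {
        return SEQ.getAndUpdate(v -> v >= 9000 ? 1000 : v + 1);
    }

    /** Concatenates hex IP + millisecond timestamp + sequence + process ID. */
    static String generate(String ipv4, long nowMillis, long pid) {
        return hexIp(ipv4) + nowMillis + nextSequence() + pid;
    }

    public static void main(String[] args) {
        // Assumption: a real tracer would look up the local IPv4 address instead.
        String ip = "10.209.52.143";
        long pid = ProcessHandle.current().pid(); // Java 9+
        System.out.println(generate(ip, System.currentTimeMillis(), pid));
    }
}
```

Using an atomic counter keeps the sequence safe under concurrent requests without locking, which fits the requirement that ID generation stay cheap.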

SpanId generation

SpanId records the topology of service calls. In SOFATracer:

Dots represent call depth
Numbers represent call order
The SpanId is created by the client

The generation rules for TraceId and SpanId in SOFATracer are modeled on Alibaba's Hawkeye (EagleEye) component.
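The dot-and-number scheme can be sketched in a few lines of Java. This is a minimal illustration, not SOFATracer's implementation: the class `SpanIdSketch` is hypothetical, and starting the root at "0" is an assumption made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch of dotted SpanIds; names and the root value "0" are assumptions. */
final class SpanIdSketch {

    private final String spanId;
    private final AtomicInteger childCounter = new AtomicInteger(0);

    SpanIdSketch(String spanId) {
        this.spanId = spanId;
    }

    String spanId() {
        return spanId;
    }

    /** The client creates the id of each outgoing call: parent id + "." + call order. */
    SpanIdSketch nextChild() {
        return new SpanIdSketch(spanId + "." + childCounter.incrementAndGet());
    }

    /** Call depth equals the number of dots in the id. */
    int depth() {
        return (int) spanId.chars().filter(c -> c == '.').count();
    }

    public static void main(String[] args) {
        SpanIdSketch root = new SpanIdSketch("0");
        SpanIdSketch first = root.nextChild();   // first call made by the root node
        SpanIdSketch second = root.nextChild();  // second call made by the root node
        SpanIdSketch nested = first.nextChild(); // call made by the first callee
        System.out.println(first.spanId() + " " + second.spanId() + " " + nested.spanId());
    }
}
```

Because each id embeds its parent's id as a prefix, the call tree can be rebuilt from the ids alone, without storing a separate parent pointer.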

Pairing each calling Span with its called Span, and combining TraceId with SpanId, yields the complete service call topology:

Trace instrumentation

But how is Trace data generated and obtained? This is where the Trace collector (instrumentation framework) comes in. It is responsible for:

Generating, propagating, and reporting Trace data
Extracting and injecting the Trace context

The Trace collector must also be automatic, low-intrusion, and low-overhead. A typical Trace collector wraps the business logic and is structured as follows:

1. Server Received (SR): create a new parent Span, or extract it from the context
2. Call the business code
3. The business code initiates a remote service call
4. Client Send (CS): create a child Span and pass along the TraceId, SpanId, and pass-through data
5. Client Received (CR): end the current child Span and record/report it
6. Server Send (SS): end the parent Span and record/report it

Steps 3-5 may be absent, or may repeat multiple times.
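The six steps can be simulated for a single server node as follows. This is only a sketch: the `Span` class, its fields, the placeholder TraceId, and the `handleRequest` method are all hypothetical, and no network call is actually made. It also assumes that the called Span reuses the SpanId passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: simulates the SR / CS / CR / SS steps for one server node. */
final class TraceFlowSketch {

    /** Minimal span record; the fields are hypothetical, not SOFATracer's. */
    static final class Span {
        final String traceId;
        final String spanId;
        final String kind; // "server" or "client"
        final long startNanos = System.nanoTime();
        long endNanos;
        private int childCount = 0;

        Span(String traceId, String spanId, String kind) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.kind = kind;
        }

        /** CS: the client derives the child SpanId from its parent. */
        Span newChild() {
            childCount++;
            return new Span(traceId, spanId + "." + childCount, "client");
        }

        /** CR / SS: stop the clock and hand the span to the reporter. */
        void finish(List<Span> reporter) {
            endNanos = System.nanoTime();
            reporter.add(this);
        }
    }

    /** Pass null ids when this node is the first one in the trace. */
    static List<Span> handleRequest(String inTraceId, String inSpanId, int remoteCalls) {
        List<Span> reported = new ArrayList<>();
        // 1. SR: create a new parent Span, or continue the one from the context.
        Span server = (inTraceId == null)
                ? new Span("placeholder-trace-id", "0", "server")
                : new Span(inTraceId, inSpanId, "server");
        // 2. The business code would run here.
        for (int i = 0; i < remoteCalls; i++) { // 3. each remote call it makes
            Span client = server.newChild();    // 4. CS: child Span, ids sent along
            client.finish(reported);            // 5. CR: end and report the child
        }
        server.finish(reported);                // 6. SS: end and report the parent
        return reported;
    }

    public static void main(String[] args) {
        for (Span s : handleRequest(null, null, 2)) {
            System.out.println(s.kind + " " + s.traceId + " " + s.spanId);
        }
    }
}
```

With `remoteCalls` set to 0 the loop is skipped entirely, which corresponds to the case where steps 3-5 are absent.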

Instrumentation logic can be implemented in various ways. The current mainstream approaches are:

Filters: request filters (Dubbo, SOFARPC, Spring MVC)
AOP aspects (DataSource, Redis, MongoDB), implemented through:
a. Proxies
b. Bytecode generation
Hook mechanisms (Spring Message, RocketMQ)
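As one illustration of the proxy approach, the sketch below wraps an interface with a JDK dynamic proxy (`java.lang.reflect.Proxy`) that times every call. The `UserDao` interface, the `traced` helper, and the `RECORDED` list are hypothetical names for this example; a real tracer would start and finish a Span where this sketch records a string.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Illustrative proxy-based instrumentation; all names here are hypothetical. */
final class ProxyTracingSketch {

    /** Stand-in for a component to be traced, such as a data access object. */
    interface UserDao {
        String findName(int id);
    }

    /** Stand-in for a span reporter: "methodName:elapsedMicros" per call. */
    static final List<String> RECORDED = new ArrayList<>();

    @SuppressWarnings("unchecked")
    static <T> T traced(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the business exception unchanged
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                RECORDED.add(method.getName() + ":" + micros);
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        UserDao dao = traced(UserDao.class, id -> "user-" + id);
        System.out.println(dao.findName(7));
        System.out.println(RECORDED);
    }
}
```

The business code only sees the `UserDao` interface, so the timing logic stays out of it, which is what makes this style low-intrusion.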

In the Java world, SkyWalking and PinPoint both use a javaagent to achieve automatic, non-intrusive instrumentation. As a typical example, SOFATracer instruments Spring MVC as follows:

SOFATracer creates a Span for 100% of calls, but logging/reporting supports sampling. Logging/reporting is comparatively more expensive and more likely to become a performance bottleneck under heavy traffic or load. Other tracing systems sample at Span creation, but to still have a complete trace when a call fails, they adopt a reverse sampling strategy.
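The difference between creating every Span and reporting only a sample can be shown with a small sketch. The percentage-by-hash decision below is an assumption chosen for illustration, not SOFATracer's actual sampler, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: spans are always created, sampling applies only to reporting. */
final class ReportSamplingSketch {

    /**
     * Deterministic per-trace decision, so every node reaches the same verdict
     * for the same TraceId. Hashing the id is an assumption made for this sketch.
     */
    static boolean shouldReport(String traceId, int percent) {
        return Math.floorMod(traceId.hashCode(), 100) < percent;
    }

    /** Creates a span for every trace, then reports only the sampled ones. */
    static List<String> createAndReport(List<String> traceIds, int percent) {
        List<String> reported = new ArrayList<>();
        for (String traceId : traceIds) {
            String span = traceId + ":0";         // the span is always created
            if (shouldReport(traceId, percent)) { // sampling applies only here
                reported.add(span);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("trace-a", "trace-b", "trace-c", "trace-d");
        System.out.println(createAndReport(ids, 50));
    }
}
```

Keying the decision on the TraceId rather than on a random number means a trace is either reported by all nodes or by none, so sampled traces stay complete.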

By default, SOFATracer prints Trace information to log files:

client-digest: calling Spans
server-digest: called Spans
client-stat: calling-Span data aggregated over one minute
server-stat: called-Span data aggregated over one minute

The default log format is JSON, but it can be customized.

APM

Besides collecting and reporting Trace data, a typical tracing system also has a Collector, Storage, and an API & UI. Together they form Application Performance Management (APM), as shown in the following figure:

Image source: https://pinpoint-apm.github.io/pinpoint/overview.html

Reporting Trace data generally has to meet requirements such as timeliness and consistency. SOFATracer supports reporting to Zipkin by default. Stream processing takes place before storage; pairing calling Spans with called Spans is generally done with Alibaba JStorm or Apache Flink. The processed data is then stored in Apache HBase. Because Trace data is only useful for a short time, expired data is generally purged automatically, with a retention period of about 7 to 10 days. Finally, the display layer queries and analyzes data from HBase and needs to support:

Graphical display of directed acyclic graph
Query by TraceId
Query by caller
Query by callee
Query by IP

Image source: https://pinpoint-apm.github.io/pinpoint/images/ss_server-map.png

Within Ant Group, we do not have Spans reported directly. Instead, Spans are printed to logs, which are collected on demand. The architecture is as follows:

(Relic and Antique are not real system names.)

A DaemonSet Agent on each host collects the Trace logs. The logs to be collected are the digest logs, used for troubleshooting, and the stat logs, used for business monitoring. Once collected, the log data is processed by the Relic system, which cleans and aggregates the log data of each machine. The Antique system then consolidates it further, using Spark to aggregate the Trace service data by application and by service dimension. Finally, the processed Trace data is stored in the time-series database CeresDB and made available to the Web Console for query and analysis. The system can also be configured with monitoring and alerting to give early warning of anomalies in application systems. Currently this monitoring and alerting is near real-time, with a delay of about 1 minute.

Full-link tracing has kept improving and its features have kept growing. At this stage, Application Performance Management covers not only the complete capabilities of full-link tracing, but also:

Storage & analysis, rich terminal features
Full-link stress testing
Performance analysis
Monitoring & alerting: CPU, memory, JVM information, and so on

Within Ant Group, we have a dedicated stress-testing platform. When the platform initiates stress-test traffic, it attaches an artificially constructed TraceId, SpanId, and pass-through data (a stress-test flag), so that the logs for this traffic are printed separately. You are welcome to choose SOFATracer as your full-link tracing tool. SOFATracer's quick start guide link:

Outlook

SOFATracer's future development plan is as follows. Everyone is welcome to contribute! Project GitHub link.
