Text|Zhao Chen (SOFA Open Source Summer Link Project Team)
Master of Computer Engineering in Wuhan University of Technology
Research direction: automatic coloring of thangka line drafts
Proofreading|Song Guolei (SOFATracer committer)
This article is 6,971 words, about an 18-minute read
Background
I was fortunate to take part in the Summer 2021 of Open Source event, part of the Open Source Software Supply Chain Lighting Plan. SOFATracer could already report its instrumentation (buried point) data to Zipkin. The main goal of this project is to also report the generated instrumentation data to Jaeger and SkyWalking for visual display.
PART. 1 SOFATracer
SOFATracer is a distributed link tracing system developed by Ant Group based on the OpenTracing specification. Its core idea is to use a global TraceId to string together the same request as it passes through each service node. With this unified TraceId, the network calls along the call link are recorded in logs, which makes the whole call path visible. This link data can be used for rapid fault discovery, service governance, and so on.
SOFATracer can write logs to disk asynchronously and can report link tracing data to the open source product Zipkin for distributed trace display. The task for this Summer of Open Source project is to report the link tracing data to Jaeger and SkyWalking for display as well.
SOFATracer data reporting
The figure above shows the link reporting process in SOFATracer. Span#finish is the last method executed in a span's life cycle, and it is the entry point for all data reporting. SOFATracer's span reporting consists of two parts: reporting to the link display backend and writing logs to disk. SOFATracer does not separate the remote data collector from log writing. Before writing the log, it simply calls the SOFATracer#invokeReportListeners method, which finds all instances that implement the SpanReportListener interface and have been added to the SpanReportListenerHolder, and calls their onSpanReport method to report the link data to the data collector. The following code snippet is the concrete implementation of the invokeReportListeners method.
protected void invokeReportListeners(SofaTracerSpan sofaTracerSpan) {
    List<SpanReportListener> listeners = SpanReportListenerHolder
        .getSpanReportListenersHolder();
    if (listeners != null && listeners.size() > 0) {
        for (SpanReportListener listener : listeners) {
            listener.onSpanReport(sofaTracerSpan);
        }
    }
}
The instances in SpanReportListenerHolder are added when the project starts. There are two cases, Spring Boot applications and Spring applications:
- In a Spring Boot application, the auto-configuration class SOFATracerSpanRemoteReporter saves all current bean instances of type SpanReportListener to the List object of SpanReportListenerHolder. The SpanReportListener instances themselves are injected into the IoC container by their own AutoConfiguration classes.
- In a Spring application, the class implements the bean life cycle interface InitializingBean provided by Spring, instantiates the SpanReportListener in the afterPropertiesSet method, and adds it to the SpanReportListenerHolder.
To upload the trace data in SOFATracer to Jaeger and SkyWalking, you need to implement the SpanReportListener interface and add the corresponding instance to the SpanReportListenerHolder when the application starts.
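The registration pattern described above can be sketched with simplified stand-in types. Everything named here (DemoSpan, DemoSpanReportListener, DemoListenerHolder, RecordingReporter, DemoTracer) is a hypothetical substitute for the real SOFATracer classes, used only so that the example is self-contained. It is not the SOFATracer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for SofaTracerSpan (hypothetical, heavily simplified).
class DemoSpan {
    final String traceId;
    final String spanId;

    DemoSpan(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

// Stand-in for the SpanReportListener interface named in the article.
interface DemoSpanReportListener {
    void onSpanReport(DemoSpan span);
}

// Stand-in for SpanReportListenerHolder: a shared list filled at startup.
class DemoListenerHolder {
    private static final List<DemoSpanReportListener> LISTENERS = new CopyOnWriteArrayList<>();

    static void addSpanReportListener(DemoSpanReportListener listener) {
        LISTENERS.add(listener);
    }

    static List<DemoSpanReportListener> getListeners() {
        return LISTENERS;
    }
}

// Hypothetical remote reporter; a real one would convert the span and send it.
class RecordingReporter implements DemoSpanReportListener {
    final List<String> reported = new ArrayList<>();

    @Override
    public void onSpanReport(DemoSpan span) {
        reported.add(span.traceId + ":" + span.spanId);
    }
}

class DemoTracer {
    // Mirrors the shape of invokeReportListeners: notify every registered listener.
    static void reportSpan(DemoSpan span) {
        for (DemoSpanReportListener listener : DemoListenerHolder.getListeners()) {
            listener.onSpanReport(span);
        }
        // Writing the span to the local log would follow here.
    }
}
```

A new backend then only needs its own listener implementation plus one registration call at startup. The tracer itself does not change.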
PART. 2 Jaeger data report
The figure below illustrates part of the data reporting process in Jaeger. In the figure, the CommandQueue stores flush and append commands. The producers are the sampler and the flush timer, and the consumer is the queue processor. When the sampler decides that a span needs to be reported, it adds an AppendCommand to the CommandQueue. The flush timer keeps adding a FlushCommand to the queue at the configured flushInterval. The queue processor continuously reads commands from the CommandQueue and checks whether each one is an AppendCommand or a FlushCommand. If it is a flush command, the data in the byteBuffer is sent to the receiving end. If it is an append command, the span is added to the byteBuffer for temporary storage.
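The producer and consumer roles described above can be sketched as follows. This is a simplified, single-threaded illustration and not Jaeger's actual reporter code: the class CommandQueueSketch, its drain method, and the use of strings in place of spans and of a list in place of the byteBuffer are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CommandQueueSketch {
    // A command acts on the pending buffer and on the list of batches already sent.
    interface Command {
        void execute(List<String> buffer, List<List<String>> sentBatches);
    }

    // Produced by the sampler when a span should be reported.
    static final class AppendCommand implements Command {
        private final String span;

        AppendCommand(String span) {
            this.span = span;
        }

        @Override
        public void execute(List<String> buffer, List<List<String>> sentBatches) {
            buffer.add(span); // store temporarily until the next flush
        }
    }

    // Produced by the flush timer every flushInterval.
    static final class FlushCommand implements Command {
        @Override
        public void execute(List<String> buffer, List<List<String>> sentBatches) {
            if (!buffer.isEmpty()) {
                sentBatches.add(new ArrayList<>(buffer)); // "send" the buffered spans
                buffer.clear();
            }
        }
    }

    final BlockingQueue<Command> queue = new LinkedBlockingQueue<>();
    final List<String> buffer = new ArrayList<>();
    final List<List<String>> sentBatches = new ArrayList<>();

    // The queue processor: a real one loops on its own thread, this one drains once.
    void drain() {
        Command command;
        while ((command = queue.poll()) != null) {
            command.execute(buffer, sentBatches);
        }
    }
}
```

Putting both kinds of work on one queue means that only the queue processor ever touches the buffer, so the buffer itself needs no locking.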
The main work in the process of reporting to Jaeger is the conversion of Jaeger Span and SOFATracer Span models. After the conversion, the span is sent to the backend using the above logic.
The figure above is the UML diagram of Sender in Jaeger. It shows two types of Sender, HTTPSender and UDPSender, which send data over HTTP and UDP respectively. In SOFATracer's Jaeger reporting implementation, UDPSender is used to send span data to the Jaeger Agent, and HTTPSender is used to send data directly to the Jaeger Collector.
Jaeger Span and SOFATracer Span model conversion
Model conversion comparison
Processing of TraceId and SpanId
Conversion of TraceId:
- Problem: the TraceId generation rule in SOFATracer is: server IP + ID generation time + auto-increment sequence + current process ID
For example: 0ad1348f1403169275002100356696. The first 8 characters, 0ad1348f, are the IP of the machine that generated the TraceId. This is a hexadecimal number in which every two characters represent one segment of the IP. Converting each pair to decimal gives the familiar IP address notation 10.209.52.143, and this rule also lets you find the first server that the request passed through. The next 13 digits, 1403169275002, are the time at which the TraceId was generated. The next 4 digits, 1003, are an auto-increment sequence that rises from 1000 to 9000 and, after reaching 9000, returns to 1000 and starts rising again. The last 5 digits, 56696, are the current process ID. The process ID is appended to the TraceId to prevent TraceId conflicts between multiple processes on a single machine. (Source: TraceId and SpanId generation rules)
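The layout in the example above can be checked with a small parser. This is only a sketch: the class SofaTraceIdParts and its field names are hypothetical, and the fixed offsets (8 hex characters, 13 digits, 4 digits, then the rest) are taken from the example TraceId rather than from the SOFATracer source.

```java
class SofaTraceIdParts {
    final String ip;
    final long timestampMillis;
    final int sequence;
    final String processId;

    private SofaTraceIdParts(String ip, long timestampMillis, int sequence, String processId) {
        this.ip = ip;
        this.timestampMillis = timestampMillis;
        this.sequence = sequence;
        this.processId = processId;
    }

    // Splits a TraceId laid out as: 8 hex chars (IP) + 13 digits (time) + 4 digits (sequence) + process ID.
    static SofaTraceIdParts parse(String traceId) {
        if (traceId == null || traceId.length() < 26) {
            throw new IllegalArgumentException("TraceId too short for the assumed layout");
        }
        // Every two hex characters are one decimal segment of the IPv4 address.
        StringBuilder ip = new StringBuilder();
        for (int i = 0; i < 8; i += 2) {
            if (i > 0) {
                ip.append('.');
            }
            ip.append(Integer.parseInt(traceId.substring(i, i + 2), 16));
        }
        long timestamp = Long.parseLong(traceId.substring(8, 21));
        int sequence = Integer.parseInt(traceId.substring(21, 25));
        String processId = traceId.substring(25);
        return new SofaTraceIdParts(ip.toString(), timestamp, sequence, processId);
    }
}
```

Parsing 0ad1348f1403169275002100356696 this way yields the IP 10.209.52.143, the timestamp 1403169275002, the sequence 1003, and the process ID 56696, matching the breakdown in the text.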
In SOFATracer, TraceId is of String type, but in Jaeger, TraceId is two Long integers used to form the final TraceId.
- Solution
In Jaeger, the TraceId is represented by TraceIdHigh and TraceIdLow, and internal functions convert the two into the String form TraceIdAsString. During this splicing, the two IDs are each converted to the corresponding hex string. When a hex string is shorter than 16 characters, it is padded with leading zeros.
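// The method signature was missing from the original snippet; the name padLeft is assumed.
private static String padLeft(String id, int desiredLength) {
    StringBuilder builder = new StringBuilder(desiredLength);
    // Number of leading zeros needed to reach desiredLength.
    int offset = desiredLength - id.length();
    for (int i = 0; i < offset; i++)
        builder.append('0');
    builder.append(id);
    return builder.toString();
}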
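To show how the padding fits into the splicing step, here is a minimal sketch that joins two long values into one 32-character hex string. The class JaegerTraceIdText and the method toTraceIdString are hypothetical names. The sketch pads both halves, following the description above, while the actual Jaeger client may treat a zero high half differently.

```java
class JaegerTraceIdText {
    // Left-pads a hex string with '0' until it has desiredLength characters.
    static String padLeft(String id, int desiredLength) {
        StringBuilder builder = new StringBuilder(desiredLength);
        int offset = desiredLength - id.length();
        for (int i = 0; i < offset; i++) {
            builder.append('0');
        }
        builder.append(id);
        return builder.toString();
    }

    // Assumption: both halves are always rendered as 16 hex characters.
    static String toTraceIdString(long traceIdHigh, long traceIdLow) {
        String high = padLeft(Long.toHexString(traceIdHigh), 16);
        String low = padLeft(Long.toHexString(traceIdLow), 16);
        return high + low;
    }
}
```

Without the padding, the boundary between the two halves would be ambiguous, and the same string could be split back into different high and low values.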
Conversion of SpanId:
- Problem: SpanId is a Long integer in Jaeger, but a String in SOFATracer.
- Solution: this problem is solved the same way as the earlier SpanId conversion for Zipkin. An FNV hash maps the String to a Long with few collisions.
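As an illustration of mapping a String SpanId to a long, the sketch below uses 64-bit FNV-1a. The article only says "FNV Hash", so the exact variant is an assumption, and the class FnvHashSketch is a hypothetical name rather than the SOFATracer implementation.

```java
import java.nio.charset.StandardCharsets;

class FnvHashSketch {
    // Standard 64-bit FNV parameters.
    private static final long FNV64_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV64_PRIME = 0x100000001b3L;

    // FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    static long fnv1a64(String value) {
        long hash = FNV64_OFFSET_BASIS;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV64_PRIME; // overflow wraps, which is the intended mod 2^64 arithmetic
        }
        return hash;
    }
}
```

The mapping is deterministic, so the same SpanId string always yields the same long, which is what keeps parent and child references consistent after conversion.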
Two upload methods
Working with the Jaeger Agent
The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the Collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the Collectors away from the client.
In other words, the Jaeger Agent is a basic component deployed on each host, and it takes the work of routing to and discovering Collectors off the client. The Agent only accepts Thrift-format data sent over UDP, so using the Jaeger Agent requires UDPSender.
Report to Collector using HTTP protocol
When reporting to the Jaeger Agent over UDP, the Agent should be deployed on the same machine as the service so that data is not lost in transit. In some cases this requirement cannot be met. HTTPSender can then be used to send data directly to the Collector over HTTP.
PART. 3 SkyWalking data report
SkyWalking is an application performance monitoring tool for distributed systems. It is designed for microservices, cloud-native architectures, and container-based architectures, and it provides an integrated solution for distributed tracing, service mesh telemetry analysis, metric aggregation, and visualization. SkyWalking uses bytecode injection, so it requires no code changes and performs well. SkyWalking's receiver-trace module can receive trace data in SkyWalking format through gRPC and HTTP RESTful services. The SkyWalking reporting implementation described here reports through the HTTP RESTful service.
Model conversion comparison
Conversion of SegmentId, SpanId, ParentSpanId
The SpanId in SOFATracer is a string, but in SkyWalking, SpanId and ParentSpanId are int values, and the SpanIds within each segment are numbered starting from 0. The maximum SpanId is determined by the configured maximum number of spans in a segment. The conversion therefore has to assign the SpanId. Because each converted segment contains only one span, the ID of that span can be fixed to 0.
SegmentId uniquely identifies a segment. If a segment has the same segmentId as an earlier segment, the later segment overwrites the earlier one and the span is lost. The segmentId finally used is constructed as segmentId = traceId + SpanId hash value + 0/1, where 0 and 1 represent server and client respectively. The client or server marker is needed at the end because Dubbo and SOFARPC have a server -> server situation: the client span and the server span of one RPC call have the same SpanId and parentId. The marker distinguishes them. Otherwise the client-side span would be overwritten.
Dubbo and SOFARPC processing
The basic model alternates client-server-client-server. In Dubbo and SOFARPC, however, there is a server -> server situation, in which the client span and the server span are identical except for their kind.
- parentSegmentId
To determine the parentSegmentId outside SOFARPC and Dubbo, follow server -> client and client -> server. That is, the parent span of a client span can only be a server span, and the parent span of a server span can only be empty or a client span. In SOFARPC and Dubbo, the conversion follows the link display that the two produce when reporting with the SkyWalking Java Agent. The conversion is as follows:
Outside SOFARPC and Dubbo:
server span: parentSegmentId = traceId + parentId hash value + client(1)
client span: parentSegmentId = traceId + parentId hash value + server(0)
In SOFARPC and Dubbo:
server span: parentSegmentId = traceId + spanId hash value + client(1)
client span: parentSegmentId = traceId + parentId hash value + server(0)
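The segmentId and parentSegmentId rules can be sketched in code. This is one reading of the four rules above, in which the first pair applies outside SOFARPC and Dubbo and the second pair applies inside them. The class SkyWalkingSegmentIds, the isRpcFramework flag, and the placeholder hash based on String.hashCode are all assumptions, since the article does not say which hash function is used.

```java
class SkyWalkingSegmentIds {
    static final int SERVER = 0;
    static final int CLIENT = 1;

    // Placeholder hash (assumption); any stable String hash would serve the same role.
    static String hashOf(String id) {
        return Integer.toHexString(id.hashCode());
    }

    // segmentId = traceId + SpanId hash value + 0/1 (server/client).
    static String segmentId(String traceId, String spanId, boolean isClient) {
        return traceId + hashOf(spanId) + (isClient ? CLIENT : SERVER);
    }

    // Returns null when the span has no parent (for example a root server span).
    static String parentSegmentId(String traceId, String spanId, String parentId,
                                  boolean isClient, boolean isRpcFramework) {
        if (isClient) {
            // The parent of a client span is a server span identified by parentId.
            return parentId == null ? null : segmentId(traceId, parentId, false);
        }
        if (isRpcFramework) {
            // In Dubbo and SOFARPC the calling client span shares this server span's SpanId.
            return segmentId(traceId, spanId, true);
        }
        // Otherwise the parent of a server span is the client span identified by parentId.
        return parentId == null ? null : segmentId(traceId, parentId, true);
    }
}
```

Because the client and server spans of one RPC call share a SpanId, the trailing 0 or 1 is the only part of the segmentId that keeps the two segments from overwriting each other.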
- peer field and networkAddressUsedAtPeer field:
peer field
In Dubbo, the peer field can be composed of two tags, remote.host and remote.port. In SOFARPC, remote.ip contains both the IP and the port, but only the IP is used, because the span reported on the server side cannot tell which port the client itself is using.
networkAddressUsedAtPeer
In Dubbo, local.host and local.port can be used to form this field. In SOFARPC, the machine's IP cannot be obtained directly from the span, so the first valid IPv4 address of the machine is used. This address has no port number, which is why the peer field above also uses only the IP.
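A rough sketch of building the peer value from span tags follows. The tag names remote.host, remote.port, and remote.ip come from the text, while the class PeerFieldSketch, the method peerFromTags, and the "ip:port" layout assumed for remote.ip are illustrative assumptions.

```java
import java.util.Map;

class PeerFieldSketch {
    // Builds the peer address from span tags, or returns null if none are present.
    static String peerFromTags(Map<String, String> tags) {
        // Dubbo style: host and port live in two separate tags.
        String host = tags.get("remote.host");
        String port = tags.get("remote.port");
        if (host != null && port != null) {
            return host + ":" + port;
        }
        // SOFARPC style: remote.ip is assumed to look like "ip:port" (IPv4); keep only the IP.
        String remoteIp = tags.get("remote.ip");
        if (remoteIp == null) {
            return null;
        }
        int colon = remoteIp.indexOf(':');
        return colon >= 0 ? remoteIp.substring(0, colon) : remoteIp;
    }
}
```

The peer value has to match the networkAddressUsedAtPeer reported by the other side, which is why the SOFARPC case drops the port on both ends.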
Displaying the topology map
When building a link, the key fields are peer, networkAddressUsedAtPeer, parentService, parentServiceInstance, and parentEndpoint. peer and networkAddressUsedAtPeer represent the peer address and the address the client used to call the current instance, respectively. These two fields connect the instances in the link, and if they are missing the link is broken. During conversion, they are obtained from the span tags or by taking the first valid IPv4 address of the machine. The last three fields point to the corresponding parent instance node. If they are not set, an empty instance is generated, as shown in the figure below. Currently, SOFATracer can only propagate TraceId, SpanId, parentId, sysBaggage, and bizBaggage in the context, so these three fields cannot be obtained from it. To display the topology map, seven fields have been added to the SOFATracer context: service, serviceInstance, endpoint, parentService, parentServiceInstance, parentEndpoint, and peer. In this way, information about the parent service is available during conversion.
Asynchronous upload
Segment data is reported to the backend over HTTP in JSON format. The unit of reporting is a message, and multiple segments are combined into one message.
The process is shown in the figure below. After a span finishes, the converted segment is added to the segment buffer array, and another thread continuously flushes the data in the array into a message. The message is sent once its size reaches the maximum or the send wait time reaches the configured value. The default maximum message size is 2MB.
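The size-or-time flush rule described above can be sketched as a small batcher. This is a simplified illustration under assumptions: the class SegmentBatcher and its methods are hypothetical, timestamps are passed in explicitly instead of using a background thread, and the size check counts only the segment bytes, not the JSON brackets and commas.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SegmentBatcher {
    // The article gives 2MB as the default maximum message size.
    static final int DEFAULT_MAX_MESSAGE_BYTES = 2 * 1024 * 1024;

    private final int maxMessageBytes;
    private final long maxWaitMillis;
    private final List<String> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private long firstPendingAt = -1;

    SegmentBatcher(int maxMessageBytes, long maxWaitMillis) {
        this.maxMessageBytes = maxMessageBytes;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Adds one segment; returns a message to send if the size limit forced a flush, else null.
    String add(String segmentJson, long nowMillis) {
        int size = segmentJson.getBytes(StandardCharsets.UTF_8).length;
        String flushed = null;
        if (!pending.isEmpty() && pendingBytes + size > maxMessageBytes) {
            flushed = flush();
        }
        if (pending.isEmpty()) {
            firstPendingAt = nowMillis; // start the wait clock for the new message
        }
        pending.add(segmentJson);
        pendingBytes += size;
        return flushed;
    }

    // Called periodically; returns a message if the oldest pending segment waited long enough.
    String pollTimeout(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstPendingAt >= maxWaitMillis) {
            return flush();
        }
        return null;
    }

    // Combines all pending segments into one JSON array message.
    private String flush() {
        String message = "[" + String.join(",", pending) + "]";
        pending.clear();
        pendingBytes = 0;
        firstPendingAt = -1;
        return message;
    }
}
```

In a real reporter, the returned message would be posted to the SkyWalking HTTP endpoint by the background thread, so span completion never waits on the network.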
PART. 4 Stress test
Test configuration
- Windows 10
- Memory 16G
- Disk 500GB SSD
- Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Test method
Deploy an invocation link containing six services. Set up three sets of controls:
- Do not collect span
- 50% collection
- Full collection
Jaeger test results
Several parameters related to the test are set as follows:
Report to Jaeger Agent
Full collection
50% collection
No collection
Report to Jaeger Collector
Full collection
50% collection
No collection
SkyWalking test results
Full collection
50% collection
No collection
Test summary
Among the three reporting methods at full sampling, SkyWalking has the lowest local throughput, only 512.75/sec, which is about 14% lower than when reporting to Jaeger Agent and 11.89% lower than when reporting to Jaeger Collector. Comparing full sampling with no sampling for each method: throughput dropped by 14.6% when reporting to Jaeger Agent, by 17% when reporting to Jaeger Collector, and by approximately 23% when reporting to SkyWalking.
The link visualization of SOFATracer introduced this time will be released in the next version.
-
"Harvest"
I am very lucky to have taken part in this Summer of Open Source event. Reading the SOFATracer source code taught me many excellent design ideas and implementation techniques, and during my own implementation I imitated some of the approaches used in the source. I learned a lot in the process. I also discovered some problems of my own. For example, as soon as I had a rough idea for solving a problem I would start implementing it, without digging into whether the idea was feasible, and this bad habit wasted a lot of time. This was my first time taking part in open source community activities, and it showed me how an open source community operates. In my future studies I will work harder to improve my coding ability and strive to contribute to the open source community.
Special thanks to Mr. Song Guolei for his patient guidance. During the project, Mr. Song cleared up many of my doubts, and I learned a great deal from him. I would also like to thank the SOFAStack community for helping me throughout the process, and the organizers for the platform this event provided.
-
「Reference Materials」
- SOFATracer data reporting mechanism and source code analysis of the distributed link tracking component of Ant Group | Anatomy
- Use SkyWalking to realize full link monitoring
- Zipkin-SkyWalking Exporter
- STAM: Topology automatic detection method for large-scale distributed application systems