Conclusion first

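[Conclusion]
SkyWalking relies on bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the traceId into the trace context TraceContext in its own way.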

Interesting, isn't it?

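[Takeaway]
Be selective about which plugins you enable in the skywalking-agent plugins/ directory. The more components you enable, the more complex the traces and topology become, the wider the impact, and the more unknown, uncontrollable factors you introduce.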

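Background

The problem

In production, the same traceId showed up on N requests from different time periods. They were all chained into one trace, which broke trace reconstruction and the topology view.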

@Configuration
public class ThreadPoolConfig {

    @Bean(name = "eventThreadPool")
    public ThreadPoolExecutor commonThreadPool() {
//        int corePoolSize = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, // set deliberately during analysis so that the problem reproduces 100% of the time
                1,
                1,
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                new NamedThreadFactory("wanda_event"),
                new ThreadPoolExecutor.CallerRunsPolicy());
        return executor;
    }
}
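
Why does pinning the pool to a single thread make the problem reproduce every time? Here is a minimal sketch using only java.util.concurrent. The class name SingleWorkerPoolDemo and its method are made up for illustration, and the NamedThreadFactory from the config above is left out. With corePoolSize = maximumPoolSize = 1, every queued task runs on the same worker thread, so whatever per-thread state that worker carries is seen by the next task too.

```java
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SingleWorkerPoolDemo {

    // Runs taskCount tasks on a 1-core/1-max pool and returns the ids
    // of the threads that actually executed them.
    static Set<Long> workerThreadIds(int taskCount) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(taskCount);
        try {
            for (int i = 0; i < taskCount; i++) {
                executor.execute(() -> {
                    ids.add(Thread.currentThread().getId());
                    done.countDown();
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return ids;
    }
}
```

As long as the queue does not fill up (so CallerRunsPolicy never kicks in), workerThreadIds(20) should contain exactly one thread id.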

Analysis

We need to find out how the traceId on the thread pool's threads gets generated.

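[Notes]

  • The skywalking-agent.jar version in use is 9.1.0, with the default plugins/ list, which includes apm-guava-eventbus-plugin.
  • The bootstrap plugins in bootstrap-plugins/ are not enabled (enabling them means copying them into plugins/); this includes apm-jdk-threadpool-plugin. SkyWalking leaves the bootstrap plugins off by default because they have a wide impact and can noticeably affect both application performance and trace data.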

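[Reasoning]

  • The traceId is created at the root of a request and is immutable; for the rest of the request's lifecycle it is only passed along. So pinning down where the traceId is generated is the key.
  • Where is the traceId generated? We need to understand the generation logic at the implementation level.
  • An application instance contains many threads, so we also need to consider the name of the thread that generates the traceId.

In short, the core troubleshooting approach is: new traceId generation + thread name.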

How traceId generation works under the hood

org.apache.skywalking:java-agent:9.1.0
The analysis below is based on the v9.1.0 source code, the latest version at the time of writing.

TraceContext.traceId()

org.apache.skywalking.apm.toolkit.trace.TraceContext#traceId

TraceContext is the request's trace context; call TraceContext.traceId() to get the traceId.

package org.apache.skywalking.apm.toolkit.trace;

import java.util.Optional;

/**
 * Try to access the sky-walking tracer context. The context is not existed, always. only the middleware, component, or
 * rpc-framework are supported in the current invoke stack, in the same thread, the context will be available.
 * <p>
 */
public class TraceContext {

    /**
     * Try to get the traceId of current trace context.
     *
     * @return traceId, if it exists, or empty {@link String}.
     */
    public static String traceId() {
        return "";
    }

    /**
     * Try to get the segmentId of current trace context.
     *
     * @return segmentId, if it exists, or empty {@link String}.
     */
    public static String segmentId() {
        return "";
    }

    /**
     * Try to get the spanId of current trace context. The spanId is a negative number when the trace context is
     * missing.
     *
     * @return spanId, if it exists, or empty {@link String}.
     */
    public static int spanId() {
        return -1;
    }

    /**
     * Try to get the custom value from trace context.
     *
     * @return custom data value.
     */
    public static Optional<String> getCorrelation(String key) {
        return Optional.empty();
    }

    /**
     * Put the custom key/value into trace context.
     *
     * @return previous value if it exists.
     */
    public static Optional<String> putCorrelation(String key, String value) {
        return Optional.empty();
    }

}

1. How does the traceId get into the trace context?

Search the GitHub skywalking:java-agent repository for org.apache.skywalking.apm.toolkit.trace.TraceContext:
repo:apache/skywalking-java org.apache.skywalking.apm.toolkit.trace.TraceContext language:Java
Or search the skywalking:java-agent source code in IDEA for org.apache.skywalking.apm.toolkit.trace.TraceContext.

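[Conclusion]
SkyWalking relies on bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the traceId into the trace context TraceContext in its own way.

One more way to update data, it seems...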

TraceContextActivation

org.apache.skywalking.apm.toolkit.activation.trace.TraceContextActivation

TraceContextActivation activates the trace context: it uses TraceIDInterceptor to intercept TraceContext.traceId() so that the traceId is supplied to the trace context TraceContext.

package org.apache.skywalking.apm.toolkit.activation.trace;

import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassStaticMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.StaticMethodsInterceptPoint;

import static net.bytebuddy.matcher.ElementMatchers.named;

/**
 * Active the toolkit class "TraceContext". Should not dependency or import any class in
 * "skywalking-toolkit-trace-context" module. Activation's classloader is diff from "TraceContext", using direct will
 * trigger classloader issue.
 * <p>
 */
public class TraceContextActivation extends ClassStaticMethodsEnhancePluginDefine {

    // Interceptor class for the traceId
    public static final String TRACE_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor";
    public static final String SEGMENT_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SegmentIDInterceptor";
    public static final String SPAN_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SpanIDInterceptor";
    // Class to enhance: the trace context
    public static final String ENHANCE_CLASS = "org.apache.skywalking.apm.toolkit.trace.TraceContext";
    // Name of the static method that returns the traceId
    public static final String ENHANCE_TRACE_ID_METHOD = "traceId";
    public static final String ENHANCE_SEGMENT_ID_METHOD = "segmentId";
    public static final String ENHANCE_SPAN_ID_METHOD = "spanId";
    public static final String ENHANCE_GET_CORRELATION_METHOD = "getCorrelation";
    public static final String INTERCEPT_GET_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextGetInterceptor";
    public static final String ENHANCE_PUT_CORRELATION_METHOD = "putCorrelation";
    public static final String INTERCEPT_PUT_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextPutInterceptor";

    /**
     * @return the target class, which needs active.
     */
    @Override
    protected ClassMatch enhanceClass() {
        // The class to enhance
        return NameMatch.byName(ENHANCE_CLASS);
    }

    /**
     * @return the collection of {@link StaticMethodsInterceptPoint}, represent the intercepted methods and their
     * interceptors.
     */
    @Override
    public StaticMethodsInterceptPoint[] getStaticMethodsInterceptPoints() {
        // Static method intercept points
        return new StaticMethodsInterceptPoint[] {
            new StaticMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    // Name of the static method that returns the traceId
                    return named(ENHANCE_TRACE_ID_METHOD);
                }

                @Override
                public String getMethodsInterceptor() {
                    // Interceptor class for the traceId
                    return TRACE_ID_INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            },
            // ...
        };
    }
}

TraceIDInterceptor

org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor

TraceIDInterceptor is the traceId interceptor. It calls ContextManager.getGlobalTraceId() to get the traceId and hands it back as the return value of TraceContext.traceId().

package org.apache.skywalking.apm.toolkit.activation.trace;

import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;

public class TraceIDInterceptor implements StaticMethodsAroundInterceptor {

    private static final ILog LOGGER = LogManager.getLogger(TraceIDInterceptor.class);

    @Override
    public void beforeMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        MethodInterceptResult result) {
        // Get the first global traceId and define it as the method's return value
        result.defineReturnValue(ContextManager.getGlobalTraceId());
    }

    @Override
    public Object afterMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Object ret) {
        // Return the traceId
        return ret;
    }

    @Override
    public void handleMethodException(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Throwable t) {
        LOGGER.error("Failed to getDefault trace Id.", t);
    }
}
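
To make the "define the return value in beforeMethod" idea more concrete, here is a simplified model in plain Java. It is only a sketch: InterceptResult, StaticInterceptDemo and CURRENT_TRACE_ID are hypothetical stand-ins, not SkyWalking classes, and the assumption that the original method body is skipped once a return value has been defined reflects my reading of the agent, not something shown in the sources quoted here.

```java
// Hypothetical stand-in for MethodInterceptResult:
// lets the "before" hook define the return value up front.
class InterceptResult {
    private boolean defined;
    private Object value;

    void defineReturnValue(Object value) {
        this.defined = true;
        this.value = value;
    }

    boolean isDefined() {
        return defined;
    }

    Object value() {
        return value;
    }
}

class StaticInterceptDemo {
    // Stand-in for the per-thread trace id that ContextManager would supply.
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    // What the un-enhanced toolkit method does: it always returns "".
    static String originalTraceId() {
        return "";
    }

    // What the enhanced method conceptually does: run the "before" hook first,
    // and skip the original body when a return value has been defined.
    static String enhancedTraceId() {
        InterceptResult result = new InterceptResult();
        String id = CURRENT_TRACE_ID.get();
        result.defineReturnValue(id != null ? id : "N/A");
        if (result.isDefined()) {
            return (String) result.value();
        }
        return originalTraceId();
    }
}
```

This is why an application that calls TraceContext.traceId() without the agent attached only ever sees the empty string, while the same call under the agent returns whatever the interceptor supplies.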

ContextManager.getGlobalTraceId()

org.apache.skywalking.apm.agent.core.context.ContextManager#getGlobalTraceId

ContextManager is the trace context manager.
ContextManager.getGlobalTraceId() returns the first global traceId by calling AbstractTracerContext.getReadablePrimaryTraceId().

package org.apache.skywalking.apm.agent.core.context;

import java.util.Objects;
import org.apache.skywalking.apm.agent.core.boot.BootService;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.sampling.SamplingService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.OPERATION_NAME_THRESHOLD;

/**
 * {@link ContextManager} controls the whole context of {@link TraceSegment}. Any {@link TraceSegment} relates to
 * single-thread, so this context use {@link ThreadLocal} to maintain the context, and make sure, since a {@link
 * TraceSegment} starts, all ChildOf spans are in the same context. <p> What is 'ChildOf'?
 * https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans
 *
 * <p> Also, {@link ContextManager} delegates to all {@link AbstractTracerContext}'s major methods.
 */
public class ContextManager implements BootService {
    private static final String EMPTY_TRACE_CONTEXT_ID = "N/A";
    private static final ILog LOGGER = LogManager.getLogger(ContextManager.class);
    // Thread-local variable holding the trace context
    private static ThreadLocal<AbstractTracerContext> CONTEXT = new ThreadLocal<AbstractTracerContext>();
    private static ThreadLocal<RuntimeContext> RUNTIME_CONTEXT = new ThreadLocal<RuntimeContext>();
    private static ContextManagerExtendService EXTEND_SERVICE;

    private static AbstractTracerContext getOrCreate(String operationName, boolean forceSampling) {
        AbstractTracerContext context = CONTEXT.get();
        if (context == null) {
            if (StringUtil.isEmpty(operationName)) {
                if (LOGGER.isDebugEnable()) {
                    LOGGER.debug("No operation name, ignore this trace.");
                }
                context = new IgnoredTracerContext();
            } else {
                if (EXTEND_SERVICE == null) {
                    EXTEND_SERVICE = ServiceManager.INSTANCE.findService(ContextManagerExtendService.class);
                }
                context = EXTEND_SERVICE.createTraceContext(operationName, forceSampling);

            }
            CONTEXT.set(context);
        }
        return context;
    }

    /**
     * Get the first global traceId.
     * @return the first global trace id when tracing. Otherwise, "N/A".
     */
    public static String getGlobalTraceId() {
        // The trace context of the current thread
        AbstractTracerContext context = CONTEXT.get();
        // Get the global traceId
        return Objects.nonNull(context) ? context.getReadablePrimaryTraceId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current segment id when tracing. Otherwise, "N/A".
     */
    public static String getSegmentId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSegmentId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current span id when tracing. Otherwise, the value is -1.
     */
    public static int getSpanId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSpanId() : -1;
    }

    // ...

}
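
Since ContextManager keeps the context in a ThreadLocal and getOrCreate() reuses whatever is already there, it is worth recalling how thread-locals behave on pooled threads. The sketch below is not SkyWalking code and is not a claim about the root cause of the production issue; ThreadLocalReuseDemo and its "ctx-N" values are invented for illustration. It only shows the general Java behavior: a value left in a ThreadLocal on a reused worker thread is picked up by the next task unless it is removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalReuseDemo {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    // Mimics the get-or-create pattern: reuse the thread's context if present.
    private static String getOrCreate() {
        String ctx = CONTEXT.get();
        if (ctx == null) {
            ctx = "ctx-" + NEXT_ID.incrementAndGet();
            CONTEXT.set(ctx);
        }
        return ctx;
    }

    // Runs taskCount tasks one after another on a single worker thread and
    // records the context each task saw. When clearAfterTask is false, the
    // context is never removed from the worker thread.
    static List<String> run(int taskCount, boolean clearAfterTask) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<String> seen = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                Future<String> f = pool.submit(() -> {
                    try {
                        return getOrCreate();
                    } finally {
                        if (clearAfterTask) {
                            CONTEXT.remove();
                        }
                    }
                });
                seen.add(f.get());
            }
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With run(3, false) all three tasks see the same context; with run(3, true) each task gets a fresh one. The AbstractTracerContext.stopSpan() Javadoc ("true when context should be clear") suggests the real agent clears its context once the last span is stopped, so the sketch only describes what would happen if a context were left behind.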

AbstractTracerContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.AbstractTracerContext#getReadablePrimaryTraceId

AbstractTracerContext is the interface that defines a trace context.
This method returns the global traceId.

package org.apache.skywalking.apm.agent.core.context;

import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;

/**
 * The <code>AbstractTracerContext</code> represents the tracer context manager.
 */
public interface AbstractTracerContext {
    /**
     * Get the global trace id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getReadablePrimaryTraceId();

    /**
     * Prepare for the cross-process propagation. How to initialize the carrier, depends on the implementation.
     *
     * @param carrier to carry the context for crossing process.
     */
    void inject(ContextCarrier carrier);

    /**
     * Build the reference between this segment and a cross-process segment. How to build, depends on the
     * implementation.
     *
     * @param carrier carried the context from a cross-process segment.
     */
    void extract(ContextCarrier carrier);

    /**
     * Capture a snapshot for cross-thread propagation. It's a similar concept with ActiveSpan.Continuation in
     * OpenTracing-java How to build, depends on the implementation.
     *
     * @return the {@link ContextSnapshot} , which includes the reference context.
     */
    ContextSnapshot capture();

    /**
     * Build the reference between this segment and a cross-thread segment. How to build, depends on the
     * implementation.
     *
     * @param snapshot from {@link #capture()} in the parent thread.
     */
    void continued(ContextSnapshot snapshot);

    /**
     * Get the current segment id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getSegmentId();

    /**
     * Get the active span id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    int getSpanId();

    /**
     * Create an entry span
     *
     * @param operationName most likely a service name
     * @return the span represents an entry point of this segment.
     */
    AbstractSpan createEntrySpan(String operationName);

    /**
     * Create a local span
     *
     * @param operationName most likely a local method signature, or business name.
     * @return the span represents a local logic block.
     */
    AbstractSpan createLocalSpan(String operationName);

    /**
     * Create an exit span
     *
     * @param operationName most likely a service name of remote
     * @param remotePeer    the network id(ip:port, hostname:port or ip1:port1,ip2,port, etc.). Remote peer could be set
     *                      later, but must be before injecting.
     * @return the span represent an exit point of this segment.
     */
    AbstractSpan createExitSpan(String operationName, String remotePeer);

    /**
     * @return the active span of current tracing context(stack)
     */
    AbstractSpan activeSpan();

    /**
     * Finish the given span, and the given span should be the active span of current tracing context(stack)
     *
     * @param span to finish
     * @return true when context should be clear.
     */
    boolean stopSpan(AbstractSpan span);

    /**
     * Notify this context, current span is going to be finished async in another thread.
     *
     * @return The current context
     */
    AbstractTracerContext awaitFinishAsync();

    /**
     * The given span could be stopped officially.
     *
     * @param span to be stopped.
     */
    void asyncStop(AsyncSpan span);

    /**
     * Get current correlation context
     */
    CorrelationContext getCorrelationContext();

    /**
     * Get current primary endpoint name
     */
    String getPrimaryEndpointName();
}

AbstractTracerContext has two implementations: IgnoredTracerContext and TracingContext.

IgnoredTracerContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.IgnoredTracerContext#getReadablePrimaryTraceId

IgnoredTracerContext is a trace context that should be ignored.
This method returns "Ignored_Trace".

package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;

/**
 * The <code>IgnoredTracerContext</code> represent a context should be ignored. So it just maintains the stack with an
 * integer depth field.
 * <p>
 * All operations through this will be ignored, and keep the memory and gc cost as low as possible.
 */
public class IgnoredTracerContext implements AbstractTracerContext {
    private static final NoopSpan NOOP_SPAN = new NoopSpan();
    private static final String IGNORE_TRACE = "Ignored_Trace";

    private final CorrelationContext correlationContext;
    private final ExtensionContext extensionContext;
    private final ProfileStatusContext profileStatusContext;

    private int stackDepth;

    public IgnoredTracerContext() {
        this.stackDepth = 0;
        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.profileStatusContext = ProfileStatusContext.createWithNone();
    }

    // ...

    @Override
    public String getReadablePrimaryTraceId() {
        // Get the global traceId
        return IGNORE_TRACE;
    }

    @Override
    public String getSegmentId() {
        return IGNORE_TRACE;
    }

    @Override
    public int getSpanId() {
        return -1;
    }

    // ...

}

TracingContext.getReadablePrimaryTraceId()

org.apache.skywalking.apm.agent.core.context.TracingContext#getReadablePrimaryTraceId

TracingContext is the actual tracing context.
This method returns the id field of DistributedTraceId.

package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import java.util.concurrent.locks.ReentrantLock;
import lombok.AccessLevel;
import lombok.Getter;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.conf.Config;
import org.apache.skywalking.apm.agent.core.conf.dynamic.watcher.SpanLimitWatcher;
import org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId;
import org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractTracingSpan;
import org.apache.skywalking.apm.agent.core.context.trace.EntrySpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitTypeSpan;
import org.apache.skywalking.apm.agent.core.context.trace.LocalSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;
import org.apache.skywalking.apm.agent.core.profile.ProfileTaskExecutionService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.CLUSTER;

/**
 * The <code>TracingContext</code> represents a core tracing logic controller. It build the final {@link
 * TracingContext}, by the stack mechanism, which is similar with the codes work.
 * <p>
 * In opentracing concept, it means, all spans in a segment tracing context(thread) are CHILD_OF relationship, but no
 * FOLLOW_OF.
 * <p>
 * In skywalking core concept, FOLLOW_OF is an abstract concept when cross-process MQ or cross-thread async/batch tasks
 * happen, we used {@link TraceSegmentRef} for these scenarios. Check {@link TraceSegmentRef} which is from {@link
 * ContextCarrier} or {@link ContextSnapshot}.
 */
public class TracingContext implements AbstractTracerContext {
    private static final ILog LOGGER = LogManager.getLogger(TracingContext.class);
    private long lastWarningTimestamp = 0;

    /**
     * @see ProfileTaskExecutionService
     */
    private static ProfileTaskExecutionService PROFILE_TASK_EXECUTION_SERVICE;

    /**
     * The final {@link TraceSegment}, which includes all finished spans.
     * A trace segment covers all calls within the same thread.
     */
    private TraceSegment segment;

    /**
     * Active spans stored in a Stack, usually called 'ActiveSpanStack'. This {@link LinkedList} is the in-memory
     * storage-structure. <p> I use {@link LinkedList#removeLast()}, {@link LinkedList#addLast(Object)} and {@link
     * LinkedList#getLast()} instead of {@link #pop()}, {@link #push(AbstractSpan)}, {@link #peek()}
     */
    private LinkedList<AbstractSpan> activeSpanStack = new LinkedList<>();

    /**
     * @since 8.10.0 replace the removed "firstSpan"(before 8.10.0) reference. see {@link PrimaryEndpoint} for more details.
     */
    private PrimaryEndpoint primaryEndpoint = null;

    /**
     * A counter for the next span.
     */
    private int spanIdGenerator;

    /**
     * The counter indicates
     */
    @SuppressWarnings("unused") // updated by ASYNC_SPAN_COUNTER_UPDATER
    private volatile int asyncSpanCounter;
    private static final AtomicIntegerFieldUpdater<TracingContext> ASYNC_SPAN_COUNTER_UPDATER =
        AtomicIntegerFieldUpdater.newUpdater(TracingContext.class, "asyncSpanCounter");
    private volatile boolean isRunningInAsyncMode;
    private volatile ReentrantLock asyncFinishLock;

    private volatile boolean running;

    private final long createTime;

    /**
     * profile status
     */
    private final ProfileStatusContext profileStatus;
    @Getter(AccessLevel.PACKAGE)
    private final CorrelationContext correlationContext;
    @Getter(AccessLevel.PACKAGE)
    private final ExtensionContext extensionContext;

    //CDS watcher
    private final SpanLimitWatcher spanLimitWatcher;

    /**
     * Initialize all fields with default value.
     */
    TracingContext(String firstOPName, SpanLimitWatcher spanLimitWatcher) {
        this.segment = new TraceSegment();
        this.spanIdGenerator = 0;
        isRunningInAsyncMode = false;
        createTime = System.currentTimeMillis();
        running = true;

        // profiling status
        if (PROFILE_TASK_EXECUTION_SERVICE == null) {
            PROFILE_TASK_EXECUTION_SERVICE = ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class);
        }
        this.profileStatus = PROFILE_TASK_EXECUTION_SERVICE.addProfiling(
            this, segment.getTraceSegmentId(), firstOPName);

        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.spanLimitWatcher = spanLimitWatcher;
    }

    /**
     * Get the global traceId.
     * @return the first global trace id.
     */
    @Override
    public String getReadablePrimaryTraceId() {
        // Return the id field of the DistributedTraceId
        return getPrimaryTraceId().getId();
    }

    private DistributedTraceId getPrimaryTraceId() {
        // Get the DistributedTraceId related to this trace segment
        return segment.getRelatedGlobalTrace();
    }

    @Override
    public String getSegmentId() {
        return segment.getTraceSegmentId();
    }

    @Override
    public int getSpanId() {
        return activeSpan().getSpanId();
    }

    // ...

}
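
Putting ContextManager, IgnoredTracerContext and TracingContext together, a caller of getGlobalTraceId() can see three kinds of values: "N/A" when the thread has no context, "Ignored_Trace" for an ignored context, and a real id for a tracing context. The following is a stripped-down model of that dispatch. All class names ending in Model are hypothetical; only the two string constants and the null check come from the sources above.

```java
// Hypothetical, simplified counterpart of AbstractTracerContext.
interface TracerContextModel {
    String getReadablePrimaryTraceId();
}

// Counterpart of IgnoredTracerContext: always reports the fixed marker.
class IgnoredModel implements TracerContextModel {
    @Override
    public String getReadablePrimaryTraceId() {
        return "Ignored_Trace";
    }
}

// Counterpart of TracingContext: reports the id of its segment's trace.
class TracingModel implements TracerContextModel {
    private final String id;

    TracingModel(String id) {
        this.id = id;
    }

    @Override
    public String getReadablePrimaryTraceId() {
        return id;
    }
}

// Counterpart of ContextManager: one context per thread.
class ContextManagerModel {
    private static final ThreadLocal<TracerContextModel> CONTEXT = new ThreadLocal<>();

    static void set(TracerContextModel ctx) {
        CONTEXT.set(ctx);
    }

    static void clear() {
        CONTEXT.remove();
    }

    static String getGlobalTraceId() {
        TracerContextModel ctx = CONTEXT.get();
        return ctx != null ? ctx.getReadablePrimaryTraceId() : "N/A";
    }
}
```

When reading logs, this helps tell apart "no trace on this thread" ("N/A"), "trace deliberately ignored" ("Ignored_Trace") and a genuine traceId.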

DistributedTraceId

org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId#id

DistributedTraceId is the distributed trace identity; it represents one distributed call chain.

package org.apache.skywalking.apm.agent.core.context.ids;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.ToString;

/**
 * The <code>DistributedTraceId</code> presents a distributed call chain.
 * <p>
 * This call chain has a unique (service) entrance,
 * <p>
 * such as: Service : http://www.skywalking.com/cust/query, all the remote, called behind this service, rest remote, db
 * executions, are using the same <code>DistributedTraceId</code> even in different JVM.
 * <p>
 * The <code>DistributedTraceId</code> contains only one string, and can NOT be reset, creating a new instance is the
 * only option.
 */
@RequiredArgsConstructor
@ToString
@EqualsAndHashCode
public abstract class DistributedTraceId {
    @Getter
    private final String id;
}

DistributedTraceId has two subclasses: PropagatedTraceId and NewDistributedTraceId.

PropagatedTraceId

org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId

PropagatedTraceId is a propagated trace identity: a DistributedTraceId that was propagated from the peer.

package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>PropagatedTraceId</code> represents a {@link DistributedTraceId}, which is propagated from the peer.
 */
public class PropagatedTraceId extends DistributedTraceId {
    public PropagatedTraceId(String id) {
        // Pass the propagated traceId through unchanged
        super(id);
    }
}

NewDistributedTraceId

org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId

NewDistributedTraceId is a DistributedTraceId with a newly generated id.
Its default constructor calls GlobalIdGenerator.generate() to generate a new global id, which is the traceId.

package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>NewDistributedTraceId</code> is a {@link DistributedTraceId} with a new generated id.
 */
public class NewDistributedTraceId extends DistributedTraceId {
    public NewDistributedTraceId() {
        // Generate a new global id, which is the traceId
        super(GlobalIdGenerator.generate());
    }
}

GlobalIdGenerator.generate()

org.apache.skywalking.apm.agent.core.context.ids.GlobalIdGenerator#generate

GlobalIdGenerator is the global id generator.
This method generates a new global id; it is where the traceId is actually produced.

package org.apache.skywalking.apm.agent.core.context.ids;

import java.util.UUID;

import org.apache.skywalking.apm.util.StringUtil;

public final class GlobalIdGenerator {
    // Process id of the application instance
    private static final String PROCESS_ID = UUID.randomUUID().toString().replaceAll("-", "");
    // Per-thread context for the id sequence
    private static final ThreadLocal<IDContext> THREAD_ID_SEQUENCE = ThreadLocal.withInitial(
        () -> new IDContext(System.currentTimeMillis(), (short) 0));

    private GlobalIdGenerator() {
    }

    /**
     * Generate a new id, combined by three parts.
     * <p>
     * The first one represents application instance id.
     * <p>
     * The second one represents thread id.
     * <p>
     * The third one also has two parts, 1) a timestamp, measured in milliseconds 2) a seq, in current thread, between
     * 0(included) and 9999(included)
     *
     * @return unique id to represent a trace or segment
     */
    public static String generate() {
        return StringUtil.join(
            '.',
            PROCESS_ID,
            String.valueOf(Thread.currentThread().getId()),
            String.valueOf(THREAD_ID_SEQUENCE.get().nextSeq())
        );
    }

    private static class IDContext {
        private long lastTimestamp;
        private short threadSeq;

        // Just for considering time-shift-back only.
        private long lastShiftTimestamp;
        private int lastShiftValue;

        private IDContext(long lastTimestamp, short threadSeq) {
            this.lastTimestamp = lastTimestamp;
            this.threadSeq = threadSeq;
        }

        private long nextSeq() {
            return timestamp() * 10000 + nextThreadSeq();
        }

        private long timestamp() {
            long currentTimeMillis = System.currentTimeMillis();

            if (currentTimeMillis < lastTimestamp) {
                // Just for considering time-shift-back by Ops or OS. @hanahmily 's suggestion.
                if (lastShiftTimestamp != currentTimeMillis) {
                    lastShiftValue++;
                    lastShiftTimestamp = currentTimeMillis;
                }
                return lastShiftValue;
            } else {
                lastTimestamp = currentTimeMillis;
                return lastTimestamp;
            }
        }

        private short nextThreadSeq() {
            if (threadSeq == 10000) {
                threadSeq = 0;
            }
            return threadSeq++;
        }
    }
}
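
Because generate() joins the process id, the thread id and timestamp * 10000 + seq with dots, a traceId produced by this generator can be taken apart again, which is handy when you want to know which thread created a suspicious id. A small sketch follows; TraceIdParts is a hypothetical helper, not part of SkyWalking. It assumes the id really came from this Java generator. Note that after a clock shift-back the "timestamp" part is a shift counter and not a wall-clock time, as the timestamp() method above shows.

```java
// Hypothetical helper that splits an id of the form
// <processId>.<threadId>.<timestampMillis * 10000 + seq>
final class TraceIdParts {
    final String processId;
    final long threadId;
    final long timestampMillis;
    final int seq;

    private TraceIdParts(String processId, long threadId, long timestampMillis, int seq) {
        this.processId = processId;
        this.threadId = threadId;
        this.timestampMillis = timestampMillis;
        this.seq = seq;
    }

    static TraceIdParts parse(String traceId) {
        String[] parts = traceId.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected trace id: " + traceId);
        }
        long combined = Long.parseLong(parts[2]);
        // Reverse of nextSeq(): timestamp() * 10000 + nextThreadSeq()
        return new TraceIdParts(
                parts[0],
                Long.parseLong(parts[1]),
                combined / 10000,
                (int) (combined % 10000));
    }
}
```

For example, parsing "0123456789abcdef0123456789abcdef.246.17096107650000042" gives thread id 246, timestamp 1709610765000 ms and sequence 42. If many unrelated requests share a traceId whose thread id belongs to a pool worker, that points straight at the thread that generated it.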

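Hands-on cases

Practice is the best teacher.
Without understanding the underlying implementation, it would be hard to come up with these interception points.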
monitor/watch/trace commands - Arthas command list
// [Interception point] new traceId generation + threads whose name starts with wanda_event
stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'

watch org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6

// [Interception point] reading the global traceId + threads whose name starts with wanda_event
stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'

watch org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getPrimaryTraceId '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6

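[Case 1] Who generates the new traceId on the wanda event thread?

Is this behavior reasonable?

Arthas's stack command shows the call stack that generates a new global traceId.
The call stack shows that the traceId is generated when the Guava EventBus subscriber method Subscriber.invokeSubscriberMethod is invoked.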

[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 432 ms, listenerId: 5
ts=2024-03-05 11:52:45;thread_name=wanda_event-thread-1;id=f6;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@8dfe921
    @org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId.<init>()
        at org.apache.skywalking.apm.agent.core.context.trace.TraceSegment.<init>(TraceSegment.java:74)
        at org.apache.skywalking.apm.agent.core.context.TracingContext.<init>(TracingContext.java:122)
        at org.apache.skywalking.apm.agent.core.context.ContextManagerExtendService.createTraceContext(ContextManagerExtendService.java:91)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getOrCreate(ContextManager.java:60)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.createLocalSpan(ContextManager.java:123)
        // guava-eventbus-plugin
        // the method interceptor is invoked here
        at org.apache.skywalking.apm.plugin.guava.eventbus.EventBusSubscriberInterceptor.beforeMethod(EventBusSubscriberInterceptor.java:38)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:75)
        // the original method
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

其是由apm-guava-eventbus-plugin插件的EventBusSubscriberInstrumentation操作改变字节码。

【Case 2】Within the trace segment of the wanda event thread, where is the traceId read?

Are these reads reasonable?

The method LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29) calls TraceContext.traceId().

[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 423 ms, listenerId: 3
ts=2024-03-04 21:03:59;thread_name=wanda_event-thread-1;id=140;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@67fe380b
    @org.apache.skywalking.apm.agent.core.context.TracingContext.getReadablePrimaryTraceId()
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getGlobalTraceId(ContextManager.java:77)
        at org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor.beforeMethod(TraceIDInterceptor.java:35)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsInter.intercept(StaticMethodsInter.java:73)
        at org.apache.skywalking.apm.toolkit.trace.TraceContext.traceId(TraceContext.java:-1)
        // Above 👆🏻: the SkyWalking core call path
        // Call site of TraceContext.traceId()
        at com.leoao.lpaas.logback.LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.JsonLayoutBase.doLayout(null:-1)
        at ch.qos.logback.core.encoder.LayoutWrappingEncoder.encode(LayoutWrappingEncoder.java:115)
        at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:230)
        at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:235)
        at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
        at ch.qos.logback.classic.Logger.info(Logger.java:583)
        // Writes the log line
        // log.info("receive event persistUserPositionEvent=[{}]", event);
        at com.lefit.wanda.domain.event.listener.PersistUserPositionEventListener.change(PersistUserPositionEventListener.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk(Subscriber.java:88)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk$accessor$utMvob4N(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$auxiliary$8fYqzzq0.call(null:-1)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:85)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
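The stack above shows that every log statement on the event thread reaches TraceContext.traceId() through the layout's addCustomDataToJsonMap hook. Below is a minimal sketch of what such a hook might do; it is not the actual LeoaoJsonLayout code. The class TraceIdField, the "traceId" key, and the Supplier parameter are assumptions made for illustration, so that the example runs without logback or the SkyWalking toolkit on the classpath. In a real application the supplier would be TraceContext::traceId.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: TraceIdField is not the article's LeoaoJsonLayout.
class TraceIdField {

    /** Adds a "traceId" entry only when the source returns a non-blank value. */
    static void addTraceId(Map<String, Object> jsonMap, Supplier<String> traceIdSource) {
        String traceId = traceIdSource.get();
        if (traceId != null && !traceId.trim().isEmpty()) {
            jsonMap.put("traceId", traceId);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withTrace = new LinkedHashMap<>();
        // The lambda is a stand-in for TraceContext::traceId.
        addTraceId(withTrace, () -> "example-trace-id");
        System.out.println(withTrace);    // prints {traceId=example-trace-id}

        Map<String, Object> withoutTrace = new LinkedHashMap<>();
        addTraceId(withoutTrace, () -> "");
        System.out.println(withoutTrace); // prints {}
    }
}
```

Because the lookup runs once per log event, whatever context is bound to the logging thread at that moment decides which traceId ends up in the log line.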


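【Takeaway】
Be selective about which plugins under the skywalking-agent plugins/ directory you enable. The more components are enabled, the more complex tracing and topology become, the wider the impact, and the more unknown and uncontrollable factors are introduced.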



Have fun, everyone! ˇˍˇ

简放, Hangzhou

