[Note] This article is translated from [Saving Time with Structured Logging - Reflectoring](https://reflectoring.io/structured-logging/).
Logs are the ultimate resource for investigating incidents and understanding what is happening in an application. Every application produces logs of some kind.
However, these logs are often messy, and analyzing them takes a lot of effort. In this article, we will look at how structured logging greatly increases the value of our logs.
We will walk through some very practical techniques that increase the value of application log data, using Logz.io as the logging platform to query the logs.
Code example
This article is accompanied by a working [code example on GitHub](https://github.com/thombergs/code-examples/tree/master/logging/structured-logging).
What is a structured log?
"Normal" logs are unstructured. They usually contain a message string:
```
2021-08-08 18:04:14.721  INFO 12402 --- [           main] i.r.s.StructuredLoggingApplication       : Started StructuredLoggingApplication in 0.395 seconds (JVM running for 0.552)
```
This message contains all the information we hope to obtain when investigating an incident or analyzing a problem:
- Date of log event
- The name of the logger that created the log event, and
- The log message itself.
All the information is in the log message, but it is hard to query for it! Since everything is packed into one string, we have to parse and search within this string if we want to get specific information out of our logs.
For example, if we want to view only the logs of a specific logger, the log server would have to parse all log messages, check them for a certain pattern that identifies the logger, and then filter the log messages by that logger.
The structured log contains the same information, but in a structured form instead of an unstructured string. Generally, structured logs are presented in JSON format:
```json
{
  "timestamp": "2021-08-08 18:04:14.721",
  "level": "INFO",
  "logger": "io.reflectoring....StructuredLoggingApplication",
  "thread": "main",
  "message": "Started StructuredLoggingApplication ..."
}
```
This JSON structure allows the log server to efficiently store and, more importantly, retrieve logs.
For example, we can now filter log events by timestamp or logger, and searching these fields is much more efficient than parsing strings for specific patterns.
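How the JSON above is produced is outside the scope of this article, but for reference: one common option in a Spring Boot application (an assumption here, not something this article prescribes) is the logstash-logback-encoder library, which renders each log event, including any custom MDC fields, as one JSON object. A minimal `logback-spring.xml` could look like this:

```xml
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <!-- LogstashEncoder emits each log event as a single JSON object,
         including timestamp, level, logger, thread, message, and MDC fields -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```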
But the value of structured logs does not end there: we can add any custom fields to our structured log events as we need! We can add contextual information that helps us identify problems, or we can add metrics to the logs.
With all the data we now have at our fingertips, we can create powerful log queries and dashboards so that we can find the information we need even if we just wake up in the middle of the night to investigate the incident.
Let us now look at a few use cases that demonstrate the power of structured logging.
Add code path for all log events
The first thing we will look at is the code path. An application usually has several different code paths that incoming requests can take. Consider this diagram:
(Diagram: the user, third-party system, and timer code paths through the application.)
This example has (at least) three different code paths that an incoming request can take:
- User code path: A user is using the application from their browser. The browser sends a request to a web controller, and the controller calls the domain code.
- Third-party system code path: The application's HTTP API is also called by a third-party system. In this example, the third-party system calls the same web controller as the user's browser.
- Timer code path: Like many applications, this application has some scheduled tasks that are triggered by a timer.
Each of these code paths can have different characteristics. The domain service is involved in all three code paths. During an incident involving a domain service error, knowing which code path caused the error will help a lot!
If we do not know the code path, we are tempted to make fruitless guesses during an incident investigation.
So, we should add the code path to the log! Here is how we can do this with Spring Boot.
Add code path for incoming web request
In Java, the SLF4J logging library provides the MDC class (Mapped Diagnostic Context). This class allows us to add custom fields to all log events emitted from the same thread.
To add a custom field for each incoming web request, we need to build an interceptor that adds the codePath field at the beginning of each request, before our web controller code is even executed.
We can do this by implementing the HandlerInterceptor interface:
```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.ModelAndView;

public class LoggingInterceptor implements HandlerInterceptor {

  @Override
  public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
      throws Exception {
    if (request.getHeader("X-CUSTOM-HEADER") != null) {
      MDC.put("codePath", "3rdParty");
    } else {
      MDC.put("codePath", "user");
    }
    return true;
  }

  @Override
  public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler,
      ModelAndView modelAndView) {
    MDC.remove("codePath");
  }
}
```
In the preHandle() method, we call MDC.put() to add the codePath field to all log events. If the request contains a header that identifies it as coming from a third-party system, we set the code path to 3rdParty; otherwise, we assume the request comes from a user's browser.
The logic may be very different depending on the application; this is just an example.
In the postHandle() method, we must not forget to call MDC.remove() to delete all previously set fields again. Otherwise, the thread would retain these fields even after it returns to the thread pool, and the next request served by that thread might still carry the wrong values.
To activate the interceptor, we need to add it to the InterceptorRegistry:
```java
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Component
public class WebConfigurer implements WebMvcConfigurer {

  @Override
  public void addInterceptors(InterceptorRegistry registry) {
    registry.addInterceptor(new LoggingInterceptor());
  }
}
```
That's it. All log events emitted from the thread serving an incoming request now carry the codePath field.
If a request creates and starts a child thread, make sure to call MDC.put() at the beginning of the new thread's life cycle as well.
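SLF4J's MDC is backed by a thread-local, so MDC fields are not automatically inherited by threads we start ourselves. One way to propagate them is to snapshot the context with MDC.getCopyOfContextMap() and restore it in the child thread. A minimal sketch (the wrapper name withMdc is illustrative, not from the example project):

```java
import org.slf4j.MDC;

import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MdcPropagationExample {

  // Wraps a task so that it runs with a snapshot of the parent thread's MDC.
  static Runnable withMdc(Runnable task) {
    Map<String, String> parentContext = MDC.getCopyOfContextMap();
    return () -> {
      if (parentContext != null) {
        MDC.setContextMap(parentContext); // restore the parent's fields, e.g. codePath
      }
      try {
        task.run();
      } finally {
        MDC.clear(); // don't leak fields into pooled threads
      }
    };
  }

  public static void main(String[] args) throws Exception {
    MDC.put("codePath", "user");
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.submit(withMdc(() ->
        System.out.println("codePath in child thread: " + MDC.get("codePath"))));
    executor.shutdown();
  }
}
```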
Add the code path in the scheduled job
In Spring Boot, we can easily create scheduled jobs with the @Scheduled and @EnableScheduling annotations.
To add the code path to the logs, we need to make sure to call MDC.put() as the first thing in the scheduled method:
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class Timer {

  private static final Logger logger = LoggerFactory.getLogger(Timer.class);

  private final DomainService domainService;

  public Timer(DomainService domainService) {
    this.domainService = domainService;
  }

  @Scheduled(fixedDelay = 5000)
  void scheduledHello() {
    MDC.put("codePath", "timer");
    try {
      logger.info("log event from timer");
      // do some actual work
    } finally {
      // remove the field even if the job throws, so the pooled thread stays clean
      MDC.remove("codePath");
    }
  }
}
```
This way, all log events emitted from the thread executing the scheduled method will contain the codePath field. We could also create our own @Job annotation (or similar) to do this work for us, but that is beyond the scope of this article.
To make the logs of scheduled jobs even more valuable, we can add further fields:
- job_status: A status indicating whether the job was successful.
- job_id: The ID of the executed job.
- job_records_processed: If the job does some batch processing, the number of records processed.
- ...
With these fields in the logs, we can get a lot of useful information out of the log server!
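Putting the fields above together, a scheduled job could populate them like this. This is a sketch rather than code from the example project; the field names follow the list above and doImport() is a placeholder for the real batch work:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.util.UUID;

public class ImportJob {

  private static final Logger logger = LoggerFactory.getLogger(ImportJob.class);

  // A sketch of a job method that adds job metadata to the MDC.
  // In a Spring Boot application, this method would also carry @Scheduled.
  void importRecords() {
    MDC.put("codePath", "timer");
    MDC.put("job_id", UUID.randomUUID().toString());
    try {
      int processed = doImport(); // the actual batch processing
      MDC.put("job_records_processed", String.valueOf(processed));
      MDC.put("job_status", "SUCCESS");
      logger.info("import job finished");
    } catch (Exception e) {
      MDC.put("job_status", "ERROR");
      logger.error("import job failed", e);
    } finally {
      MDC.clear(); // remove all job fields before the thread is reused
    }
  }

  int doImport() {
    return 42; // placeholder for real batch processing
  }
}
```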
Add a user ID to user-initiated log events
Most of the work in a typical web application is done in web requests from the user's browser, which trigger threads in the application to create a response for the browser.
Imagine that some error occurred, and the stack trace in the log shows that it is related to a specific user configuration. But we don't know which user the request comes from!
To alleviate this situation, it is very helpful to include some sort of user ID in all log events triggered by the user.
Since we know that most incoming web requests come directly from a user's browser, we can add a username field in the same LoggingInterceptor we created for the codePath field:
```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.core.userdetails.UserDetails;
import org.springframework.web.servlet.HandlerInterceptor;
import org.springframework.web.servlet.ModelAndView;

public class LoggingInterceptor implements HandlerInterceptor {

  @Override
  public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
      throws Exception {
    Object principal = SecurityContextHolder.getContext().getAuthentication().getPrincipal();
    if (principal instanceof UserDetails) {
      String username = ((UserDetails) principal).getUsername();
      MDC.put("username", username);
    } else {
      String username = principal.toString();
      MDC.put("username", username);
    }
    return true;
  }

  @Override
  public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler,
      ModelAndView modelAndView) {
    MDC.remove("username");
  }
}
```
This code assumes that we use Spring Security to manage access to the web application. We use SecurityContextHolder to get the Principal and extract the username from it to pass into MDC.put().
Every log event emitted from the thread serving the request will now contain the username field.
With this field, we can filter the logs for requests of a specific user. If a user reports a problem, we can filter the logs by their name and vastly reduce the amount of logs we have to sift through.
Depending on regulations, you may want to log a more opaque user ID instead of the user name.
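If you need such an opaque ID, one common approach is to log a hash of the username instead of the name itself, so log events can still be correlated per user. A sketch (for real pseudonymization, a keyed HMAC with a secret would be preferable to a plain hash, since a plain hash of a known username is reversible by guessing):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class UserIdPseudonymizer {

  // Derives a stable, opaque identifier from the username, so logs
  // can be correlated per user without exposing the name itself.
  static String pseudonymize(String username) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(username.getBytes(StandardCharsets.UTF_8));
      // shorten the hex string; 16 hex chars are enough to distinguish users in logs
      return HexFormat.of().formatHex(hash).substring(0, 16);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

In the interceptor, we would then call `MDC.put("userId", UserIdPseudonymizer.pseudonymize(username))` instead of putting the raw name.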
Add root cause for error log events
When an error occurs in our application, we usually record a stack trace. The stack trace helps us determine the root cause of the error. Without the stack trace, we would not know which code caused the error!
However, if we want to run error statistics on our application, stack traces are very clumsy. Suppose we want to know how many errors our application logs per day in total, and how many of those were caused by which root cause exception. We would have to export all stack traces from the logs and do some manual filtering on them to get an answer!
If we add the custom field rootCause to each error log event, however, we can filter log events by this field and then create a histogram or pie chart of the different root causes in the log server's UI, without even exporting the data.
One way to do this in Spring Boot is to create an @ExceptionHandler:
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.ResponseStatus;

@ControllerAdvice
public class WebExceptionHandler {

  private static final Logger logger = LoggerFactory.getLogger(WebExceptionHandler.class);

  @ExceptionHandler(Exception.class)
  @ResponseStatus(HttpStatus.INTERNAL_SERVER_ERROR)
  public void internalServerError(Exception e) {
    MDC.put("rootCause", getRootCause(e).getClass().getName());
    logger.error("returning 500 (internal server error).", e);
    MDC.remove("rootCause");
  }

  private Throwable getRootCause(Exception e) {
    Throwable rootCause = e;
    // walk down the cause chain to the bottom (guarding against cycles)
    while (rootCause.getCause() != null && rootCause.getCause() != rootCause) {
      rootCause = rootCause.getCause();
    }
    return rootCause;
  }
}
```
We created a @ControllerAdvice, which means it applies to all our web controllers.
In this class, we created a method annotated with @ExceptionHandler. It is called for any exception that bubbles up from a web controller. It sets the rootCause MDC field to the fully qualified name of the exception class that caused the error and then logs the exception's stack trace.
That's it. All log events that print a stack trace now have the field rootCause, which we can filter on to understand the distribution of errors in our application.
Add a trace ID to all log events
If we run multiple services, for example in a microservice environment, things quickly get complicated when analyzing an error. One service calls another, which calls yet another, and it becomes hard (if not impossible) to trace an error in one service back to an error in another service.
A trace ID helps connect log events in one service with log events in another service:
In the example above, service 1 is called and generates the trace ID "1234". It then calls services 2 and 3, propagating the same trace ID to them so that they can add it to their own log events. This makes it possible to connect the log events of all services by searching for a specific trace ID.
For each outgoing request, service 1 also creates a unique "span ID". While a trace spans the whole request/response cycle of service 1, a span only covers the request/response cycle between one service and another.
We could implement such a tracing mechanism ourselves, but there are tracing standards and tools that implement these standards and integrate with distributed tracing systems, such as Logz.io's distributed tracing.
So let's stick with a standard tool. In the Spring Boot world, this is Spring Cloud Sleuth, which we can integrate into our application by simply adding it to our pom.xml:
```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <version>2020.0.3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
  </dependency>
</dependencies>
```
This automatically adds trace and span IDs to our logs and propagates them from one service to the next via request headers when we use a supported HTTP client. You can read more about Spring Cloud Sleuth in the article about tracing in distributed systems with Spring Cloud Sleuth.
Add the duration of certain code paths
The total duration our application needs to respond to a request is an important metric. If it is too slow, users get frustrated.
Generally, it is a good idea to expose the request duration as a metric and to create dashboards showing histograms and percentiles of the request duration, so that we can see the health of the application at a glance and perhaps even receive an alert when a certain threshold is violated.
However, we are not looking at dashboards all the time, and we may be interested not only in the total request duration but also in the duration of certain code paths. When analyzing logs to investigate a problem, knowing how long a certain path in the code took to execute can be an important clue.
In Java, we might do this:
```java
void callThirdPartyService() throws InterruptedException {
  logger.info("log event from the domain service");

  Instant start = Instant.now();
  Thread.sleep(2000); // simulating an expensive operation
  Duration duration = Duration.between(start, Instant.now());

  // toMillis() returns the whole duration in milliseconds
  // (getNano() would only return the nanoseconds part of the duration)
  MDC.put("thirdPartyCallDuration", String.valueOf(duration.toMillis()));
  logger.info("call to third-party service successful!");
  MDC.remove("thirdPartyCallDuration");
}
```
Say we are calling a third-party service and want to add the duration of the call to the logs. Using Instant.now() and Duration.between(), we calculate the duration, add it to the MDC, and then emit the log event.
This log event will now contain the field thirdPartyCallDuration, which we can filter and search for in the logs. For example, we might search for instances where this call took extra long. Then we can use the user ID or trace ID, which are also attached to the log event, to find a pattern explaining when it takes particularly long.
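If we measure durations like this in several places, it can pay off to extract the pattern into a small helper. This is a sketch (not part of the original example project) that runs an operation, attaches the measured duration in milliseconds as an MDC field, and emits a log event:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class TimedLogging {

  private static final Logger logger = LoggerFactory.getLogger(TimedLogging.class);

  // Runs the given operation and logs a message carrying the measured
  // duration (in milliseconds) as a custom MDC field.
  static <T> T logDuration(String field, String message, Supplier<T> operation) {
    Instant start = Instant.now();
    try {
      return operation.get();
    } finally {
      Duration duration = Duration.between(start, Instant.now());
      MDC.put(field, String.valueOf(duration.toMillis()));
      logger.info(message);
      MDC.remove(field); // clean up so the field doesn't leak to later log events
    }
  }
}
```

Usage could then look like `logDuration("thirdPartyCallDuration", "call to third-party service successful!", () -> client.call())`, assuming a hypothetical `client.call()`.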
Query structured logs in Logz.io
If we have set up our application to send its logs to Logz.io (for example with per-environment logging), we can now query the logs in the Kibana UI provided by Logz.io.
Error distribution
For example, we can query all log events that have a value in the rootCause field:

```
_exists_: "rootCause"
```
This will display a list of error events with root causes.
We can also create a visualization in the Logz.io UI to show the error distribution within a given time frame:
This chart shows that almost half of the errors are caused by ThingyException, so it might be a good idea to check if this exception can be avoided somehow. If it cannot be avoided, we should log it at WARN instead of ERROR to keep the error logs clean.
Error distribution across code paths
For example, suppose a user complains that scheduled jobs are not working properly. If we added a job_status field to the scheduled method's code, we can filter the logs for failed jobs:

```
job_status: "ERROR"
```
To get a more high-level view, we can create another pie chart visualization showing the distribution of job_status and rootCause.
We can now see that the majority of our scheduled jobs are failing! We should add some alerting around this! We can also see which exceptions are the root cause of most failed jobs and start investigating.
Check user's errors
Or, suppose that a user with the username "user" files a support request specifying the approximate date and time when the problem occurred. We can filter the logs with the query username: "user" to only show that user's logs and quickly zero in on the cause of the user's issue.
We can also extend the query to show only that user's log events that have a rootCause field, to learn directly about when and what went wrong:

```
username: "user" AND _exists_: "rootCause"
```
Structure your logs
This article showed just a few examples of how we can add structure to our log events and make use of that structure when querying the logs. Anything we may want to search for in the logs later should be a custom field of the log events. Which fields make sense depends heavily on the application we are building, so while writing code, make sure to think about what information could help you analyze the logs later.
You can find the code examples discussed in this article [on GitHub](https://github.com/thombergs/code-examples/tree/master/logging/structured-logging).