Author: Dong Shandong & Bai Yu
For most people, information is an abstract concept. We often say there is a lot of information or little information, but it is hard to say exactly how much: for example, how much information is in a help document or an article? It was not until 1948 that C. E. Shannon put forward the concept of "information entropy", which solved the problem of quantitatively measuring information. Shannon borrowed the term from thermodynamics, where thermal entropy is a physical quantity expressing the degree of disorder of a molecular state. Shannon used information entropy to describe the uncertainty of an information source.
Shannon's information entropy is essentially a mathematical measure of the "uncertain phenomena" we are all used to. For example, if the weather forecast says "it will definitely rain this afternoon", we go out with an umbrella; if the forecast says "there is a 60% chance of rain", we hesitate over whether to bring one, because an umbrella is a real burden when it turns out to be useless. In the first forecast, the uncertainty about rain is small; in the second, it is much larger.
Although information entropy is a rather abstract mathematical concept, we can understand it as a measure tied to the probability with which specific information occurs. Information entropy and thermodynamic entropy are closely related: according to Charles H. Bennett's reinterpretation of Maxwell's demon, the destruction of information is an irreversible process, so destroying information conforms to the second law of thermodynamics, while producing information is a process of introducing negative (thermodynamic) entropy into a system. When a piece of information has a high probability of occurrence, it indicates that it has already spread widely or been cited heavily. From the perspective of information dissemination, then, information entropy can represent the value of information, which gives us a standard for measuring that value.
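To make this concrete: Shannon entropy for a discrete distribution is H(X) = -Σ p(x) log₂ p(x). Below is a minimal Python sketch (the probabilities are illustrative, not from the article) comparing the two forecasts above: a near-certain forecast carries almost no uncertainty, while a 60/40 forecast is close to the one-bit maximum.

```python
import math

def shannon_entropy(probs):
    """H(X) = -sum(p * log2(p)) in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative forecasts: (P(rain), P(no rain))
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits: almost no uncertainty
print(shannon_entropy([0.60, 0.40]))  # ~0.97 bits: close to the 1-bit maximum
```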
More concretely, in day-to-day operation and maintenance (O&M) work, alarm events are the most typical kind of information. How to evaluate the value of alarm information in the face of a massive daily stream of alarm events has become an important problem.
Major monitoring platforms and tools generally identify abnormal metrics and trigger alarm events in two ways. The first is the common approach of setting static or dynamic thresholds. The second is triggering events from preset system rules, such as a machine restart. At the same time, an O&M team rarely relies on a single monitoring tool, and often needs to configure corresponding monitoring alarms in different tools at different layers.
In this context, the diversity of monitoring sources and tool categories often means that the same failure cause, seen through different tools and rules, triggers a large number of repetitive and redundant alarm events; a large-scale failure can even produce an alarm storm. It is difficult for O&M personnel to quickly and effectively pick out the important and accurate alarms from this flood, so effective alarms are often drowned out. For the O&M team and alarm products, this leads to the following pain points:
- Multiple monitoring alarm sources and frequent false alarms produce large numbers of repetitive, redundant, and low-value events; important events are submerged among them and cannot be effectively identified;
- Wide-ranging failures trigger alarm storms;
- Dirty data, such as test events, is mixed in with real events.
What is ARMS intelligent noise reduction?
The ARMS intelligent noise reduction function relies on NLP algorithms and information entropy theory to build its model, mining patterns from a large volume of historical alarm events. When a real-time event is triggered, each event is labeled in real time with an information entropy value and a noise flag, helping users quickly judge the event's importance.
How intelligent noise reduction is implemented
A large volume of historical events accumulates in the event center, and it is difficult to manually abstract event patterns and value from them. The intelligent noise reduction function of the Application Real-Time Monitoring Service (ARMS) ITSM product collects different alarm sources onto a unified platform for alarm event processing, recognizes patterns in these historical events, mines their internal correlations, and builds a machine learning model based on information entropy to help users identify important events. The core steps of the model include:
- Step 1: Based on natural language processing and a domain vocabulary, vectorize the words in the event content to obtain a measurement of the event at the finest granularity;
- Step 2: Based on the concept of information entropy from information theory, combined with a TF-IDF model, construct an information entropy value and importance measurement model over the word vectors;
- Step 3: Use a sigmoid function to complete a nonlinear, normalized "information entropy" measurement of the event;
- Step 4: Combine the processing records and feedback on historical events to iteratively train and validate the model.
In short, natural language processing algorithms and the information-theoretic concepts of information volume and information entropy are used to characterize the importance of events, and a large number of historical events are used to iteratively train a model that identifies event importance. When a new real-time event is triggered, its importance is identified quickly, and an information entropy threshold is applied to filter and shield noise events. As time passes and event types and content change, the model updates itself iteratively (once a week), maintaining its accuracy without any action from the user.
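The following is a minimal Python sketch of the pipeline described in steps 1-3, using scikit-learn's TfidfVectorizer. The sample events, the centering against a historical baseline, and the scoring formula are illustrative assumptions, not ARMS's actual implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical historical alarm events; repetitive events dominate.
historical_events = [
    "disk usage exceeds threshold on host-01",
    "disk usage exceeds threshold on host-01",
    "disk usage exceeds threshold on host-02",
    "machine restart detected on host-03",
]

# Steps 1-2: word vectorization with TF-IDF weights. Words from rare
# events get high weights; words from repetitive events get low ones.
vectorizer = TfidfVectorizer(norm=None)
tfidf = vectorizer.fit_transform(historical_events)

# Assumed baseline: mean aggregate weight across historical events.
baseline = np.asarray(tfidf.sum(axis=1)).ravel().mean()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entropy_score(event_text):
    """Step 3: a sigmoid-normalized 'information entropy' score in (0, 1)."""
    raw = vectorizer.transform([event_text]).sum()  # aggregate word weights
    return sigmoid(raw - baseline)                  # center, then squash

for e in ("disk usage exceeds threshold on host-01",  # repetitive -> lower score
          "machine restart detected on host-03"):     # rare -> higher score
    print(f"{entropy_score(e):.2f}  {e}")
```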
Business value of intelligent noise reduction
Business value 1: Intelligently identify repetitive and inefficient events and discover novel events
(1) Identifying large numbers of repetitive and similar events
Repetitive and similar events keep appearing in large numbers among the alarms. The model continuously lowers the information entropy value assigned to such events, so their entropy becomes lower and lower until it eventually approaches 0. This is because the model expects users to focus their attention on responding to important events: if an event is triggered repeatedly in huge numbers, it usually means users do not care about it at all, which also supports the model's mechanism from the standpoint of business logic.
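A rough way to see this decay, under the TF-IDF view assumed in the earlier sketch: as the same event recurs, the document frequency of its words grows, and a smoothed IDF-style weight shrinks monotonically. The corpus size and formula below are illustrative assumptions.

```python
import math

N = 10_000  # assumed number of historical events in the corpus
for df in (1, 100, 1_000, 9_999):  # how many events contain the word
    weight = math.log((1 + N) / (1 + df)) + 1  # smoothed IDF-style weight
    print(f"df={df:>5}  weight={weight:.2f}")
# The weight falls as df grows; after centering and sigmoid normalization,
# a heavily repeated event's score drifts toward the noise floor.
```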
(2) Discovering novel events
For events that have never occurred before, or are relatively rare in the history, the model pays special attention: it identifies them as novel events and assigns them a larger information entropy value, in the expectation that users will pay more attention to them. The ARMS intelligent noise reduction model therefore also helps users identify important events.
Business value 2: Support for customized settings
For some user test events, or events containing specific fields, we often want customized handling. For example, a test event only needs to be visible so the whole process can be traced, and no one needs to act on it. Conversely, some events contain particularly important field information and need to be handled first.
Business value 3: The model has strong growth potential
For users with a small number of historical events (fewer than 1,000), it is generally not recommended to turn this function on, because with too few historical events the model cannot be trained sufficiently to recognize their internal patterns. Once it is turned on, however, the model is iteratively retrained each week on that week's new events. Without the user needing to care, the model adaptively tracks changes in event patterns on the one hand, and on the other hand keeps iterating until a model that started with too few events is sufficiently trained.
Best Practices
Usage process
Step 0: Entry
Step 1: Turn it on
When you feel that there are too many events overall, too many repeated events, or too many inefficient/invalid events, you can choose to turn on intelligent noise reduction.
Step 2: Use it
After it is turned on, one month of historical event data is pulled for model training (if there are too many events in the month, currently only part of them is pulled). Click Smart Noise Reduction to enter the details page.
Step 3: Set parameters
After understanding the function in depth, users can start setting keywords to prioritize or block events. For details on priority words and blocked words, see the glossary below.
Glossary
- Noise event threshold: after smart noise reduction is turned on, an information entropy value is calculated for every new event. The noise event threshold is the dividing line between noise and non-noise events.
- Noise events: events whose information entropy is lower than the set information entropy threshold are collectively called noise events.
- Non-noise events: events whose information entropy is greater than or equal to the set information entropy threshold are collectively called non-noise events.
- Priority words: in the keyword settings, users can set words they want to see first, such as "important" or "critical". When the name or content of an event includes a set priority word, the event's priority is raised accordingly so that it is not identified as a noise event.
- Blocked words: in the keyword settings, users can set words they consider unimportant, such as "test". When the name or content of an event contains a set blocked word, the event's information entropy is directly set to 0 (so if the information entropy threshold is set above 0, the event is identified as a noise event). The sketch after this list shows how these rules combine.
- Top 50 common words: based on statistical learning over historical events, the model keeps a word frequency table of the event vocabulary. The table is sorted by frequency of appearance, and the top 50 common words are selected for display.
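A minimal sketch of how these glossary rules could combine, assuming that a priority word simply lifts an event to the threshold; the threshold value, word lists, and function names are all illustrative, not the ARMS API.

```python
NOISE_THRESHOLD = 0.3                     # "noise event threshold" (assumed value)
PRIORITY_WORDS = {"important", "critical"}
BLOCKED_WORDS = {"test"}

def classify(event_text, entropy):
    """Apply blocked words, then priority words, then the threshold."""
    words = set(event_text.lower().split())
    if words & BLOCKED_WORDS:
        entropy = 0.0                             # blocked words force entropy to 0
    elif words & PRIORITY_WORDS:
        entropy = max(entropy, NOISE_THRESHOLD)   # keep it out of the noise bucket
    label = "noise" if entropy < NOISE_THRESHOLD else "non-noise"
    return entropy, label

print(classify("test deploy event", 0.8))      # -> (0.0, 'noise')
print(classify("critical disk failure", 0.1))  # -> (0.3, 'non-noise')
```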
FAQ
When should this feature be turned on?
For users with more than 1,000 historical events, ARMS intelligent noise reduction is turned on automatically.
Users who still have only a small number of historical events can turn it on themselves, but the model needs a period of time to iterate before it performs well.
Do the model parameters need to be modified?
In the initial stage, it is recommended to keep the defaults without any modification.
After you understand the functions, you can try setting priority words, blocked words, and the information entropy threshold to meet more customized requirements.
Click here to go to the Alibaba Cloud observability topic page for more information!