The LOD expression is very commonly used in the field of data analysis, and its full name is Level Of Detail, which is the level of detail.

intensive reading

What is the level of detail and why is LOD needed? You must have this question, let us answer it step by step.

What is the level of detail

You can try to ask questions like this: How detailed is your data?

The answer may be:

  1. The data is aggregated, sorry I can’t see the details, but if you just want to look at the total sales, here is a summary for you. .
  2. detailed? This is directly the original table data, 3 billion pieces, is this detailed enough? If you feel that it is not detailed enough, you have to split the business process and re-bury the points.

The higher the level of detail, the greater the amount of data, the lower the level of detail, the less the data, and the more summarized data.

It is difficult for humans to see valuable information in the highly detailed 3 billion records, so the process of data analysis can also be seen as the process of the data, behind which the level of data detail is gradually decreasing .

Level of detail of BI tools

If there is no LOD expression, the level of detail of a BI query is completely fixed:

  • If the table is dragged into the measure and there is no dimension, that is the highest level of detail, because only one record will be summarized in the end.
  • If the line chart is dragged into the dimension, the result is to aggregate the measures separately according to this dimension, the data is more detailed, and the detailed granularity is the current dimension, such as date.

If we want more detailed data, we need to drag more fields in the dimension until we reach the granularity of the most detailed schedule level. However, the same query cannot contain different detailed granularities, because the detailed granularity is determined by the combination of dimensions and cannot be changed, such as the example in the following table:

行:国家 省 城市
列:GDP

In this example, the detailed level is limited to the city level summary, and the more fine-grained data under the city will not be seen. Each piece of data is at the granularity of the city. It is impossible for us to display the country-aggregated GDP in the query results. Or if you see more detailed and granular monthly GDP information, it is even more impossible to calculate the city-granular GDP and the national-granular GDP together to calculate the proportion of the city's GDP in the country.

However, there are many requirements similar to the above example, and it is very common. BI tools must come up with a solution, so LOD is an expression that allows us to describe different detailed granularities in a query.

Look at the level of detail from the expression calculation

Expression calculation must be limited to the same detailed granularity. This is an iron law. Why?

Imagine the following two tables with different detailed granularities:

total sales:

10000

sales in each city:

北京 3000
上海 7000

If we want to calculate the proportion of contribution in each city's sales, then we have to write the [sales in each city] / [total sales], but obviously this is impossible, because the former has two Data, the latter has only one piece of data and cannot be calculated at all.

What we can do must be the same number of data rows, so whether it is IF ELSE, CASE WHEN, or addition, subtraction, multiplication, and division can all be carried out according to the row granularity.

LOD gives us the ability to calculate across detailed granularities. Its essence is to unify the detailed granularity of data, but we can make a column of data come from a completely different level of detail calculation:

城市 销售额 总销售额
北京 3000  10000
上海 7000  10000

As shown in the figure, LOD can process the data into such a way that although the total sales and the detailed granularity of the city are different, they are still added to the end of each row, so that the calculation can be performed.

Therefore, the LOD can be calculated at any level of detail, and the final output "fits" to the level of detail of the current query.

LOD expressions are divided into three abilities, namely FIXED, INCLUDE, and EXCLUDE.

FIXED

{ fixed [省份] : sum([GDP]) }

According to the fixed detailed granularity of the city, the DGP of each province is calculated and finally merged into the current detailed granularity.

If the current query granularity is province and city, the logic of adding the LOD field is shown in the figure below:

It can be seen that the essence is the result of sum after two different SQL queries. The internal 0617f4a34cde03 represents the aggregation method in the FIXED expression, and the external sum represents how to aggregate if the FIXED detail level is lower than the current view detail level. In this example, the FIXED level of detail is higher, so sum does not work. avg to 0617f4a34cde0a has the same effect, because the merged detail level is a one-to-many relationship, and only the many-to-one relationship needs to be aggregated when merging.

The outermost aggregation method generally plays a role in INCLUDE expressions.

EXCLUDE

{ exclude [城市] : sum([GDP]) }

In the current query granularity, the GDP is calculated after excluding the granularity of the city, and finally merged into the current detailed granularity.

If the current query granularity is province, city, and season, the logic of adding the LOD field is shown in the following figure:

As shown in the figure, on the basis of the current view detail level, EXCLUDE excludes some dimensions, and the resulting level of detail will definitely be higher.

INCLUDE

{ include [城乡] : avg([GDP]) }

In the current query granularity, the urban and rural granularity is added to calculate the GDP, and finally merged into the current detailed granularity.

This type of example is more difficult to understand, and sum , because the calculation results will not be different, and it must avg . Let's look at it in combination with the following figure:

This is the problem of inaccurate calculation of avg, that is, the average value calculated by different detailed levels is different, but sum, count, etc. will not affect the calculation result as the detailed level changes, so when it comes to avg calculation, it can be expressed by INCLUDE The formula specifies the detailed level of calculation to ensure the accuracy of the data caliber.

How to use the LOD field

In addition to the above example, which is directly found and displayed to users, the LOD field is more commonly used as an intermediate calculation process, such as calculating the proportion of provincial GDP in the country. Because LOD has merged the calculation results of different detailed granularities into the current detailed granularity, the calculation expression is as follows:

sum([GDP]) / sum({ fixed [国家] : sum([GDP]) })

It seems to be calculated across detailed granularities, but in fact there is no. The actual calculation is still calculated line by line. The following LOD expressions are only logically calculated according to the specified detailed granularity, but will eventually remain consistent with the current view detailed granularity . Therefore, it can participate in the calculation.

We will continue to interpret the Top 15 LOD expression business scenarios compiled by tableau to gain a deeper understanding of LOD expressions.

Summarize

LOD expressions allow you to easily create calculated fields "out of" the level of detail of the current view.

You may be wondering, why not actively change the current view detail level to achieve the same effect? For example, add or reduce a dimension.

The reason is that LOD is often used for calculations across detailed levels, such as calculating the proportion of parts to the overall, calculating whether the current record is the user's first order, etc. More scenarios will be interpreted in the next intensive reading.

The discussion address is: Intensive Reading "What is LOD Expression" · Issue #365 · dt-fe/weekly

If you want to participate in the discussion, please click here , there are new topics every week, weekend or Monday release. Front-end intensive reading-to help you filter reliable content.

Follow front-end intensive reading WeChat public

<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">

Copyright statement: Freely reprinted-non-commercial-non-derivative-keep the signature ( Creative Commons 3.0 License )

黄子毅
7k 声望9.5k 粉丝