1

Through the study of the "What is a LOD expression" , you have already understood what a LOD expression is. In order to consolidate understanding, combined scenario review is the most effective method, so this time we combined Top 15 LOD Expressions learn the 15 application scenarios of LOD expressions. Due to space limitations, this article introduces 1 to 8 scenarios.

1. Frequency of customer orders

the number of customers who placed each order?

Histogram Y-axis is clearly count([customerID]) , due to the statistical total number of customers in the current dimension .

Here is a sentence. For the Y axis of the histogram, in sql, it is group by , so the Y axis is the summary of the X axis.

The X-axis of the histogram should express the granularity of disassembly, such as whether we look at the data of each city or the data of each province. Is no exception in this scenario, we have to look data under each order number , then how many conversions as a single dimension of it?

We need to use the FIX expression to create a dimension field to indicate the number of orders placed by each customer. Obviously the database does not have this dimension, and this dimension needs to be aggregated according to the customer ID group by and then according to the order ID count, so it can be described { fixed [customerID] : count([orderId]) }

2. Array analysis

When we look at the annual customer sales, even if it is increasing year by year, we will have a question: annual sales volume of 161887dae087d7, how much did the first customer contribute in each year?

Because it is related to the loyalty of old customers and the speed of new customers, it is not good for the gap between new and old customers to be too large. Then how do we make the bar chart of 2021 be stratified according to the first orders of customers in 2019, 2020, and 2021? This is array analysis.

We are going to draw a histogram, the X and Y axes are [Year] and sum([Sales]) respectively.

In order to layer the histogram, we need a dimension field that represents the color legend. For example, if we drag in the existing gender dimension, each column will be divided into male and female. But the question is, do we make the "first year dimension" that does not exist?

The answer is to use the FIX expression: { fixed [customerID] : min([orderDate]) } .

3. Daily Profit Index

Analysis annual profit each month, the number of days the loss distribution . As shown below:

The column is drill down from year to month, which is better to implement. Just drag in the field [year] and drill down to the monthly granularity, and remove the quarter granularity.

The row is a perspective view of "High Yield", "Positive Yield", and "Loss", and the value is the number of days in the current month.

So how to calculate the high-yield and loss status? Because the final particle size is the day, so we Yaoan day basis, we must first get the sum of the profits every day, these intermediate processes can be accomplished using the LOD fields, the creation of a day profits field (profitPerDay) : { fixed [orderDate] : sum([profit]) } .

Since we are not sensitive to total profits, just we want to split into three phases, so the use of IF THEN generate a new field day profit targets (dailyProfitKPI) : IF [profitPerDay] > 2000 THEN "Highly Profitable" ELSEIF [profitPerDay] <= 0 THEN "unprofitable" ELSE "profitable" END .

Therefore, the [dailyProfitKPI] indicator created is a dimension, that is, if the daily profit summary of the current row is greater than 2000, the value is "Highly Profitable". Therefore, the row dragged count(distinct [orderDate]) , the [dailyProfitKPI] color can drag a perspective row.

4. Percentage of total

A major feature of the LOD expression is to calculate the proportion of cross-detail levels. For example, we want to look at European countries accounted for in the world:

Obviously the sum of all the countries in this graph is not 100%, because Europe adds up to less than 20%. However, at the current detailed level, the total global sales volume is not available, so we can use the FIX expression To achieve: sum([sales]) / max({ sum([sales]) }) .

Two points are explained here:

  1. The reason for using max is because the LOD expression is just a field, and there is no aggregation method. The calculation must be performed at the same level of detail. Since the total sales volume is only one piece of data, we use max or min or even sum . The results are the same. .
  2. If you do not add dimensional restrictions, you can omit the "fix" statement, so { sum([sales]) } actually a FIX expression, which means { fixed : sum([sales]) } .

5. New customer growth trend

Looking at the annual customer growth trend graph, have you ever thought that this trend graph must always be upward? In other words, looking at the trend graph going upward does not necessarily mean that the business is doing well.

If the company develops better every year than last year, the number of new customers added each year should be more than last year. Therefore, the annual new customer growth trend graph is more meaningful. If you see the trend of this trend graph is upward, it means that every year There are more new customers than last year, indicating that the company has got rid of inertia and has achieved new growth every year.

So we have to add a filter condition. Add a new dimension field. When this single customer is a new customer this year, it will be true, otherwise it will be false, so that when we filter, we only look at the result of this field being true.

So how does this field come from? The idea is to get the customer's first order year. If the first order year is the same as the current order year, the value is true, otherwise it is false.

We use LOD to create the first single year field [firstOrderDate] : { fixed [customerId] : min([orderDate]) } , and then create the filter field [newOrExist] : IFF([firstOrderDate] = [orderDate], 'true', 'false') .

6. Comparative analysis of sales

As shown in the bar chart below, the right side is the comparison data of each item according to the selected classification:

Calculation of the ratio is to use minus the currently selected classification of the current sales volume . I believe you can guess, but the sales volume of the previous category has nothing to do with the current view detail level, only the Category selected by the user.

If we already have a metric field-select category sales selectedSales , we should exclude the interference of the current category dimension, so we can use EXCLUDE expressions to describe selectedCategorySales : { exclude [category] : sum([selectedSales]) } .

The next step is to create the selectedSales field. The background knowledge is that [parameters].[category] can obtain the currently selected dimension value. Then we can write an IF expression to aggregate sales when the dimension is equal to the selected dimension. Isn’t that the selected sales volume? So the formula is: IF [category] = [parameters].[category] THEN sales ELSE 0 END .

Finally, to compare the differences, just create a [diff] field, and the expression is sum(sales) - sum(selectedCategorySales) .

7. Average maximum transaction volume

As shown in the figure below, the current level of detail is country, but we want to show the average maximum transaction volume of each country:

Obviously, to require the average maximum transaction volume, first calculate the maximum transaction volume of each sales representative. Since this level of detail is lower than that of the country, we can use the INCLUDE expression to calculate the maximum sales representative transaction volume largestSalesByRep : { include [salesRep] : max([sales]) } and ask for this metric field Just average.

It can be seen from this example that if we are at a higher level of detail, such as a country, the sum([sales]) at this time is summarized according to the country level of detail, and the sales representative’s level of detail is ignored. But if you want to display the average maximum transaction value of each country, you must ask for max([sales]) at the detailed level of the sales representative. Because it is for each country, we don’t use { fixed [salesRep] } , but { include [salesRep] } , so the final calculated level of detail is: [country],[salesRep] , so In order to calculate the maximum transaction value of sales in each country (because maybe some sales are sold in different countries at the same time).

8. Reality and goals

In the sixth example-sales comparison analysis, we can see the comparison of the absolute value of the sales. This time, we need to calculate the percentage difference between the actual sales and the target:

As shown in the above figure, the upper left corner shows the difference between the actual and the target; the upper right corner shows the target completion rate of each region's product; the lower half shows the histogram of the actual sales of each product, and the target value is marked with a black horizontal line.

The upper left corner is very simple, [diffActualTraget] : [profit] - [targetProfit] , just subtract the current profit from the target profit.

The upper right corner needs to be disassembled in several steps. Our ultimate goal is to calculate the target completion rate of each region's product. Obviously, the formula is the current number of completed products/total number of products. The total number of products is relatively simple. Under the existing regional dimensions, the total number of products can be calculated, which is count(distinct [product]) ; the difficulty is the current number of completed products. Here we need to use INCLUDE again. Why? Because the regional granularity is higher than the product granularity, when we look at the regional summary, we don’t know the completion of each product, so we must calculate the profit target difference in the INCLUDE product dimension, the formula is [diffProductActualTraget] : { include [product] : sum(diffActualTraget) } , and then when this value is greater than 0, it is considered complete Once the goal is reached, we can create another field, which is the number of completed goals. If the goal is reached, it will be 1, otherwise it will be 0. This is convenient for finding the "current completed product number": aboveTargetProductCount : IFF([diffProductActualTraget] > 0, 1, 0) , then the current completed product number is sum([diffProductActualTraget]) , so the product goal The completion rate is sum([diffProductActualTraget]) / count(distinct [product]) . Drag this field into the indicator and format it as a percentage to get the result.

Summarize

Through the above example, we can summarize several methods of use in actual business scenarios:

  1. First, disassemble the calculation formula and determine whether the disassembled fields are in the data set. If there are all, it will end, indicating that it is a simple requirement.
  2. If it is not in the data set, and it is found that the data level of detail does not match the current one (for example, if you want to get the sales volume of each country, but the current dimension is the city), you must use the FIXED expression to fix the level of detail.
  3. If you are not explicitly calculating at a certain level of detail, do not use FIXED because it is not very flexible.
  4. When you want to skip a specified level of detail when calculating, but want to keep the level of detail in the view, use the EXCLUDE expression.
  5. If the calculation involves a lower level of detail than the view, such as calculating the average or maximum minimum, use the INCLUDE expression.
  6. Fields created using FIXED expressions can also be used for secondary calculations. Reasonable disassembly and combination of multiple calculated fields will make the logic clearer and easier to understand.
The discussion address is: Intensive Reading "15 LOD Expressions-Part 1" · Issue #369 · dt-fe/weekly

If you want to participate in the discussion, please click here , there is a new theme every week, weekend or Monday. Front-end intensive reading-to help you filter reliable content.

Follow front-end intensive reading WeChat public number

<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">

Copyright statement: Freely reprinted-non-commercial-non-derivative-keep the signature ( creative sharing 3.0 license )

黄子毅
7k 声望9.5k 粉丝