Following the previous article, Intensive Reading of "15 LOD Expressions (Part 1)", this time I continue to summarize the original article Top 15 LOD Expressions.
9. The value on the last day of a period
How do we build a trend chart comparing a stock's average daily closing price with its closing price on the last day of each month?
As shown in the figure, the comparison value is not an aggregate over the period but the closing price on the last day of the month, so an LOD expression is required.
Suppose the original table is as follows:
Date | Ticker | Adj Close |
---|---|---|
29/08/2013 | SYMC | $1 |
28/08/2013 | SYMC | $2 |
27/08/2013 | SYMC | $3 |
We aggregate by month on the horizontal axis and plot avg([Adj Close]) on the vertical axis. But to compute the comparison value, we need a Max Date field on every row, as follows:
Date | Ticker | Adj Close | Max Date |
---|---|---|---|
29/08/2013 | SYMC | $1 | 29/08/2013 |
28/08/2013 | SYMC | $2 | 29/08/2013 |
27/08/2013 | SYMC | $3 | 29/08/2013 |
If we simply use the max(Date) expression, we can only see Max Date in the aggregated result:
Month of Date | Ticker | Avg Adj Close | Max Date |
---|---|---|---|
08/2013 | SYMC | $2 | 29/08/2013 |
The reason is that max(Date) is an aggregate expression, so it only takes effect in an aggregated (group by) query. But to get the closing price of the last day, we need to evaluate sum([Close value on last day]), where the expression is defined as:
[Close value on last day] = if [Max Date] = [Date] then [Adj Close] else 0 end
The problem is that this expression is evaluated at the granularity of individual days, and max(Date) cannot be computed at that row-level granularity:
Date | Ticker | Adj Close | Max Date |
---|---|---|---|
29/08/2013 | SYMC | $1 | |
28/08/2013 | SYMC | $2 | |
27/08/2013 | SYMC | $3 | |
The reason, as mentioned above, is that aggregate expressions cannot appear at a non-aggregated level of detail. With { include : max([Date]) }, however, we can easily achieve the following:
Date | Ticker | Adj Close | { include : max([Date]) } |
---|---|---|---|
29/08/2013 | SYMC | $1 | 29/08/2013 |
28/08/2013 | SYMC | $2 | 29/08/2013 |
27/08/2013 | SYMC | $3 | 29/08/2013 |
The { include : max([Date]) } expression has no dimensions after include, which means it is always evaluated at the level of detail of the current view and then pushed back down to the detail table, so the field appears on every row. Now we can assemble the expression according to the ideas above.
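Putting it together, here is a minimal sketch of the two calculated fields in the notation used above (the shelf layout at the end is an assumption about how the view is built):

```
// Max Date: an include LOD with no dimensions, so it is evaluated at the
// level of detail of the current view and pushed back down to every row
[Max Date] = { include : MAX([Date]) }

// Keep the closing value only on the last day of the period; other rows are 0
[Close value on last day] = IF [Max Date] = [Date] THEN [Adj Close] ELSE 0 END

// Assumed view layout: MONTH([Date]) on columns, with AVG([Adj Close]) and
// SUM([Close value on last day]) as the two measures being compared
```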
To extend this, if we aggregate the horizontal axis by year, the comparison value becomes the closing price on the last day of each year. The reason is that { include : max([Date]) } computes max([Date]) at the granularity of the current view (the year), which is naturally the last day of the year, and then pushes it back down to each row, so [Close value on last day] looks roughly like this:
Date | Ticker | Adj Close | [Close value on last day] |
---|---|---|---|
31/12/2013 | SYMC | $1 | $1 |
30/12/2013 | SYMC | $2 | $0 |
... | ... | ... | ... |
03/01/2013 | SYMC | $7 | $0 |
02/01/2013 | SYMC | $8 | $0 |
01/01/2013 | SYMC | $9 | $0 |
Then sum([Close value on last day]) can be aggregated at the period of the view (month or year) to give the closing price of the last day.
10. Repurchase matrix
As shown in the figure below, we want to view a repurchase matrix showing how many quarters elapse between each customer's first purchase and second purchase:
The key is how to find the number of quarters between the first purchase and the second purchase. First, calculate each customer's first purchase time: [1st purchase] = { fixed [customer id] : min([order date]) }
How do we calculate the second purchase time? Here is a little trick. First use [repeat purchase] = iif([order date] > [1st purchase], [order date], null) to get a new column in which the first-purchase row is null. Because the min function ignores null values, we can use it to get the second purchase time: [2nd purchase] = { fixed [customer id] : min([repeat purchase]) }.
Finally, use datediff to get the number of quarters in between: [quarters repeat to purchase] = datediff('quarter', [1st purchase], [2nd purchase]).
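As a minimal sketch, here are the four fields of this example put together (using the field names from the text above):

```
// Each customer's first purchase date
[1st purchase] = { fixed [customer id] : MIN([order date]) }

// Null out the first-purchase row so that MIN can skip it
[repeat purchase] = IIF([order date] > [1st purchase], [order date], NULL)

// MIN ignores nulls, so this yields the second purchase date
[2nd purchase] = { fixed [customer id] : MIN([repeat purchase]) }

// Number of quarters between the first and second purchase
[quarters repeat to purchase] = DATEDIFF('quarter', [1st purchase], [2nd purchase])
```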
11. Percent difference from the mean of a range
As shown in the figure below, we want to compute the percentage difference between each point on the trend chart and the mean value of the selected range (between the two dashed lines in the figure), and draw a new line chart above it.
The key is the y-axis field of the upper line chart, that is, how the percentage difference is expressed. First, we generate a closing value that only exists within the specified interval: [Close value in reference period] = IF [Date] >= [Start reference date] AND [Date] <= [End reference date] THEN [Adj close] END. This field takes the value of [Adj close] only when the date falls within the specified interval, and is null otherwise.
The second step is to calculate the average over the specified interval, which can be expressed with FIXED: [Average daily close value between ref date] = { fixed [Ticker] : AVG([Close value in reference period]) }.
The third step is to calculate the percentage difference: [percent different from ref period] = ([Adj close] - [Average daily close value between ref date]) / [Average daily close value between ref date].
Finally, use [percent different from ref period] to draw the chart on top.
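A minimal sketch of the three fields together; [Start reference date] and [End reference date] are assumed to be date parameters that define the selected range:

```
// Closing value only inside the reference interval; null elsewhere
[Close value in reference period] =
    IF [Date] >= [Start reference date] AND [Date] <= [End reference date]
    THEN [Adj close] END

// Average closing value over the reference interval, per ticker
[Average daily close value between ref date] =
    { fixed [Ticker] : AVG([Close value in reference period]) }

// Percentage difference of each day's close from the reference average
[percent different from ref period] =
    ([Adj close] - [Average daily close value between ref date])
    / [Average daily close value between ref date]
```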
12. Relative period filtering
If we want to compare data between two periods, we may run into distortions caused by incomplete data. For example, this year's March data only goes up to the 6th, so comparing it with the whole of March last year is obviously unreasonable. We can use an LOD expression to solve this problem:
The key to relative period filtering is that you cannot compare raw dates directly, because every date this year is later than every date last year. Instead we compare positions within the year: for example, if the latest data this year only goes up to November 11, then last year's data after November 11 should be filtered out as well.
First find the date of the latest data, using a FIXED expression without any dimensions: [max date] = { max([date]) }.
Then use the datepart function to work out which day of the year each date falls on: [day of year of max date] = datepart('dayofyear', [max date]) and [day of year of order date] = datepart('dayofyear', [order date]).
[day of year of max date] is therefore the cutoff point: any data that falls later in the year than this must be filtered out. So we create a filter condition: [period filter] = [day of year of order date] <= [day of year of max date].
Finally, use the [period filter] field as the filter condition.
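A minimal sketch of the fields described above:

```
// Latest date in the whole data source (table-scoped LOD, no dimensions)
[max date] = { MAX([date]) }

// Day-of-year of the cutoff and of each order
[day of year of max date] = DATEPART('dayofyear', [max date])
[day of year of order date] = DATEPART('dayofyear', [order date])

// Keep only rows that fall no later in the year than the cutoff
[period filter] = [day of year of order date] <= [day of year of max date]
```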
13. User login frequency
How do we chart the monthly login frequency of users?
To calculate this indicator, you have to divide the user's total active time by the total number of logins.
First calculate the total active time: use FIXED expressions to get each user's earliest and latest login times:
[first login] = { fixed [user id] : min([login date]) }
[last login] = { fixed [user id] : max([login date]) }
Then calculate the month difference, which is the number of months the user has been active:
[total months user is active] = datediff("month", [first login], [last login])
The total number of logins is relatively simple. After the user ID is fixed, the login date can be counted:
[numbers of logins per user] = { fixed [user id] : count([login date]) }
Finally, we divide the two to get the user login frequency:
[login frequency] = [total months user is active] / [numbers of logins per user]
Making the chart is simple: put [login frequency] on the horizontal axis and the distinct count of user IDs on the vertical axis.
14. Proportional brushing
This is the most common scenario for LOD. For example: what proportion of a category's total sales does the current selection (say, one country) contribute?
sum(sales) / sum({ fixed [category] : sum(sales) })
The view's level of detail is category + country. By fixing on category, the denominator becomes each category's total sales across all countries.
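A minimal sketch, with a field name chosen here for illustration only:

```
// Share of the current marks (e.g. one country) in the category's total sales.
// The fixed denominator is computed before ordinary dimension filters,
// so it keeps reflecting all countries even when a country is selected.
[% of category total sales] = SUM([sales]) / SUM({ fixed [category] : SUM([sales]) })
```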
15. Annual purchase frequency by customer cohort
How to prove that old customers are more loyal?
We can group customers into cohorts (for example, customers acquired in 2011 and in 2012) and observe the distribution of their annual purchase frequency, as shown in the figure below.
As shown in the figure above, the earlier a customer's cohort, the higher the percentage at every purchase frequency, which supports the conclusion that long-time customers are more loyal. Note that the metric here is "at least N purchases", which makes each line comparable. If we plotted "exactly N purchases" instead, the curves would be hard to compare directly, since long-time customers rarely buy only once but often buy ten or more times.
First, we generate the legend field, dividing customers into cohorts by the year of their earliest purchase: [Cohort] = { fixed [customer id] : min(Year([order date])) }
Then, similar to our first example, we calculate how many orders each customer has placed; the only difference is that we group not only by customer ID but also by [Cohort], that is: { fixed [customer id], [Cohort] : count([order id]) }.
The above field is used as the X axis, and the Y axis is similar to the first example: count(customer id). But what we want to plot is "at least N purchases", so the count must be accumulated: at least 9 purchases = customers with 9 purchases + customers with 10 purchases + ... + customers with the maximum number of purchases. So it is a window sum running from the current mark to the last one, and the overall expression is roughly [Running Total] = WINDOW_SUM(count([customer id]), 0, LAST()).
Finally, because the actual Y axis is a percentage, the "at least N purchases" count we just calculated is divided by the total for each Cohort, which is [Running Total] / sum({ fixed [Cohort] : count([customer id]) }).
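A minimal sketch of the fields in this example; the names [orders per customer] and [% of cohort], as well as the WINDOW_SUM compute direction, are illustrative assumptions rather than names from the original article:

```
// Cohort: year of each customer's earliest purchase (used as the color legend)
[Cohort] = { fixed [customer id] : MIN(YEAR([order date])) }

// Number of orders per customer within its cohort -- used as the X axis
[orders per customer] = { fixed [customer id], [Cohort] : COUNT([order id]) }

// Cumulative "at least N purchases": window sum from the current mark to the last
[Running Total] = WINDOW_SUM(COUNT([customer id]), 0, LAST())

// Y axis: divide by each cohort's overall total to get a percentage
[% of cohort] = [Running Total] / SUM({ fixed [Cohort] : COUNT([customer id]) })
```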
Summary
The above examples are all built by combining the basic LOD usages: fixed, include, and exclude. But the real examples show that the difficulty lies not in the syntax of LOD expressions, but in accurately understanding the requirement, breaking it down into reasonable calculation steps, and applying LOD correctly in the steps that need it.
LOD expressions can look like magic that just "clicks" with the data. Once we understand that behind an LOD expression there is a join between tables, and that different levels of detail correspond to different group by rules, it becomes much easier to see why LOD expressions behave the way they do.
The discussion address is: Intensive Reading of "15 LOD Expressions-Next" · Issue #370 · dt-fe/weekly
If you want to join the discussion, please click here. There is a new topic every week, published on the weekend or Monday. Front-End Intensive Reading: helping you filter reliable content.
Follow the Front-End Intensive Reading WeChat public account:
<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">
Copyright notice: Free to reproduce - non-commercial - no derivatives - keep attribution (Creative Commons 3.0 License)