请教一个数据查询问题(连续一段时间大于某个值),希望大家能够提供方法

数据表中有时间和数据值两列,
现在需要查询连续大于等于3分钟大于某个值(从大于某个值开始一直到小于该值结束)的数据,
并计算出连续时间,和该段时间的平均值。如何通过SQL语句实现该要求。
数据示例

阅读 14k
2 个回答

简单做了个测试,希望对你有帮助。

创建表和测试数据

CREATE TABLE t_log(f_time DATE, f_value NUMBER);
insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 12:00:00', 'dd-mm-yyyy hh24:mi:ss'), 100);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 12:00:01', 'dd-mm-yyyy hh24:mi:ss'), 101);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 12:00:02', 'dd-mm-yyyy hh24:mi:ss'), 102);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 12:00:03', 'dd-mm-yyyy hh24:mi:ss'), 103);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 12:00:04', 'dd-mm-yyyy hh24:mi:ss'), 95);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 13:00:00', 'dd-mm-yyyy hh24:mi:ss'), 108);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 13:01:00', 'dd-mm-yyyy hh24:mi:ss'), 105);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 13:03:00', 'dd-mm-yyyy hh24:mi:ss'), 99);

insert into T_LOG (F_TIME, F_VALUE)
values (to_date('24-02-2017 13:05:00', 'dd-mm-yyyy hh24:mi:ss'), 108);

查询语句,假设时间字段没有重复值结果才会正确:
大概分三个步骤:

  1. 先按照时间字段排序,计算出每个连续大于等于目标值分组的第一行记录(判断标准是当前行大于等于目标值,同时下一行小于目标值,下面sql语句t表内容)

  2. 计算上面分组的日期范围计算出来(下面sql语句中t2的内容),并过滤连续时间超过3分钟,形成表t3

  3. 然后将数据表和t3表关联,就可以得到连续大于目标值的行分组。

WITH t2 AS (
  SELECT f_group1, LEAD(f_group1, 1, to_date('9999-12-31', 'yyyy-mm-dd')) OVER (ORDER BY f_time) AS f_group2
  FROM (
    SELECT f_time, f_value, CASE WHEN f_value >= 100 AND LAG(f_value, 1, 0) OVER(ORDER BY f_time) < 100 THEN f_time ELSE NULL END AS f_group1
    FROM t_log
    ORDER BY f_time
  ) t
  WHERE f_group1 IS NOT NULL
), t3 as (
  select * from t2 where (f_group2 - f_group1) > = 180
)
SELECT 
  t3.f_group1,
  (MAX(t_log.f_time) - MIN(t_log.f_time))*86400, AVG(t_log.f_value)
FROM t_log INNER JOIN t3 ON t_log.f_time >= t3.f_group1 AND t_log.f_time < t3.f_group2
WHERE f_value >= 100
GROUP BY t3.f_group1

用阿里巴巴今年以来的股价t_log做例子,求连续4天以上股价大于$102的区间。

f_time f_val
2017-02-01 07:16:17 101.57
2017-02-02 16:36:38 100.84
2017-02-03 04:47:59 100.39
2017-02-06 13:12:05 100.90
2017-02-07 06:32:45 100.83
2017-02-08 02:27:47 103.57
2017-02-09 07:40:26 103.34
2017-02-10 17:59:55 102.36
2017-02-13 14:40:22 103.10
2017-02-14 17:14:21 101.59
2017-02-15 05:33:37 101.55
2017-02-16 03:04:19 100.82
2017-02-17 09:44:34 100.52
2017-02-21 18:26:33 102.12
2017-02-22 01:14:24 104.20
2017-02-23 03:40:00 102.46
2017-02-24 15:38:03 102.95

Postgresql代码:

with
flip_time as (
  select * from (
    select case when
      (lag(f_val, 1, 0.0) over win < 102 and f_val >= 102) or
      (f_val >= 102 and lead(f_val, 1, 0.0) over win < 102) then f_time
      else null end as f_time
    from t_log 
    window win as (order by f_time)) as foo
  where f_time is not null),

duration as (
  select t1, t2 from (
    select f_time as t1,
      lead(f_time) over (order by f_time) as t2,
      row_number() over () as num
    from flip_time) as foo
  where t2 - t1 >= interval '4 days' and num%2=1)

select d.t1 t_start, d.t2 t_end, d.t2 - d.t1 as duration, round(avg(g.f_val),4) mean
from t_log g inner join duration d
  on d.t1 <= g.f_time and
     g.f_time <= d.t2
group by d.t1, d.t2, d.t2 - d.t1;
  • flip_time:阈值(102)区间段开始或结束的时点。通过窗口函数lag()lead()比较前后两行是否跨越阈值取得。

  • durationflip_time中一定是“开始-结束”的循环,利用此特性进一步加工出把开始和结束时点放在同一行,且长度符合要求(4天)的行。

  • durationt_log联接可计算出持续时间和平均值。

运行结果是:

t_start t_end duration mean
2017-02-08 02:27:47 2017-02-13 14:40:22 5 days 12:12:35 103.0925
撰写回答
你尚未登录,登录后可以
  • 和开发者交流问题的细节
  • 关注并接收问题和回答的更新提醒
  • 参与内容的编辑和改进,让解决方法与时俱进
推荐问题
宣传栏