给定一系列带时间戳的健康检查,确定系统在给定时间间隔内的总体健康状况可能很棘手。PostgresQL 提供了窗口函数,可以用来了解不健康的时间间隔在哪里,但有效使用起来可能有些笨拙。心跳聚合是 Timescale Toolkit 的一部分,可以用来以更简单、更易于访问的方式解决这个问题。

此示例使用 SustData 公共数据集。此数据集跟踪少量公寓和房屋在四个不同部署间隔内的用电量。数据以每分钟样本的形式从每个单元收集。

将数据加载到超表后,可以创建一个包含每个单元每周心跳聚合的物化视图。

CREATE MATERIALIZED VIEW weekly_heartbeat AS
SELECT
time_bucket('1 week', tmstp) as week,
iid as unit,
deploy,
heartbeat_agg(tmstp, time_bucket('1w', tmstp), '1w', '2m')
FROM power_samples
GROUP BY 1,2,3;

心跳聚合接受四个参数:时间戳列、间隔的开始时间、间隔的长度,以及每次时间戳后聚合被视为存活的时间长度。此示例使用 2 分钟作为心跳生命周期,以提供对小间隙的一些容忍度。

您可以使用此数据来查看何时收到特定单元的数据。此示例将每周聚合汇总为单个聚合,然后查看存活范围

SELECT live_ranges(rollup(heartbeat_agg)) FROM weekly_heartbeat WHERE unit = 17;
live_ranges
-----------------------------------------------------
("2010-09-18 00:00:00+00","2011-03-27 01:01:50+00")
("2011-03-27 03:00:52+00","2011-07-03 00:01:00+00")
("2011-07-05 00:00:00+00","2011-08-21 00:01:00+00")
("2011-08-22 00:00:00+00","2011-08-25 00:01:00+00")
("2011-08-27 00:00:00+00","2011-09-06 00:01:00+00")
("2011-09-08 00:00:00+00","2011-09-29 00:01:00+00")
("2011-09-30 00:00:00+00","2011-10-04 00:01:00+00")
("2011-10-05 00:00:00+00","2011-10-17 00:01:00+00")
("2011-10-19 00:00:00+00","2011-11-09 00:01:00+00")
("2011-11-10 00:00:00+00","2011-11-14 00:01:00+00")
("2011-11-15 00:00:00+00","2011-11-18 00:01:00+00")
("2011-11-20 00:00:00+00","2011-11-23 00:01:00+00")
("2011-11-24 00:00:00+00","2011-12-01 00:01:00+00")
("2011-12-02 00:00:00+00","2011-12-12 00:01:00+00")
("2011-12-13 00:00:00+00","2012-01-12 00:01:00+00")
("2012-01-13 00:00:00+00","2012-02-03 00:01:00+00")
("2012-02-04 00:00:00+00","2012-02-10 00:01:00+00")
("2012-02-11 00:00:00+00","2012-03-25 01:01:50+00")
("2012-03-25 03:00:51+00","2012-04-11 00:01:00+00")

您也可以执行更复杂的查询,例如查找第三次部署期间正常运行时间最低的 5 个单元

SELECT unit, uptime(rollup(heartbeat_agg))
FROM weekly_heartbeat
WHERE deploy = 3
GROUP BY unit
ORDER BY uptime LIMIT 5;
unit | uptime
------+-------------------
31 | 203 days 22:05:00
34 | 222 days 22:05:00
32 | 222 days 22:05:00
35 | 222 days 22:05:00
30 | 222 days 22:05:00

您还可以组合来自不同单元的聚合,以获得组合覆盖率。此示例查询任何部署部分处于活动状态的时间间隔

SELECT deploy, live_ranges(rollup(heartbeat_agg))
FROM weekly_heartbeat group by deploy order by deploy;
deploy | live_ranges
--------+-----------------------------------------------------
1 | ("2010-07-29 00:00:00+00","2010-11-26 00:01:00+00")
2 | ("2010-11-25 00:00:00+00","2011-03-27 01:01:59+00")
2 | ("2011-03-27 03:00:00+00","2012-03-25 01:01:59+00")
2 | ("2012-03-25 03:00:26+00","2012-04-17 00:01:00+00")
2 | ("2012-04-20 00:00:00+00","2012-04-21 00:01:00+00")
2 | ("2012-05-11 00:00:00+00","2012-05-13 00:01:00+00")
2 | ("2013-02-20 00:00:00+00","2013-02-21 00:01:00+00")
3 | ("2012-08-01 00:00:01+00","2013-03-31 01:01:16+00")
3 | ("2013-03-31 03:00:03+00","2013-05-22 00:01:00+00")
4 | ("2013-07-31 00:00:00+00","2014-03-30 01:01:49+00")
4 | ("2014-03-30 03:00:01+00","2014-04-25 00:01:00+00")

您可以使用此数据进行一些观察。首先,看起来第二次部署比其他部署有更多问题。其次,看起来 2013 年 2 月的一些读数被错误地归类为第二次部署。最后,看起来时间戳是以不带时区的本地时间给出的,导致春季夏令时更改前后缺少一些小时。

有关心跳聚合 API 调用的更多信息,请参阅超级函数 API 文档

关键词

在此页面上发现问题?报告问题 或 在 GitHub 上编辑此页