New Features of MongoDB Time Series

Table of contents:

Parsing MongoDB's new feature "Timing"
How to use time series in MongoDB?
MongoDB Time Series Collection Performance
MongoDB Time Series IOT Scenario Design

1. Analyze the new feature of MongoDB "sequence"

MongoDB time series collection is a new feature of MongoDB 5.0. It can quickly write data to disk within a period of time and provide collections for fast time series retrieval.
Compared with ordinary collections, in the process of data insertion, time series collections automatically organize data into the optimal storage format according to the time dimension, which also improves the query efficiency of time series data for subsequent applications.

MongoDB traditional time series mode:

Suppose we have a sensor that measures temperature every minute and saves it to a database, we need to write to a stream of data in the database:

 {_id: ObjectId(), deviceid: 1, date: ISODate ("2019-11-10"), samples : [{ temperature: 10, time: 1573833152},]}，
{_id: ObjectId(), deviceid: 1, date: ISODate ("2019-11-10"), samples : [[ temperature: 15, time: 1573833153},]},
{_id: ObjectId(), deviceid: 1, date: ISODate ("2019-11-10"), samples : [[ temperature: 14, time: 1573833154},]},
{_id: ObjectId(), deviceid: 1, date: TSODate("2019-11-10"), samples : [[ temperature: 20, time: 1573833155},]}

Bucket pattern design data model:

 {
  _id: objectId(),
  deviceid: 1,
  date: ISODate ( "2019-11-10") ,
  first: 1573833152,
  last: 1573833155,
  samples : [
    { temperature: 10, time: 1573833152},
    { temperature: 15, time : 1573833153},
    { temperature: 14, time: 1573833154),
    { temperature: 20, time : 1573833155}
  ]
}

Field Explanation:

id — the ID of the document, this ID is unique
deviceld — the queried device ID
date - the sampling date; we can store it here to simplify aggregation
first — the timestamp of the oldest data read in the bucket
last — the timestamp of the latest data read in the bucket
samples—data container

Advantages of the bucket pattern in use cases:

Save data and index size
Simplified data structure
The data to be collected can be grouped together according to the time dimension, which is convenient for fast range retrieval
Improve data writing speed

2. How to use timing in MongoDB

Displays the collection created by the specified as a time series collection

 db.createcollection (
"weather"，
{
  timeseries: {
    timeField: "timestamp"，
    metaField: "metadata"，
    granularity: "hours"
  }
}

Field meaning introduction:

timeField is a time parameter and must be BSON data.
metaField affects the dimension cardinality. A good metaField should choose low cardinality and selective indicators. High cardinality will inevitably lead to performance degradation.
Granularity is the aggregation granularity (optional) parameter. The database will aggregate the data for a period of time and store it. This parameter affects the performance but not the function.
expireAfterSeconds affects the expiration of data and is implemented by default every 60s. Configurable expiration time

CRUD operations

Added: Single insertion or batch insertion into collections (no difference from traditional collections)
delete (omitted)
change (omitted)
check:

Calculate the average of the time series collection period (aggregation query):

 db.weather.aggregate([
  {
    project: {
      date: {
          $dateToParts: { date: "$timestamp" }
      },
      temp: 1
  },
  {
    $group: {
       _id: {
         date: {
             year : "$date. year",
             month: "$date.month",
             day : " $date.day"
          }
          avgTmp: { $avg: "stemp"}
      }
])

be careful:

The underlying storage of the time series collection is still WiredTiger;
There are not too many new syntaxes customized for time series queries, and various aggregations still need to be performed through aggregate;
The time series collection has optimized the storage model of the data according to the commonly used query mode. If you have your own filtering requirements for metafield on the index, you can create a secondary index normally;
MongoDB time series collection needs to add specified conditions to update and delete.
In the current version, time series collections do not support sharding (6.0 supports sharding).

3. MongoDB time series collection performance

Write performance (4C 8G 128G ssd)

Read-write hybrid stress test performance:

Disk usage:

MongoDB supports snappy, zstd and zlib algorithms for data compression. Comparing the real data space size and real disk space consumption in the past online, the following conclusions can be drawn:

compression algorithm	real data volume	real disk space consumption
snappy compression algorithm	3.5T	1-1.5T
zstd compression algorithm	3.5T	0.6-0.9T
zlib compression algorithm	3.5T	0.5-0.7T

Hbase uses the snappy algorithm by default, and the MongoDB time series collection uses the zstd compression algorithm by default, so the same amount of data, the MongoDB disk usage is lower.

MongoDB time series collection usage restrictions:

Client encryption
ChangeStream
Reindex Reindex
Tigers
Updating and removing restrictions

4. MongoDB sequential IOT scenario design

Scenario requirements:
Data quality, real-time consumption of kafka data, and after stream computing, the data needs to be displayed, as shown in the flow chart:

time series
read-write separation
ChangeStream offload query

Expired data cleaning:

The original TTL index of the time series collection can be used for automatic expiration.
You can directly delete the old set by replacing the old set with the new one.

References:

New Features of MongoDB Time Series

王顶

引用和评论

node.js child_process 模块用法总结

7天撸完KTV点歌系统,含后台管理系统(完整版)

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

深入探索嵌入式开发中的 FreeRTOS：从入门到精通

支付宝 IoT 设备入门宝典（下）设备经营篇

《ESP32-S3使用指南—IDF版 V1.6》第八章 MENUCONFIG菜单配置

《ESP32-S3使用指南—IDF版 V1.6》第九章 IDF组件注册表