Table of contents:

  1. Analyzing MongoDB's new "time series" feature
  2. How to use time series collections in MongoDB
  3. MongoDB time series collection performance
  4. MongoDB time series IoT scenario design

1. Analyzing MongoDB's new "time series" feature

  • Time series collections are a new feature in MongoDB 5.0. They are designed to write measurements taken over a period of time to disk quickly and to support fast retrieval by time.
  • Compared with ordinary collections, a time series collection automatically organizes inserted data into a storage format optimized along the time dimension, which also improves the efficiency of later time series queries.
  1. The traditional way to model time series in MongoDB:

Suppose a sensor measures the temperature every minute and saves it to the database. Each measurement is written as its own document, producing a stream of documents like this:

 {_id: ObjectId(), deviceid: 1, date: ISODate("2019-11-10"), samples: [{ temperature: 10, time: 1573833152 }]},
 {_id: ObjectId(), deviceid: 1, date: ISODate("2019-11-10"), samples: [{ temperature: 15, time: 1573833153 }]},
 {_id: ObjectId(), deviceid: 1, date: ISODate("2019-11-10"), samples: [{ temperature: 14, time: 1573833154 }]},
 {_id: ObjectId(), deviceid: 1, date: ISODate("2019-11-10"), samples: [{ temperature: 20, time: 1573833155 }]}
  2. Bucket pattern data model:
 {
   _id: ObjectId(),
   deviceid: 1,
   date: ISODate("2019-11-10"),
   first: 1573833152,
   last: 1573833155,
   samples: [
     { temperature: 10, time: 1573833152 },
     { temperature: 15, time: 1573833153 },
     { temperature: 14, time: 1573833154 },
     { temperature: 20, time: 1573833155 }
   ]
 }

Field explanation:

  • _id: the ID of the document; it is unique
  • deviceid: the ID of the device that produced the samples, used when querying by device
  • date: the sampling date; storing it here simplifies aggregation
  • first: the timestamp of the oldest reading in the bucket
  • last: the timestamp of the latest reading in the bucket
  • samples: the container that holds the readings
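The step from one document per reading to one bucket per device can be sketched in plain JavaScript. This is only an illustration: the helper name toBuckets is hypothetical, and bucketing by the UTC day derived from each sample's time is an assumption, not something the original model prescribes.

```javascript
// Group per-reading records into one bucket per device and UTC day.
// Field names (deviceid, date, first, last, samples) follow the model above;
// toBuckets itself is a hypothetical helper.
function toBuckets(readings) {
  const buckets = new Map();
  for (const r of readings) {
    // Assumption: the bucket key is the device plus the UTC day of the sample.
    const day = new Date(r.time * 1000).toISOString().slice(0, 10);
    const key = `${r.deviceid}|${day}`;
    let bucket = buckets.get(key);
    if (!bucket) {
      bucket = { deviceid: r.deviceid, date: day, first: r.time, last: r.time, samples: [] };
      buckets.set(key, bucket);
    }
    // Track the oldest and newest timestamps stored in the bucket.
    bucket.first = Math.min(bucket.first, r.time);
    bucket.last = Math.max(bucket.last, r.time);
    bucket.samples.push({ temperature: r.temperature, time: r.time });
  }
  return [...buckets.values()];
}

const readings = [
  { deviceid: 1, temperature: 10, time: 1573833152 },
  { deviceid: 1, temperature: 15, time: 1573833153 },
  { deviceid: 1, temperature: 14, time: 1573833154 },
  { deviceid: 1, temperature: 20, time: 1573833155 }
];

const buckets = toBuckets(readings);
console.log(JSON.stringify(buckets, null, 2));
```

The four readings collapse into a single bucket document, so the device ID and date are stored once and only one index entry is needed for all four samples.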
  3. Advantages of the bucket pattern:
  • Smaller data and index size
  • Simpler data structure
  • Readings are grouped along the time dimension, which makes range queries fast
  • Faster writes

2. How to use time series collections in MongoDB

  1. Explicitly create a collection as a time series collection:
 db.createCollection(
   "weather",
   {
     timeseries: {
       timeField: "timestamp",
       metaField: "metadata",
       granularity: "hours"
     }
   }
 )

Field meanings:

  • timeField is the name of the field that holds the time of each measurement; its value must be a BSON date.
  • metaField is the name of the field that labels the data source and determines the cardinality of the series. A good metaField is a low-cardinality, selective attribute; high cardinality will inevitably degrade performance.
  • granularity (optional) tells the database how large a time span to group together when it stores the data. Valid values are "seconds" (the default), "minutes", and "hours". This parameter affects performance but not functionality.
  • expireAfterSeconds (optional) sets how long documents are kept before they expire. The background task that removes expired data runs every 60 seconds by default.
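To show where expireAfterSeconds goes relative to the timeseries options, here is a minimal sketch. It wraps the call in a function that receives the mongosh db handle; the function name and the 30-day retention value are assumptions chosen for illustration.

```javascript
// Sketch: create the "weather" time series collection with automatic expiration.
// `db` is the mongosh database handle; createWeatherCollection is a hypothetical name.
function createWeatherCollection(db) {
  return db.createCollection("weather", {
    timeseries: {
      timeField: "timestamp", // BSON date of each measurement
      metaField: "metadata",  // label that identifies the data source
      granularity: "hours"    // "seconds" (default), "minutes", or "hours"
    },
    // Assumed retention: documents older than 30 days become eligible for removal.
    // Note that this option sits next to `timeseries`, not inside it.
    expireAfterSeconds: 30 * 24 * 60 * 60
  });
}

// In mongosh: createWeatherCollection(db)
```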
  2. CRUD operations
  • Create: single or batch inserts into the collection (no different from ordinary collections)
  • Delete (omitted)
  • Update (omitted)
  • Read:

Calculate the daily average temperature of the time series collection (aggregation query):

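 db.weather.aggregate([
   {
     // Split each timestamp into its year, month, day, ... parts
     $project: {
       date: {
         $dateToParts: { date: "$timestamp" }
       },
       temp: 1
     }
   },
   {
     // Group by calendar day and average the temperature
     $group: {
       _id: {
         date: {
           year: "$date.year",
           month: "$date.month",
           day: "$date.day"
         }
       },
       avgTmp: { $avg: "$temp" }
     }
   }
 ])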
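To make clear what this pipeline computes, the following plain JavaScript sketch performs the same grouping on a few in-memory documents. The sample documents (including the metadata.sensorId field) and the helper name dailyAverage are assumptions for illustration; the real query runs inside MongoDB.

```javascript
// Sample measurements shaped like documents in the "weather" collection
// (values are made up for illustration).
const docs = [
  { timestamp: new Date("2021-05-18T00:00:00Z"), metadata: { sensorId: 5578 }, temp: 12 },
  { timestamp: new Date("2021-05-18T12:00:00Z"), metadata: { sensorId: 5578 }, temp: 14 },
  { timestamp: new Date("2021-05-19T00:00:00Z"), metadata: { sensorId: 5578 }, temp: 13 },
  { timestamp: new Date("2021-05-19T12:00:00Z"), metadata: { sensorId: 5578 }, temp: 17 }
];

// In-memory equivalent of the $project + $group stages:
// split each timestamp into UTC year/month/day, then average temp per day.
function dailyAverage(documents) {
  const groups = new Map();
  for (const d of documents) {
    const date = {
      year: d.timestamp.getUTCFullYear(),
      month: d.timestamp.getUTCMonth() + 1, // $dateToParts months are 1-12
      day: d.timestamp.getUTCDate()
    };
    const key = JSON.stringify(date);
    const group = groups.get(key) || { _id: { date }, sum: 0, count: 0 };
    group.sum += d.temp;
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()].map(g => ({ _id: g._id, avgTmp: g.sum / g.count }));
}

const dailyResult = dailyAverage(docs);
console.log(JSON.stringify(dailyResult));
```

With these sample values the result contains one entry per day: an average of 13 for 18 May and 15 for 19 May.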
  3. Notes:
  • The underlying storage engine of time series collections is still WiredTiger.
  • There is little new syntax specific to time series queries; aggregations are still performed through aggregate.
  • The storage model of time series collections is already optimized for common query patterns. If you need to filter on metaField values, you can create a secondary index as usual.
  • Updates and deletes on time series collections are only allowed when they meet specific conditions.
  • In version 5.0, time series collections do not support sharding (6.0 does).
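As an illustration of the secondary index mentioned in the notes, here is a minimal sketch of a compound index on a metaField sub-field and the time field. The sub-field name metadata.sensorId and the function name are assumptions; substitute whatever your metaField actually contains.

```javascript
// Sketch: secondary index for queries that filter on a metaField value
// and then scan a time range. `db` is the mongosh database handle.
// "metadata.sensorId" is an assumed field name, not taken from the article.
function createSensorIndex(db) {
  return db.weather.createIndex({ "metadata.sensorId": 1, timestamp: 1 });
}

// In mongosh: createSensorIndex(db)
```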

3. MongoDB time series collection performance

  1. Write performance (test machine: 4 cores, 8 GB RAM, 128 GB SSD):

  2. Mixed read/write stress test performance:

  3. Disk usage:

MongoDB supports the snappy, zstd, and zlib compression algorithms. Comparing the real data size with the real disk space consumed in our past production deployments gives the following figures:

compression algorithm    real data volume    real disk space consumption
snappy                   3.5 TB              1-1.5 TB
zstd                     3.5 TB              0.6-0.9 TB
zlib                     3.5 TB              0.5-0.7 TB

HBase uses the snappy algorithm by default, while MongoDB time series collections use the zstd compression algorithm by default, so for the same amount of data MongoDB uses less disk space.
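The figures above can be turned into compression ratios with a little arithmetic. The sketch below only restates the table's numbers; the variable and function names are made up for illustration.

```javascript
// Figures from the table above, in TB.
const rawTB = 3.5;
const diskTB = {
  snappy: [1.0, 1.5],
  zstd: [0.6, 0.9],
  zlib: [0.5, 0.7]
};

// Compression ratio = raw size / on-disk size.
// The largest on-disk size gives the lowest ratio and vice versa.
function ratioRange([minDisk, maxDisk]) {
  return { low: rawTB / maxDisk, high: rawTB / minDisk };
}

for (const [algo, range] of Object.entries(diskTB)) {
  const { low, high } = ratioRange(range);
  console.log(`${algo}: ${low.toFixed(1)}x - ${high.toFixed(1)}x`);
}
```

This works out to roughly 2.3x to 3.5x for snappy, 3.9x to 5.8x for zstd, and 5.0x to 7.0x for zlib.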

  4. Features that are unavailable or restricted on time series collections:
  • Client-side field level encryption
  • Change streams
  • reIndex
  • Triggers
  • Updates and deletes (restricted)

4. MongoDB time series IoT scenario design

Scenario requirements:
For a data quality use case, Kafka data is consumed in real time and, after stream computing, the results need to be displayed, as shown in the flow chart:

  • Time series collection
  • Read-write separation
  • Change streams to offload queries

  1. Cleaning up expired data:
  • The built-in TTL setting of the time series collection (expireAfterSeconds) can expire data automatically.
  • Alternatively, switch writes to a new collection and then drop the old collection directly.
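Both cleanup options can be sketched as small functions that take the mongosh db handle. The function names, the "weather" collection name, and the monthly naming scheme (for example weather_202401) are assumptions for illustration, not something the article prescribes.

```javascript
// Option 1: change the TTL of an existing time series collection with collMod.
function setRetention(db, seconds) {
  return db.runCommand({ collMod: "weather", expireAfterSeconds: seconds });
}

// Option 2: collection rotation.
// Hypothetical naming scheme: one collection per UTC month, e.g. "weather_202401".
function monthlyCollectionName(prefix, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}_${year}${month}`;
}

// Drop every monthly collection that is more than keepMonths months
// older than the month of `now`; returns the names that were dropped.
function dropExpired(db, prefix, now, keepMonths) {
  // Date.UTC normalizes negative month values into the previous year.
  const cutoffDate = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - keepMonths, 1));
  const cutoff = monthlyCollectionName(prefix, cutoffDate);
  const dropped = [];
  for (const name of db.getCollectionNames()) {
    // Zero-padded YYYYMM suffixes sort chronologically as strings.
    if (name.startsWith(prefix + "_") && name < cutoff) {
      db.getCollection(name).drop();
      dropped.push(name);
    }
  }
  return dropped;
}

// In mongosh: setRetention(db, 7 * 24 * 60 * 60)
//         or: dropExpired(db, "weather", new Date(), 2)
```

Dropping a whole collection avoids deleting documents one by one, at the cost of having the application route writes and queries to the right collection.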


王顶