MongoDB 5.0 marks the arrival of a new release cycle designed to deliver new features to users faster. The combination of the Versioned API and online resharding frees users from worrying about future database upgrades and business changes; the native time series platform enables MongoDB to support a wider range of workloads and business scenarios; and the new MongoDB Shell improves the user experience. This article introduces the main new features of MongoDB 5.0.
Native time series platform
MongoDB 5.0 natively supports the entire life cycle of time series data (from collection, storage, query, real-time analysis, and visualization, to online archiving or automatic expiration as data ages), making it faster and cheaper to build and run time series applications. With the release of MongoDB 5.0, MongoDB has expanded its general-purpose application data platform, making it easier for developers to process time series data and further broadening its application scenarios in the Internet of Things, financial analysis, logistics, and more.
MongoDB's time series collection automatically stores time series data in a highly optimized and compressed format, reducing storage size and I/O to achieve better performance and larger scale. It also shortens the development cycle, enabling you to quickly build a model optimized for the performance and analysis requirements of time series applications.
An example of a command to create a time series collection:

```javascript
db.createCollection("collection_name", { timeseries: { timeField: "timestamp" } })
```
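A fuller sketch, using hypothetical collection and field names (`weather_readings`, `sensor`, `temperature`), shows the optional `metaField`, `granularity`, and `expireAfterSeconds` settings that a real deployment would typically add:

```javascript
// Hypothetical example: device temperature readings.
// "metaField" groups measurements by their source; "granularity" hints at the
// expected interval between measurements; "expireAfterSeconds" ages data out.
db.createCollection("weather_readings", {
  timeseries: {
    timeField: "timestamp",   // required: the BSON date field of each measurement
    metaField: "sensor",      // optional: per-source metadata (e.g. { id, location })
    granularity: "minutes"    // optional: "seconds" | "minutes" | "hours"
  },
  expireAfterSeconds: 86400 * 30  // optional: automatically delete data older than 30 days
})

db.weather_readings.insertOne({
  timestamp: ISODate("2021-07-15T12:00:00Z"),
  sensor: { id: "s-001", location: "rooftop" },
  temperature: 21.7
})
```

These snippets assume a running MongoDB 5.0+ deployment and are meant as a starting point, not a definitive schema.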
MongoDB can seamlessly adapt to the ingestion frequency and automatically handle out-of-order measurements based on dynamically generated time partitions. The newly released MongoDB Connector for Apache Kafka supports time series natively: you can automatically create a time series collection directly from Kafka topic messages, process and aggregate the data as it is collected, and then write it to a MongoDB time series collection.
Time series collections automatically create a clustered index on the time field to reduce query latency. The MongoDB query API also adds window functions, so you can run analytical queries such as moving averages and cumulative sums. In relational database systems these are usually called SQL analytic functions, and they support windows defined in units of rows (for example, a three-row moving average). MongoDB goes a step further, adding powerful time series functions such as exponential moving average, derivative, and integral, and allowing you to define a window in units of time (for example, a 15-minute moving average). Window functions can be used to query both time series collections and regular collections, providing new analysis methods for many application types. In addition, MongoDB 5.0 provides new date operators, including $dateAdd, $dateSubtract, $dateDiff, and $dateTrunc, allowing you to summarize and query data over custom time windows.
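A sketch of the window functions and $dateTrunc described above, against a hypothetical `readings` collection (the field names `sensor.id`, `timestamp`, and `temperature` are assumptions):

```javascript
// Compute a 15-minute moving average of temperature per device using the
// $setWindowFields aggregation stage introduced in MongoDB 5.0.
db.readings.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$sensor.id",            // one window per device
      sortBy: { timestamp: 1 },             // windows are time-ordered
      output: {
        movingAvgTemp: {
          $avg: "$temperature",
          window: { range: [-15, 0], unit: "minute" }  // time-based window
        }
      }
    }
  },
  // Bucket the results by hour with the new $dateTrunc operator.
  {
    $group: {
      _id: { $dateTrunc: { date: "$timestamp", unit: "hour" } },
      avgOfAvgs: { $avg: "$movingAvgTemp" }
    }
  }
])
```

The same pipeline works on regular collections, not just time series collections.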
You can combine MongoDB's time series data with other data in the enterprise. Time series collections can live alongside regular MongoDB collections in the same database. You don't have to choose a dedicated time series database (which cannot serve other types of applications), and you don't need complex integrations to mix time series data with other data. MongoDB provides a unified platform on which you can build high-performance, efficient time series applications while also supporting other use cases and workloads, eliminating the cost and complexity of integrating and running multiple different databases.
Online data resharding
**Before MongoDB 5.0**, resharding was a complicated, manual process:

- Method 1: Dump the entire collection, then reload it into a new collection with the new shard key. Because this is an offline process, the application must be suspended until the reload completes. For example, dumping and reloading a collection larger than 10 TB on a three-shard cluster can take several days.
- Method 2: Create a new sharded cluster and reset the collection's shard key there, then use a custom migration to write the data from the old cluster into the new cluster, distributed according to the new shard key. During this process you must handle the query routing and migration logic yourself and constantly check the migration progress to ensure all data is migrated successfully.

Custom migration is a highly complex, labor-intensive, risky, and time-consuming task. For example, it took one MongoDB user three months to complete the migration of 10 billion documents.

**Starting with MongoDB 5.0**, you can change a collection's shard key on demand while the business is running (and data keeps growing), with no database downtime and no complicated migration:

- Run the reshardCollection command to start resharding.
- The resharding process is efficient. Rather than simply rebalancing the data, it copies and rewrites all of the current collection's data to a new collection in the background, while staying in sync with the application's new writes.
- Resharding is fully automated. The time spent is compressed from weeks or months to minutes or hours, avoiding lengthy and complicated manual data migrations.

With online resharding you can easily evaluate the effect of different shard keys in a development or test environment, and modify the shard key whenever you need to. You only need to run the reshardCollection command in the MongoDB Shell, select the database and collection to reshard, and specify the new shard key:

```javascript
reshardCollection: "<database>.<collection>", key: <shardkey>
```

Where:

- `<database>`: the name of the database containing the collection to reshard.
- `<collection>`: the name of the collection to reshard.
- `<shardkey>`: the new shard key.

When you call reshardCollection, MongoDB clones the existing collection and then applies all oplog entries from the existing collection to the new collection. Once all oplog entries have been applied, MongoDB automatically switches over to the new collection and deletes the old collection in the background.
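A minimal sketch of the full command as run through mongosh, with a hypothetical `sales.orders` collection being resharded from `{ store_id: 1 }` to `{ customer_id: 1 }` (all names here are assumptions for illustration):

```javascript
// Start online resharding; the command runs until the cutover completes.
db.adminCommand({
  reshardCollection: "sales.orders",
  key: { customer_id: 1 }
})

// While it runs, progress can be observed from another shell session, e.g.:
db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, localOps: false } },
  { $match: { type: "op", "originatingCommand.reshardCollection": "sales.orders" } }
])
```

The monitoring pipeline is one common pattern; consult the reshardCollection documentation for the exact fields reported in your version.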
Versioned API
- Application compatibility: Starting from MongoDB 5.0, the Versioned API defines the set of commands and parameters most commonly used by applications (these commands do not change, whether a release is a major annual release or a quarterly rapid release). By decoupling the application life cycle from the database life cycle, you can pin the driver to a specific version of the MongoDB API. Even as the database is upgraded and improved, your application can keep running for years without code changes.
- Flexible addition of new features and improvements: The Versioned API lets MongoDB flexibly add new features and improvements to the database in each release (in a way that keeps new versions compatible with previous ones). When the API needs to change, a new API version can be added and run on the same server alongside the existing API version. As MongoDB's release cadence accelerates, the Versioned API lets you adopt the latest MongoDB features faster and more easily.
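A sketch of pinning an application to API version "1" using the Node.js driver (v4+); the connection string is a placeholder:

```javascript
// Pin this application to Versioned API "1". With strict mode enabled,
// commands and fields outside API v1 are rejected by the server, so the
// application only ever depends on the stable, versioned surface.
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://localhost:27017", {
  serverApi: {
    version: "1",             // pin to API version 1
    strict: true,             // reject commands/fields outside the versioned API
    deprecationErrors: true   // error on behaviors deprecated in API v1
  }
});
```

Other official drivers expose the same `serverApi` options under their own idioms.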
Write Concern Default Majority level
Starting from MongoDB 5.0, the default Write Concern is majority: a write operation is committed and returns success only after it has been applied to the primary node and persisted to the journals of a majority of replica set members. This provides a stronger data durability guarantee out of the box.
Note: Write Concern remains fully tunable; you can customize it to balance your application's requirements for database performance and data durability.
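A sketch of relying on the new default versus overriding it per operation, against a hypothetical `orders` collection:

```javascript
// Relies on the 5.0 server-side default (w: "majority"):
db.orders.insertOne({ item: "abc", qty: 1 })

// Explicitly trading durability for latency on a non-critical write:
db.orders.insertOne(
  { item: "abc", qty: 1 },
  { writeConcern: { w: 1 } }   // acknowledge after the primary only
)

// Or demanding majority acknowledgment with a timeout:
db.orders.insertOne(
  { item: "abc", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```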
Connection management optimization
By default, one client connection corresponds to one thread on the MongoDB server (net.serviceExecutor configured as synchronous). Creating, switching, and destroying threads are all expensive operations, and when there are too many connections, threads consume a large share of the MongoDB server's resources.
A large number of connections, or uncontrolled connection creation, is called a "connection storm". It can have many causes, and it often occurs when the service is already degraded.
In response to these situations, MongoDB 5.0 has taken the following measures:
- Limit the number of connections that the driver tries to create at any time to prevent overloading the database server in a simple and effective way.
- Reduce the frequency of checking when the driver monitors the connection pool, and give unresponsive or overloaded server nodes a chance to buffer and recover.
- The driver directs the workload to the faster server with the healthiest connection pool, rather than randomly selecting from the available servers.
The above measures, together with the improvements in the mongos query routing layer of the previous version, further enhance MongoDB's ability to withstand high concurrent loads.
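On the client side, bounding the driver's connection pool complements these server-side measures. A sketch using the Node.js driver (option names are standard driver settings; the URI is a placeholder):

```javascript
// Bound the pool so a traffic spike cannot turn into a connection storm
// against the server.
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 50,           // cap concurrent connections from this client
  minPoolSize: 5,            // keep a few warm connections ready
  waitQueueTimeoutMS: 2000   // fail fast instead of queuing requests indefinitely
});
```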
Long-running snapshot query
Long-running snapshot queries increase the versatility and flexibility of applications. With this feature you can run queries against a consistent snapshot for a default of 5 minutes (or a custom, adjustable duration) while maintaining snapshot isolation from the live transactional workload. You can also run snapshot queries on secondary nodes, letting you run different workloads in a single cluster and scale them across different shards.
MongoDB implements long-running snapshot queries through a capability in the underlying storage engine called durable history, available since MongoDB 4.4. Durable history stores a snapshot of all field values that have changed since the query started, which allows queries to maintain snapshot isolation even while data changes. Durable history also helps reduce cache pressure in the storage engine, enabling higher query throughput under write-heavy workloads.
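A sketch of a long-running analytical read with snapshot isolation, runnable outside a transaction starting with MongoDB 5.0 (the `sales.orders` collection and its fields are hypothetical):

```javascript
// The whole aggregation sees one consistent snapshot of the data,
// even if writes continue while it runs.
db.getSiblingDB("sales").runCommand({
  aggregate: "orders",
  pipeline: [
    { $match: { status: "shipped" } },
    { $group: { _id: "$region", total: { $sum: "$amount" } } }
  ],
  cursor: {},
  readConcern: { level: "snapshot" }
})
```

Combined with a secondary read preference, this lets analytical queries run off the primary without sacrificing consistency.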
New version of MongoDB Shell
In order to provide a better user experience, MongoDB 5.0 redesigned MongoDB Shell (mongosh) from the ground up to provide a more modern command line experience, as well as enhanced usability features and a powerful scripting environment. The new version of MongoDB Shell has become the default shell of the MongoDB platform. The new version of MongoDB Shell introduces syntax highlighting, smart auto-completion, context help and useful error messages, creating an intuitive and interactive experience for you.
- Enhanced user experience
- It's easier to write queries and aggregations, and it's easier to read the results. The new version of MongoDB Shell supports syntax highlighting, which makes it easy for you to distinguish fields, values, and data types to avoid syntax errors. If the error still occurs, the new version of MongoDB Shell can also point out the problem and tell you how to solve it.
- Enter queries and commands faster. The new MongoDB Shell supports intelligent auto-completion: it suggests completions for methods, commands, MQL expressions, and more, based on the version of MongoDB you are connected to.
Example: When you don't remember the syntax of a command, you can quickly find the syntax of the command directly from the MongoDB Shell.
- Advanced scripting environment: The scripting environment of the new MongoDB Shell is built on the Node.js REPL (interactive interpreter). You can use all Node.js APIs and any NPM module in your scripts. You can also load and run scripts from the file system (as with the legacy MongoDB Shell, you can continue to use load() and eval() to execute scripts).
- Extensibility and plug-ins: The new MongoDB Shell is easily extensible, letting you use all the features of MongoDB to increase productivity.
In the new MongoDB Shell you can install Snippets plug-ins. Snippets are loaded automatically into the MongoDB Shell and can use all Node.js APIs and NPM packages. MongoDB also maintains a Snippets repository that provides some useful plug-ins (such as one that analyzes the schema of a specified collection). You can also configure the MongoDB Shell to use plug-ins of your choice. Note that Snippets are currently an experimental feature of the MongoDB Shell.
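A sketch of the Snippets workflow inside mongosh (`analyze-schema` is one snippet published in MongoDB's registry; availability may vary):

```javascript
// Inside a mongosh session:
snippet search                  // list snippets available in the configured registry
snippet install analyze-schema  // install a snippet by name
snippet ls                      // show installed snippets
```

Because Snippets are experimental, the exact subcommands may change between mongosh releases.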
PyMongoArrow and Data Science
With the release of the new PyMongoArrow API, you can use Python to run complex analytics and machine learning on MongoDB data. PyMongoArrow quickly converts MongoDB query results into popular data formats (such as Pandas DataFrames and NumPy arrays), helping you simplify your data science workflow.
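A sketch of PyMongoArrow in use, assuming a local MongoDB and a hypothetical `telemetry.readings` collection (requires `pip install pymongoarrow pymongo pandas`):

```python
# Query MongoDB and receive a Pandas DataFrame in one call.
from pymongo import MongoClient
from pymongoarrow.api import find_pandas_all
from pymongoarrow.monkey import patch_all

patch_all()  # optionally adds find_pandas_all() etc. directly onto Collection

client = MongoClient("mongodb://localhost:27017")
coll = client.telemetry.readings

# Run the query server-side and materialize the result as a DataFrame.
df = find_pandas_all(coll, {"sensor.id": "s-001"})
print(df.head())
```

Equivalent helpers exist for Arrow tables and NumPy arrays; see the PyMongoArrow API docs for the current function list.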
Schema validation improvements
Schema validation is a way for MongoDB to apply data governance controls at the application level. In MongoDB 5.0, schema validation has become simpler and friendlier: when an operation fails validation, a descriptive error message is generated to help you understand which documents did not conform to the collection validator's rules, and why, so you can quickly identify and correct the offending code.
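A sketch of a collection with a JSON Schema validator (the `users` collection and its fields are hypothetical):

```javascript
// Create a collection whose documents must have a string "email"
// and a non-negative integer "age".
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "age"],
      properties: {
        email: { bsonType: "string", description: "must be a string" },
        age: { bsonType: "int", minimum: 0, description: "must be a non-negative integer" }
      }
    }
  }
})

// This insert fails validation; in MongoDB 5.0 the error details which
// properties violated which rules, instead of a bare "Document failed validation".
db.users.insertOne({ email: "a@example.com", age: -1 })
```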
Resumable index creation task
In MongoDB 5.0, an in-progress index build automatically resumes from where it left off after a node restart, reducing the impact of planned maintenance on the business. For example, when restarting or upgrading a database node, you no longer need to worry about an index build on a large collection failing partway through.
Version release adjustment
Because MongoDB supports many versions and platforms, each release must be verified on more than 20 MongoDB-supported platforms. This verification workload is large and slows the delivery of new MongoDB features. Therefore, starting from MongoDB 5.0, releases are divided into Major Releases and Rapid Releases. Rapid Releases serve as development versions available for download and testing, but they are not recommended for production environments.
Copyright Statement: content of this article is contributed spontaneously by Alibaba Cloud real-name registered users, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and does not assume corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.