[mongo series] A roundup of aggregation knowledge points

What is aggregated data?

Let's first look at what data aggregation means.

Data aggregation refers to combining data from different data sources.
Clustering, also known as cluster analysis, is a technique for statistical data analysis.
It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.

What is an aggregate query?

Aggregation operations process data records and return computed results.

They group values from multiple documents together, and various operations can be performed on the grouped data to return a single result.

Aggregation operations generally fall into the following three categories:

Single-purpose aggregation
Aggregation pipeline
MapReduce

https://docs.mongodb.com/manual/aggregation/

Single-purpose aggregation

MongoDB itself provides the following single-purpose aggregation functions. Compared with the aggregation pipeline and mapReduce, these functions are less flexible and less feature-rich.

db.collectionname.estimatedDocumentCount()

Returns a rough count of the documents in the collection. The value is an estimate.

db.collectionname.count()

Counts the documents that match a query. Recent versions deprecate count() in favor of countDocuments(), which computes the count through an aggregation.

db.collectionname.distinct()

Returns the distinct values that a field takes

For example:

 > db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }

> db.users.distinct("age")
[ 15, 19, 25 ]

In the example above, db.users.distinct("age") shows which values exist in the age field.
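To make the behavior concrete, here is a minimal plain-JavaScript sketch of what distinct computes on the sample data above. It runs without a server. The `distinctValues` helper is a hypothetical name used only for illustration, not a MongoDB API, and it ignores details such as array fields and index use.

```javascript
// Sample documents from the users collection above (only the fields we need).
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" }, // no age field
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// Hypothetical helper: collect each value of `field` once, skipping
// documents that lack the field, and return the values in ascending order.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (field in doc) seen.add(doc[field]);
  }
  return [...seen].sort((a, b) => a - b);
}

console.log(distinctValues(users, "age")); // [ 15, 19, 25 ]
```

The document without an age field contributes nothing, which is why only three values come back from five documents.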

Aggregation pipeline

https://docs.mongodb.com/manual/core/aggregation-pipeline/

The aggregation pipeline consists of multiple stages. Each stage transforms the documents as they pass through the pipeline. The pipeline here works like a pipe in Linux: the output of one command is the input of the next.
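The pipe analogy can be sketched in plain JavaScript, assuming each stage is simply a function from an array of documents to an array of documents. The helpers below (`match`, `sortBy`, `limit`, `runPipeline`) are hypothetical names that only mimic $match, $sort, and $limit. They do not reflect how the server implements those stages.

```javascript
// Each "stage" takes an array of documents and returns a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (field) => (docs) => [...docs].sort((a, b) => a[field] - b[field]);
const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next, like cmd1 | cmd2 | cmd3.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const people = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

const youngestAdults = runPipeline(people, [
  match((doc) => doc.age >= 18), // keep adults only
  sortBy("age"),                 // youngest first
  limit(2),                      // first two results
]);

console.log(youngestAdults.map((doc) => doc.name)); // [ 'job', 'xiaokeai' ]
```

Because every stage has the same shape, stages can be reordered or repeated freely, which is the property the real pipeline relies on.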

db.collectionname.aggregate(<pipelines>, <options>)

pipelines

A sequence of data aggregation stages. $out, $merge, and $geoNear can each appear only once in a pipeline. Every other stage can appear multiple times.

options

Optional. Additional parameters for the aggregation operation.

These cover the query plan (explain), whether to use temporary files (allowDiskUse), the cursor, the maximum operation time (maxTimeMS), read and write concerns, forcing an index (hint), and so on.

Commonly used pipeline aggregation stages

The commonly used pipeline aggregation stages are as follows:

Stage	Description
$match	Filter documents
$group	Group documents
$project	Select the fields to show
$lookup	Join with another collection
$unwind	Expand an array
$out	Write the results to a new collection
$count	Count documents
$sort, $skip, $limit	Sort and paginate

For the other stages, see the official documentation: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/

Example of $count

In the example, the first stage is $group, which groups the data. In an aggregation pipeline the output of one stage is the input of the next, and here the next stage is $project, which selects the fields to display.
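As an illustration of a $group stage followed by a $project stage (a sketch, not the article's original example), the comment below shows a pipeline one might run in the shell against the users collection, and the plain-JavaScript code mimics the two stages so the data flow can be followed without a server. The output field names hobby and count, and the helpers `groupByHobby` and `projectHobby`, are assumptions chosen for illustration.

```javascript
// Shell pipeline this sketch imitates (the server does not guarantee group order):
//   db.users.aggregate([
//     { $group:   { _id: "$hobby", count: { $sum: 1 } } },
//     { $project: { _id: 0, hobby: "$_id", count: 1 } }
//   ])

const userDocs = [
  { name: "xiaokeai", hobby: "reading" },
  { name: "xiaozhu", hobby: "basketball" },
  { name: "xiaopang" },
  { name: "nancy", hobby: "study" },
  { name: "job", hobby: "basketball" },
];

// Stage 1, like $group: one output document per distinct hobby.
// Documents without a hobby fall into the group whose _id is null.
function groupByHobby(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc.hobby ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

// Stage 2, like $project: drop _id and expose its value as "hobby".
function projectHobby(groups) {
  return groups.map((group) => ({ hobby: group._id, count: group.count }));
}

const grouped = groupByHobby(userDocs);  // output of stage 1 ...
const projected = projectHobby(grouped); // ... is the input of stage 2
console.log(projected);
```

The second function never sees the original user documents, only the grouped ones, which is exactly the stage-to-stage hand-off described above.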

MapReduce

https://docs.mongodb.com/manual/core/map-reduce/

MapReduce operations split a large amount of data processing work into multiple threads for parallel processing, and then merge the results together

MapReduce has the following two phases:

The map phase, which brings together document data that share the same key
The reduce phase, which combines the results of the map phase into the statistical output

You can see an example on the official website.

In that example, emit maps each cust_id to its amount, the filter condition is status: "A", and the result is finally written to a new collection named order_totals.
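The flow of that example can be sketched in plain JavaScript. The sample orders and the `mapReduceSketch` helper are made up for illustration. Real map-reduce runs inside the server, where the map function reads the current document through `this` and calls a global `emit`, whereas this sketch passes the document and `emit` in as parameters.

```javascript
// Made-up sample orders; only cust_id, amount, and status matter here.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }, // filtered out by the query
];

// Hypothetical stand-in for mapReduce: filter, map to key/value pairs,
// collect the values per key, then reduce each key's values to one result.
function mapReduceSketch(docs, map, reduce, query) {
  const valuesByKey = new Map();
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => map(doc, emit));
  return [...valuesByKey].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const orderTotals = mapReduceSketch(
  orders,
  (doc, emit) => emit(doc.cust_id, doc.amount),           // map
  (key, values) => values.reduce((sum, v) => sum + v, 0), // reduce
  (doc) => doc.status === "A"                             // query
);

console.log(orderTotals);
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```

In the real operation the resulting documents would be written to the order_totals collection instead of being returned as an array.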

The MapReduce operation syntax is as follows:

db.collectionname.mapReduce(<map>, <reduce>,
{
  out: <collection>, query: <document>,
  sort: <document>, limit: <number>,
  finalize: <function>, scope: <document>,
  jsMode: <boolean>, verbose: <boolean>,
  bypassDocumentValidation: <boolean>
}
)

map

Splits the data into key-value pairs and passes them to the reduce function

reduce

Performs statistical operations on the values of each key

out

Specifies where the results are written, for example the name of a collection

query

Optional. The condition used to filter the data. The matching documents are sent to map

sort

Optional. Sorts the input documents before they are sent to map

limit

Optional. Limits the number of documents fed into map

finalize

Optional. Modifies the result of reduce before it is output

scope

Optional. Specifies global variables available to map, reduce, and finalize

jsMode

Optional, default false. Whether to convert the intermediate data to BSON format between the map and reduce steps

verbose

Optional, default false. Whether to include timing information in the result

bypassDocumentValidation

Optional. Whether to skip document validation during the operation

Aggregation pipeline vs MapReduce

Comparison	Aggregation pipeline	MapReduce
Purpose	Designed to improve the performance and usability of aggregation tasks	For processing large data sets. When the data volume is huge, MapReduce is more convenient to use
Features	Pipeline operators can be repeated as needed, and a pipeline operation does not have to produce one output document for each input document	Beyond grouping operations, it can perform complex aggregation tasks and incremental aggregation on growing data sets
Flexibility	Limited to the operators and expressions supported by the aggregation pipeline	Custom map, reduce, and finalize JavaScript functions give the aggregation logic more flexibility
Output	Returns the result as a cursor. If the pipeline includes an $out or $merge stage, the cursor is empty	Returns the result with various options: inline, new collection, merge, replace, reduce
Sharding	Supports non-sharded and sharded input collections	Supports non-sharded and sharded input collections

For a detailed comparison, see the official documentation: https://docs.mongodb.com/manual/reference/map-reduce-to-aggregation-pipeline/
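To see the two approaches side by side, the order_totals example from the MapReduce section can be expressed as pipeline stages. The sketch below defines the stages as plain objects, the same shape one would pass to aggregate, and runs them through a toy interpreter so it works without a server. The `runToyPipeline` function, the sample data, and the collection name orders are assumptions for illustration. The interpreter handles only equality $match and $group with a `$sum` accumulator.

```javascript
// In the shell this array could be passed to db.orders.aggregate(...),
// optionally followed by { $out: "order_totals" } to write a collection.
const pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", value: { $sum: "$amount" } } },
];

// Toy interpreter (hypothetical): supports only equality $match and
// $group with a "$field" key plus { $sum: "$field" } accumulators.
function runToyPipeline(docs, stages) {
  return stages.reduce((current, stage) => {
    if (stage.$match) {
      const conditions = Object.entries(stage.$match);
      return current.filter((doc) =>
        conditions.every(([field, expected]) => doc[field] === expected)
      );
    }
    if (stage.$group) {
      const { _id: keyPath, ...accumulators } = stage.$group;
      const groups = new Map();
      for (const doc of current) {
        const key = doc[keyPath.slice(1)]; // "$cust_id" -> "cust_id"
        if (!groups.has(key)) groups.set(key, { _id: key });
        const group = groups.get(key);
        for (const [name, spec] of Object.entries(accumulators)) {
          group[name] = (group[name] ?? 0) + doc[spec.$sum.slice(1)];
        }
      }
      return [...groups.values()];
    }
    throw new Error("unsupported stage: " + Object.keys(stage)[0]);
  }, docs);
}

// Made-up sample data.
const sampleOrders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

console.log(runToyPipeline(sampleOrders, pipeline));
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```

The pipeline version is declarative: the query, the key, and the accumulator are data rather than custom functions, which matches the flexibility row of the table above.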

Likes, follows, and favorites are welcome

Friends, your support and encouragement are what motivate me to keep sharing and to improve quality.

Okay, that's all for this time.

Technology is open, and our mentality should be open. Embrace change, live in the sun, and move forward.

I am 阿兵云原生 (Abing Cloud Native). Feel free to like, follow, and favorite. See you next time~
