What is data aggregation?
Let's start with data aggregation itself.
Data aggregation refers to combining data from different data sources.
Clustering, also known as cluster analysis, is a statistical data analysis technique.
It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.
What is an aggregation query?
Aggregation operations process data records and return computed results.
They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
Aggregation operations generally fall into the following three categories:
- Single-purpose aggregation methods
- Aggregation pipeline
- MapReduce
https://docs.mongodb.com/manual/aggregation/
Single-purpose aggregation methods
MongoDB provides the following single-purpose aggregation methods. Compared with the aggregation pipeline and MapReduce, they are less flexible and offer fewer capabilities.
- db.<collection>.estimatedDocumentCount()
Returns an estimated count of all documents in the collection, based on collection metadata
- db.<collection>.count()
Counts the documents in the collection, or only those that match an optional query
- db.<collection>.distinct()
Returns the distinct values of a specified field
For example:
> db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }
> db.users.distinct("age")
[ 15, 19, 25 ]
In the example above, db.users.distinct("age") returns the distinct values of the age field. The document for xiaopang has no age field, so it contributes no value.
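To make the behavior of these methods concrete without a running server, here is a minimal plain-JavaScript (Node.js) sketch over a trimmed copy of the users documents above. It keeps only name, age, and school.

The helper names `distinctValues` and `countMatching` are hypothetical and are not MongoDB APIs. The sketch is simplified in two ways:
- it sorts the result numerically, although MongoDB does not guarantee any order;
- it does not expand array values, which the real distinct does.

```javascript
// In-memory sketch of distinct("age") and a filtered count.
// distinctValues and countMatching are hypothetical helpers, not MongoDB APIs.
const users = [
  { name: "xiaokeai", age: 25, school: "cs" },
  { name: "xiaozhu", age: 15, school: "sh" },
  { name: "xiaopang" },                 // no age field
  { name: "nancy", age: 25, school: "hn" },
  { name: "job", age: 19, school: "nj" },
];

// Collect each distinct value of a top-level field; documents without the field are skipped.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (field in doc) seen.add(doc[field]);
  }
  // Sort numerically only to make the sketch's output deterministic.
  return [...seen].sort((a, b) => a - b);
}

// Count documents for which the predicate returns true (all documents if omitted).
function countMatching(docs, predicate = () => true) {
  return docs.filter(predicate).length;
}

console.log(distinctValues(users, "age"));              // [ 15, 19, 25 ]
console.log(countMatching(users));                      // 5
console.log(countMatching(users, (d) => d.age === 25)); // 2
```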
Aggregation pipeline
https://docs.mongodb.com/manual/core/aggregation-pipeline/
The aggregation pipeline consists of multiple stages, and each stage transforms the documents as they pass through the pipeline. The pipeline works like a pipe in a Linux shell: the output of one stage is the input of the next.
db.<collection>.aggregate(<pipeline>, <options>)
- pipeline
An array of aggregation stages. The stages $out, $merge, and $geoNear can each appear only once in a pipeline ($out and $merge must be the last stage, $geoNear the first). All other stages can appear multiple times.
- options
Optional. Additional parameters for the aggregation operation.
They include returning the query plan (explain), allowing temporary files on disk (allowDiskUse), cursor settings (cursor), the maximum execution time (maxTimeMS), read and write concern (readConcern, writeConcern), and forcing a specific index (hint).
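The "output of one stage feeds the next" idea can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array. This is an illustrative emulation only; MongoDB executes pipelines on the server.

`runPipeline`, `match`, and `project` are hypothetical helper names that loosely imitate $match and $project.

```javascript
// Sketch: each stage maps an array of documents to a new array, and
// runPipeline feeds one stage's output into the next, like `cmd1 | cmd2`.
// Roughly mirrors:
//   db.users.aggregate([
//     { $match: { age: { $gte: 18 } } },
//     { $project: { _id: 0, name: 1 } }
//   ])
// runPipeline, match and project are hypothetical helpers, not MongoDB APIs.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// $match-like stage: keep documents for which the predicate is true.
const match = (predicate) => (docs) => docs.filter(predicate);

// $project-like stage: keep only the listed fields that are present.
const project = (fields) => (docs) =>
  docs.map((doc) =>
    Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]))
  );

const docs = [
  { name: "xiaozhu", age: 15 },
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

const adults = runPipeline(docs, [match((d) => d.age >= 18), project(["name"])]);
console.log(JSON.stringify(adults)); // [{"name":"nancy"},{"name":"job"}]
```

Reordering the array of stages changes the result, just as reordering stages in a real pipeline does.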
Commonly used pipeline aggregation stages
The commonly used pipeline stages are summarized below:
Stage | Description |
---|---|
$match | Filter documents |
$group | Group documents |
$project | Select or reshape the fields to display |
$lookup | Join with another collection (multi-table association) |
$unwind | Expand an array field into one document per element |
$out | Write the results to a new collection |
$count | Count documents |
$sort, $skip, $limit | Sort and paginate |
For the other stages, see the official documentation: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
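As an illustration of the sort-and-paginate row in the table above, here is an in-memory JavaScript sketch of $match, then $sort, then $skip plus $limit. It runs on a trimmed copy of the users data and is not how MongoDB executes the stages.

The function name `paginate` and its `{ pageSize, page }` parameters are hypothetical. Page numbers start at 0.

```javascript
// In-memory sketch of $match -> $sort -> $skip -> $limit (pagination).
// Roughly mirrors, for pageSize 2 and page 1:
//   db.users.aggregate([
//     { $match: { age: { $exists: true } } },
//     { $sort: { age: -1, name: 1 } },
//     { $skip: 2 },
//     { $limit: 2 }
//   ])
// paginate is a hypothetical helper, not a MongoDB API.
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" },          // no age field, dropped by the match step
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

function paginate(docs, { pageSize, page }) {
  const skip = page * pageSize;
  return docs
    .filter((d) => d.age !== undefined)                             // $match
    .sort((a, b) => b.age - a.age || a.name.localeCompare(b.name))  // $sort: age desc, name asc
    .slice(skip, skip + pageSize);                                   // $skip + $limit
}

console.log(paginate(users, { pageSize: 2, page: 0 }).map((d) => d.name)); // [ 'nancy', 'xiaokeai' ]
console.log(paginate(users, { pageSize: 2, page: 1 }).map((d) => d.name)); // [ 'job', 'xiaozhu' ]
```

The secondary sort key (name) keeps ties in a stable order, so consecutive pages neither repeat nor skip documents.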
Example of $count
In this example, the first stage is $group, which groups the documents. In an aggregation pipeline, the output of one stage is the input of the next. Here the next stage is $project, which selects the fields to display.
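A $group stage followed by $project is a common way to count documents per key. The sketch below emulates that in plain JavaScript on a trimmed copy of the users data. The mongo shell pipeline in the comment is an assumed equivalent, not the exact pipeline from the example above.

`groupCount` and `projectAgeCount` are hypothetical names. Documents without an age field fall into a null group, as they would with `_id: "$age"`. MongoDB does not guarantee the order of $group output; here it follows first appearance.

```javascript
// In-memory sketch of a $group + $project pipeline that counts users per age.
// Roughly mirrors:
//   db.users.aggregate([
//     { $group: { _id: "$age", count: { $sum: 1 } } },
//     { $project: { _id: 0, age: "$_id", count: 1 } }
//   ])
// groupCount and projectAgeCount are hypothetical helpers, not MongoDB APIs.
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" },          // no age field
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// $group: one output document per distinct key; missing fields go to the null group.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) ?? 0) + 1); // { $sum: 1 }
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

// $project: expose _id as age and drop _id.
function projectAgeCount(groups) {
  return groups.map((g) => ({ age: g._id, count: g.count }));
}

const ageCounts = projectAgeCount(groupCount(users, "age"));
console.log(JSON.stringify(ageCounts));
// [{"age":25,"count":2},{"age":15,"count":1},{"age":null,"count":1},{"age":19,"count":1}]
```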
MapReduce
https://docs.mongodb.com/manual/core/map-reduce/
MapReduce splits a large data-processing job into multiple tasks that run in parallel, and then merges their results.
MapReduce has the following two phases:
- The map phase, which gathers document data that share the same key
- The reduce phase, which combines the output of the map phase into aggregated results
The official documentation includes an example. The map function emits cust_id and amount for each document, the query filter is status: "A", and the results are written to a new collection named order_totals.
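The order_totals flow can be emulated in plain JavaScript to show what the map and reduce phases do. The orders below are hypothetical sample data.

`mapReduce` here is a hypothetical in-memory helper, not the MongoDB method. `emit` is passed to the map function as an argument instead of being a global, and results are returned rather than written to a collection. One real MongoDB behavior is reproduced: reduce is not called for a key that has only one emitted value.

```javascript
// Hypothetical in-memory emulation of mapReduce (not MongoDB's implementation).
// Roughly mirrors the documented example:
//   db.orders.mapReduce(
//     function () { emit(this.cust_id, this.amount); },
//     function (key, values) { return Array.sum(values); },
//     { query: { status: "A" }, out: "order_totals" }
//   )
function mapReduce(docs, map, reduce, query = () => true) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // map phase: `this` is the current document, as in the shell
  for (const doc of docs.filter(query)) map.call(doc, emit);
  // reduce phase: keys with a single value are passed through unchanged
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Hypothetical sample orders.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const orderTotals = mapReduce(
  orders,
  function (emit) { emit(this.cust_id, this.amount); }, // regular function so `this` works
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  (doc) => doc.status === "A"
);
console.log(JSON.stringify(orderTotals));
// [{"_id":"A123","value":750},{"_id":"B212","value":200}]
```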
The MapReduce operation syntax is as follows:
db.<collection>.mapReduce(<map>, <reduce>,
  {
    out: <collection>, query: <document>,
    sort: <document>, limit: <number>,
    finalize: <function>, scope: <document>,
    jsMode: <boolean>, verbose: <boolean>,
    bypassDocumentValidation: <boolean>
  }
)
- map
A JavaScript function that emits key-value pairs for each input document; the values are passed to the reduce function grouped by key
- reduce
A JavaScript function that aggregates the values emitted for each key
- out
Specifies where the results go, for example the name of an output collection
- query
Optional. A filter that selects which documents are passed to map
- sort
Optional. Sorts the input documents before they are passed to map
- limit
Optional. Limits the number of documents passed to map
- finalize
Optional. A function that modifies the output of reduce before it is returned
- scope
Optional. Specifies global variables accessible in the map, reduce, and finalize functions
- jsMode
Optional, defaults to false. When false, the intermediate data emitted by map is converted to BSON between the map and reduce phases; when true, it stays as JavaScript objects
- verbose
Optional, defaults to false. Whether to include timing information in the result
- bypassDocumentValidation
Optional. Lets the operation skip document validation
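To show where query, sort, limit, and finalize sit relative to the map and reduce phases, here is a plain-JavaScript emulation. The order of operations is: query, then sort, then limit, then map, then reduce, then finalize.

The `mapReduce` helper and the sample orders are hypothetical, and out, scope, jsMode, and verbose are not modeled. The example computes an average per customer: reduce keeps the shape of the emitted value, and finalize adds the average.

```javascript
// Hypothetical emulation of how query, sort, limit and finalize wrap the
// map and reduce phases. out, scope, jsMode and verbose are not modeled.
function mapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit, finalize } = options;

  let input = docs.filter(query);                          // query: select input documents
  if (sort) input.sort(sort);                              // sort: order input before map
  if (limit !== undefined) input = input.slice(0, limit);  // limit: cap documents fed to map

  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of input) map.call(doc, emit);            // map phase

  return [...groups].map(([key, values]) => {
    // reduce phase: skipped for keys with a single value, as MongoDB does
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);            // finalize: reshape the reduced value
    return { _id: key, value };
  });
}

// Hypothetical sample orders.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const map = function (emit) {
  emit(this.cust_id, { count: 1, total: this.amount });
};
// reduce must return a value with the same shape as the emitted values.
const reduce = (key, values) =>
  values.reduce(
    (acc, v) => ({ count: acc.count + v.count, total: acc.total + v.total }),
    { count: 0, total: 0 }
  );
const finalize = (key, v) => ({ ...v, avg: v.total / v.count });

// Only the two largest "A" orders reach map, so B212 drops out.
const top2 = mapReduce(orders, map, reduce, {
  query: (d) => d.status === "A",
  sort: (a, b) => b.amount - a.amount,
  limit: 2,
  finalize,
});
console.log(JSON.stringify(top2));
// [{"_id":"A123","value":{"count":2,"total":750,"avg":375}}]
```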
Aggregation pipeline vs MapReduce
Comparison | Aggregation pipeline | MapReduce |
---|---|---|
Purpose | Designed to improve the performance and usability of aggregation tasks | Designed for processing large data sets |
Characteristics | Pipeline operators can be repeated as needed, and a stage does not have to produce one output document for every input document | Beyond grouping, can perform complex aggregation tasks and incremental aggregation on continuously growing data sets |
Flexibility | Limited to the operators and expressions supported by the aggregation pipeline | Custom map, reduce, and finalize JavaScript functions give flexibility in the aggregation logic |
Output | Returns the results as a cursor; if the pipeline includes an $out or $merge stage, the cursor is empty | Returns the results inline or writes them to a collection, with new, merge, replace, and reduce output options |
Sharding | Supports non-sharded and sharded input collections | Supports non-sharded and sharded input collections |
For a detailed comparison, see the official documentation: https://docs.mongodb.com/manual/reference/map-reduce-to-aggregation-pipeline/
Feel free to like, follow, and bookmark.
Friends, your support and encouragement are what motivate me to keep sharing and to improve the quality of my posts.
That's all for this time.
Technology is open, and our mindset should be open too. Embrace change, grow toward the sun, and keep moving forward.
I am Abingyun Native. See you next time~