
What is aggregated data?

First, let's look at what aggregated data means.

Data aggregation refers to combining data from different data sources.

Clustering, also known as cluster analysis, is a technique for statistical data analysis.

It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.

What is an aggregate query?

Aggregation operations process data records and return computed results.

Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.

Aggregation operations generally include the following three categories:

  • Single-purpose aggregation methods
  • Aggregation pipeline
  • MapReduce

https://docs.mongodb.com/manual/aggregation/

Single-purpose aggregation methods

MongoDB provides the following single-purpose aggregation methods. Compared with the aggregation pipeline and MapReduce, they are less flexible and offer far fewer capabilities.

  • db.collectionName.estimatedDocumentCount()

Returns an approximate count of the documents in the collection, taken from collection metadata rather than by scanning the documents

  • db.collectionName.count()

Returns the number of documents that match an optional query filter

  • db.collectionName.distinct()

Returns the distinct values of a specified field

For example:

 > db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }

> db.users.distinct("age")
[ 15, 19, 25 ]

In the example above, db.users.distinct("age") returns the distinct values of the age field. The document for xiaopang has no age field, so it contributes no value.
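To make the semantics concrete, here is a minimal plain-JavaScript sketch (runnable with Node.js, no database needed) that mimics what distinct() and count() return for the sample users above, with _id and infos left out. The functions distinct and count are hypothetical helpers, not the MongoDB API, and they only handle top-level fields and simple equality filters.

```javascript
// Plain-JavaScript sketch of what distinct() and count() compute.
// Not the MongoDB API: distinct and count below are hypothetical helpers
// working on in-memory objects (top-level fields, equality filters only).
const users = [
  { name: "xiaokeai", age: 25, hobby: "reading", school: "cs" },
  { name: "xiaozhu", age: 15, hobby: "basketball", school: "sh" },
  { name: "xiaopang" },
  { name: "nancy", age: 25, hobby: "study", school: "hn" },
  { name: "job", age: 19, hobby: "basketball", school: "nj" },
];

// Like db.users.distinct(field): each value once; documents without the field are skipped.
function distinct(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (field in doc) seen.add(doc[field]);
  }
  return [...seen]; // first-seen order; do not rely on ordering
}

// Like db.users.count(filter) for equality filters: how many documents match.
function count(docs, filter = {}) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  ).length;
}

console.log(distinct(users, "age"));                // [ 25, 15, 19 ]
console.log(count(users));                          // 5
console.log(count(users, { hobby: "basketball" })); // 2
```

The shell printed [ 15, 19, 25 ], while this sketch yields the same three values in first-seen order; treat the result as a set rather than an ordered list.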

Aggregation pipeline

https://docs.mongodb.com/manual/core/aggregation-pipeline/

An aggregation pipeline consists of one or more stages, and each stage transforms the documents as they pass through the pipeline. The pipeline can be understood like a pipe in a Linux shell: the output of one stage is the input of the next.
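The "output of one stage is the input of the next" idea can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array. This is only a conceptual model, not how the MongoDB server executes pipelines; runPipeline, match, project and limit are hypothetical helpers.

```javascript
// Minimal sketch: an aggregation pipeline modeled as a list of stage functions.
// Each stage takes an array of documents and returns a new array, and the
// output of one stage becomes the input of the next (like `cmd1 | cmd2` in a shell).
// These helpers are hypothetical illustrations, not MongoDB's implementation.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stage builders.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (fields) => (docs) =>
  docs.map((doc) =>
    Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]))
  );
const limit = (n) => (docs) => docs.slice(0, n);

const users = [
  { name: "xiaokeai", age: 25, school: "cs" },
  { name: "xiaozhu", age: 15, school: "sh" },
  { name: "xiaopang" },
  { name: "nancy", age: 25, school: "hn" },
];

const result = runPipeline(users, [
  match((u) => u.age === 25), // keep only 25-year-olds
  project(["name"]),          // keep only the name field
  limit(1),                   // keep the first document
]);
console.log(result); // [ { name: 'xiaokeai' } ]
```

Reordering the array of stages changes the result, just as reordering commands in a shell pipe does.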

db.collectionName.aggregate(<pipelines>,<options>)

  • pipelines

An array of aggregation stages. A stage can appear multiple times in the pipeline, except $out, $merge and $geoNear, which can appear only once.

  • options

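Optional. Additional parameters for the aggregation operation.

These include returning information about the query plan (explain), allowing temporary files on disk (allowDiskUse), cursor settings (cursor), a maximum execution time (maxTimeMS), read and write concerns (readConcern, writeConcern), forcing the use of an index (hint), and so on.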
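The once-only rule for $out, $merge and $geoNear can be expressed as a small check. The sketch below is a hypothetical helper (checkPipeline is not part of MongoDB; the server performs its own validation). It assumes the placement rules from the official documentation, namely that $out and $merge must be the last stage and $geoNear the first stage, which also means each can appear at most once.

```javascript
// Hypothetical helper that checks the placement rules for stages that may appear
// only once: $out and $merge must be the last stage, $geoNear the first stage.
// Because of these positions, a repeated $out/$merge/$geoNear is also reported.
// A pipeline is represented the way the shell writes it: an array of
// single-key objects such as { $match: {...} }.
function checkPipeline(pipeline) {
  const errors = [];
  pipeline.forEach((stage, index) => {
    const name = Object.keys(stage)[0];
    const isLast = index === pipeline.length - 1;
    if ((name === "$out" || name === "$merge") && !isLast) {
      errors.push(`${name} must be the last stage (found at index ${index})`);
    }
    if (name === "$geoNear" && index !== 0) {
      errors.push(`$geoNear must be the first stage (found at index ${index})`);
    }
  });
  return errors;
}

console.log(checkPipeline([{ $match: { age: 25 } }, { $out: "users_25" }])); // []
console.log(checkPipeline([{ $out: "tmp" }, { $match: {} }]));
// [ '$out must be the last stage (found at index 0)' ]
```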

Commonly used pipeline aggregation stages

The commonly used pipeline stages are summarized below:

Stage                   Description
$match                  Filter documents
$group                  Group documents
$project                Select (or reshape) the fields to output
$lookup                 Join with another collection (multi-collection association)
$unwind                 Expand an array field into one document per element
$out                    Write the results to a new collection
$count                  Count the documents
$sort, $skip, $limit    Sort and paginate

For the other stages, see the official documentation: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
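As an illustration of the $match, $sort, $skip, $limit and $count rows in the table, here is an in-memory JavaScript sketch over the sample users. The shell pipeline in the comment uses real stage syntax, but the JavaScript below only imitates those stages on an array; pageOf and the 1-based page numbering are assumptions of this sketch.

```javascript
// In-memory sketch of a filter-sort-paginate pipeline. The shell form would be:
//   db.users.aggregate([
//     { $match: { age: { $gte: 18 } } },
//     { $sort: { age: -1, name: 1 } },
//     { $skip: (pageNumber - 1) * pageSize },
//     { $limit: pageSize }
//   ])
// The JavaScript below only imitates those stages on an array.
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" },
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// $match: { age: { $gte: 18 } } -- documents without an age do not match
const adults = users.filter((u) => typeof u.age === "number" && u.age >= 18);

// $sort: { age: -1, name: 1 } -- age descending, ties broken by name ascending
const sorted = [...adults].sort(
  (a, b) => b.age - a.age || a.name.localeCompare(b.name)
);

// $skip + $limit: page numbers start at 1 (a convention of this sketch)
function pageOf(docs, pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;
  return docs.slice(skip, skip + pageSize);
}

// $count: "adults" -- one document holding the number of incoming documents
const counted = { adults: adults.length };

console.log(pageOf(sorted, 1, 2).map((u) => u.name)); // [ 'nancy', 'xiaokeai' ]
console.log(pageOf(sorted, 2, 2).map((u) => u.name)); // [ 'job' ]
console.log(counted);                                 // { adults: 3 }
```

The tie-breaker on name matters: without a deterministic sort before $skip and $limit, documents with equal ages could move between pages from one request to the next.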

Example of $count

In this example, the first stage, $group, groups the data. In an aggregation pipeline its output becomes the input of the next stage, $project, which selects the fields to display.
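The $group then $project flow can be sketched on the sample users as follows. The shell pipeline in the comment is a hypothetical example (grouping by hobby is chosen for illustration, not taken from the original screenshot), and the JavaScript underneath only imitates what each stage produces.

```javascript
// In-memory sketch of a $group -> $project flow. A shell version could be:
//   db.users.aggregate([
//     { $match: { age: { $exists: true } } },
//     { $group: { _id: "$hobby", count: { $sum: 1 }, avgAge: { $avg: "$age" } } },
//     { $project: { _id: 0, hobby: "$_id", count: 1, avgAge: 1 } }
//   ])
const users = [
  { name: "xiaokeai", age: 25, hobby: "reading" },
  { name: "xiaozhu", age: 15, hobby: "basketball" },
  { name: "xiaopang" },
  { name: "nancy", age: 25, hobby: "study" },
  { name: "job", age: 19, hobby: "basketball" },
];

// $group: one output document per distinct hobby, with running accumulators.
function groupByHobby(docs) {
  const groups = new Map(); // hobby -> { count, ageSum }
  for (const doc of docs) {
    if (!groups.has(doc.hobby)) groups.set(doc.hobby, { count: 0, ageSum: 0 });
    const acc = groups.get(doc.hobby);
    acc.count += 1;        // { $sum: 1 }
    acc.ageSum += doc.age; // summed here, divided below for { $avg: "$age" }
  }
  return [...groups].map(([id, acc]) => ({
    _id: id,
    count: acc.count,
    avgAge: acc.ageSum / acc.count,
  }));
}

const matched = users.filter((u) => "age" in u); // $match
const grouped = groupByHobby(matched);           // $group: input is the output of $match
const projected = grouped.map(({ _id, count, avgAge }) => ({ hobby: _id, count, avgAge })); // $project

console.log(projected);
```

MongoDB does not guarantee the order of $group output, so a real pipeline would add a $sort stage if the order matters.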

MapReduce

https://docs.mongodb.com/manual/core/map-reduce/

MapReduce splits a large data-processing job into smaller pieces that are processed in parallel, and then merges the partial results.

MapReduce has two stages:

  • The map stage, which emits key-value pairs from each document so that values with the same key are brought together
  • The reduce stage, which combines the values for each key into an aggregated result

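The official documentation walks through an example:

In that example, the map function calls emit(this.cust_id, this.amount) to map each order's cust_id to its amount, the query filters on status: "A", the reduce function adds up the amounts for each cust_id, and the results are written to a new collection named order_totals.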
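The order_totals example can be imitated in plain JavaScript to show what the map and reduce phases do. This is a simplified sketch, not the MongoDB implementation: the orders are hypothetical sample data, and emit is passed to map as an argument rather than being a global as it is inside MongoDB.

```javascript
// Hypothetical sample orders in the shape used by the order_totals example.
const orders = [
  { cust_id: "abc123", amount: 25, status: "A" },
  { cust_id: "abc123", amount: 50, status: "A" },
  { cust_id: "xyz123", amount: 100, status: "A" },
  { cust_id: "xyz123", amount: 250, status: "D" },
];

// Simplified map-reduce driver. Unlike MongoDB, `emit` is passed to map as an
// argument instead of being a global; `this` is still the current document.
function mapReduce(docs, map, reduce, query) {
  // Map phase: collect every emitted value under its key.
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs.filter(query)) map.call(doc, emit);

  // Reduce phase: one result per key. Keys with a single value skip reduce,
  // as MongoDB does.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const orderTotals = mapReduce(
  orders,
  function (emit) { emit(this.cust_id, this.amount); },   // map
  (key, values) => values.reduce((sum, v) => sum + v, 0), // reduce: total per customer
  (order) => order.status === "A"                         // query: { status: "A" }
);
console.log(orderTotals);
// abc123 totals 75; xyz123 totals 100 (its status "D" order is filtered out)
```

The { _id, value } shape mirrors the documents MongoDB writes to the output collection.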

The MapReduce operation syntax is as follows:

db.collectionName.mapReduce(
  <map>,
  <reduce>,
  {
    out: <collection>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    finalize: <function>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>
  }
)

  • map

A JavaScript function that turns each input document into key-value pairs (by calling emit) and passes them to the reduce function

  • reduce

A JavaScript function that reduces all the values emitted for one key to a single result (for example, a sum)

  • out

Specifies where the results go, for example the name of the collection to write them into

  • query

Optional. A query filter that selects the input documents; only matching documents are passed to map

  • sort

Optional. Sorts the input documents before they are passed to map

  • limit

Optional. Limits the number of input documents passed to map

  • finalize

Optional. A function that modifies the output of reduce before it is returned

  • scope

Optional. Global variables that are accessible in the map, reduce and finalize functions

  • jsMode

Optional, defaults to false. When false, the intermediate objects emitted by map are converted to BSON before reduce runs; when true, they stay as JavaScript objects

  • verbose

Optional, defaults to false. Whether to include timing information in the result

  • bypassDocumentValidation

Optional. Lets the operation bypass document validation when writing to the output collection
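To see where query, sort, limit, finalize and scope act in the data flow, here is an extended plain-JavaScript sketch. It is not MongoDB's API: query is a predicate instead of a filter document, sort is a comparator instead of a sort document, limit 0 meaning "no limit" is a convention of this sketch, and scope is passed as an extra argument instead of becoming global variables. The orders are hypothetical sample data.

```javascript
// Sketch of how the optional mapReduce parameters shape the data flow.
// Simplifications (assumptions of this sketch, not MongoDB behavior):
// query is a predicate, sort is a comparator, limit 0 means "no limit",
// and scope is passed as an extra argument instead of becoming globals.
function mapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit = 0, finalize, scope = {} } = options;

  let input = docs.filter(query);               // query: select input documents
  if (sort) input = [...input].sort(sort);      // sort: order them before map
  if (limit > 0) input = input.slice(0, limit); // limit: cap how many reach map

  const emitted = new Map(); // key -> emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of input) map.call(doc, emit, scope);

  return [...emitted].map(([key, values]) => {
    let value = values.length === 1 ? values[0] : reduce(key, values, scope);
    if (finalize) value = finalize(key, value, scope); // finalize: reshape reduce output
    return { _id: key, value };
  });
}

// Hypothetical orders, as in the order_totals example.
const orders = [
  { cust_id: "abc123", amount: 25, status: "A" },
  { cust_id: "abc123", amount: 50, status: "A" },
  { cust_id: "xyz123", amount: 100, status: "A" },
  { cust_id: "xyz123", amount: 250, status: "D" },
];

// map emits partial { count, total } values; reduce returns the same shape.
const map = function (emit) { emit(this.cust_id, { count: 1, total: this.amount }); };
const reduce = (key, values) =>
  values.reduce(
    (acc, v) => ({ count: acc.count + v.count, total: acc.total + v.total }),
    { count: 0, total: 0 }
  );
// finalize turns the reduced totals into an average; scope supplies a constant.
const finalize = (key, value, scope) => ({ avg: value.total / value.count, currency: scope.currency });

const averages = mapReduce(orders, map, reduce, {
  query: (o) => o.status === "A",
  finalize,
  scope: { currency: "USD" },
});
// abc123 -> avg 37.5 USD, xyz123 -> avg 100 USD

const topTwo = mapReduce(orders, map, reduce, {
  query: (o) => o.status === "A",
  sort: (a, b) => b.amount - a.amount, // largest orders first
  limit: 2,
  finalize,
  scope: { currency: "USD" },
});
// only the 100 and 50 orders reach map: xyz123 -> avg 100, abc123 -> avg 50
```

The reduce function returns the same { count, total } shape that map emits because MongoDB may call reduce again on its own partial results; averaging is left to finalize, which runs once per key.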

Aggregation pipeline vs MapReduce

Purpose
  • Aggregation pipeline: Designed to improve the performance and usability of aggregation tasks
  • MapReduce: Designed for processing large data sets

Features
  • Aggregation pipeline: Pipeline operators can be repeated as needed, and a stage does not have to produce one output document for every input document
  • MapReduce: Besides grouping, can perform complex aggregation tasks and incremental aggregation on continuously growing data sets

Flexibility
  • Aggregation pipeline: Limited to the operators and expressions that the aggregation pipeline supports
  • MapReduce: Custom map, reduce and finalize JavaScript functions give flexibility in the aggregation logic

Output results
  • Aggregation pipeline: Returns the results as a cursor; if the pipeline includes a $out or $merge stage, the cursor is empty
  • MapReduce: Returns results in various ways: inline, new collection, merge, replace, or reduce

Sharding
  • Aggregation pipeline: Supports non-sharded and sharded input collections
  • MapReduce: Supports non-sharded and sharded input collections

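For a detailed comparison, see the official documentation: https://docs.mongodb.com/manual/reference/map-reduce-to-aggregation-pipeline/. Note that map-reduce is deprecated starting in MongoDB 5.0, and the documentation recommends the aggregation pipeline instead.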

If you found this useful, feel free to like, follow, and bookmark.

Friends, your support and encouragement are what motivate me to keep sharing and to keep improving the quality.

That's all for this time.

Technology is open, and our mindset should be open too. Embrace change, live in the sun, and keep moving forward.

I am 阿兵云原生 (Abing Cloud Native). See you next time~


阿兵云原生