The need for document storage
Many developers have used mongodb, where data records exist in the form of documents (similar to a form of multi-level nested SQL). For example, the following JSON data structure: dev_ip represents the ip of a server, the location field stores the latitude and longitude, and meminfo.total and meminfo.userd represent the total memory and usage, respectively.
[{
"dev_ip": "123.46.5.111",
"location": [39.916527, 116.397128],
"meminfo": {
"total": 64,
"used": 23.2
}
},
{
"dev_ip": "123.46.5.112",
"location": [39.916144, 116.392582],
"meminfo": {
"total": 64,
"used": 27.8
}
}]
2. Build the table
The data type of Map needs to be used to store documents. In some older versions, the Map data type is still experimental and cannot be used directly. If you want to use it, you need to execute set allow_experimental_map_type = 1;
.
Then we can build a table according to the JSON data structure, location is an array data type, and meminfo is a Map data type.
CREATE TABLE dev_meminfo (
dev_ip String,
location Array(Float64),
meminfo Map(String, Float32)
)
ENGINE = MergeTree()
ORDER BY dev_ip;
clickhouse provides an array type Array(T), which consists of multiple T elements. T can be any underlying data type, or it can be an array type. If it is an array type, a multidimensional array is formed. Cickhouse has limited support for multidimensional arrays, so it is not recommended to use multidimensional arrays. All T elements in the array must have the same data type, otherwise an exception will be thrown.
3. Data storage
In the Linux environment, we can save JSON data as a document, named such as: dev_meminfo.json. Then use the following command line to complete the storage of the document JSON data. The function of jq
is to strip the JSON array and Map from the structure, so as to meet the input format required by JSONEachRow.
cat dev_meminfo.json |jq -c .[] | clickhouse-client --database acaidb -m -u acai --password '<你的密码>' \
--query="INSERT INTO dev_meminfo FORMAT JSONEachRow"
Or we can directly use the INSERT statement to complete a single insertion of data, as follows:
INSERT INTO dev_meminfo FORMAT JSONEachRow {"dev_ip": "123.46.5.112", "location": [39.916144, 116.392582],"meminfo": {"total": 64,"used": 27.8}};
4. Query data
select *
Query data:
# clickhouse-client --database acaidb -m -u acai --password '4rfv$RFV' --query="SELECT * FROM dev_meminfo"
123.46.5.111 [39.916527,116.397128] {'total':64,'used':23.2}
123.46.5.112 [39.916144,116.392582] {'total':64,'used':27.8}
When querying data by field name, it should be noted that the subscript of Ciickhouse's array starts from 0, which is inconsistent with the specification in our traditional programming development.
# clickhouse-client --database acaidb -m -u acai --password '4rfv$RFV' --query="SELECT dev_ip,location[1],location[2],meminfo['total'],meminfo['used'] FROM dev_meminfo"
123.46.5.111 39.916527 116.397128 64 23.2
123.46.5.112 39.916144 116.392582 64 27.8
V. Summary
We see that clickhouse can replace mongodb to store JSON-like document data and support SQL query statistical analysis, which is quite attractive. I just introduced simple queries. In fact, clickhouse provides a lot of statistical analysis functions, window functions, etc. for various statistical analysis scenarios. Of course, there are also many statistical analysis functions for array data types.
What is more noteworthy is: I have seen some evaluation articles, the statistical analysis performance of clickhouse is nearly 100 times that of mongodb (the author has not verified it myself, please pay attention to the follow-up articles of my column). But it does not mean that clickhouse can replace all application scenarios of mongodb, at least the user-oriented concurrent request scenario clickhouse is not satisfied, because clickhouse is positioned as a data warehouse, mainly for data analysis OLAP scenarios, not for users with high concurrency Online Transaction Processing OLTP.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。