前言
最近在参与网络flow采集和分析的项目。主要是遵循netflow和sflow协议,完成对防火墙和核心交换机的流量的采集和存储以及后续分析。flow并发量和数据量都比较大,存储是瓶颈。最开始存储到prometheus和之后测试的infulxdb的方案均宣告失败。看来prometheus还是不适合大数据量的处理,如果数据量过大,需要考虑联邦模式了。
在调研社区相关项目vflow后,准备测试clickhouse的存储方案。
采集端我们并没有采用vflow,而是对telegraf写了专门的针对flow的input插件。然后输出到kafka集群当中。然后消费者从kafka获取数据存储到clickhouse,便于以后的分析。
ClickHouse是一个非常好的分析列式数据库选择,性能比较强劲。官方提供了很对与主流数据库的性能对比,大家可以了解更加详细的测试报告。
安装过程
下载所需的rpm包
下载地址
共5个包。
解决依赖
安装server过程中,出现以下错误:
rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
error: Failed dependencies:
libicudata.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
libicui18n.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
libicuuc.so.50()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
libltdl.so.7()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
libodbc.so.2()(64bit) is needed by clickhouse-server-1.1.54236-4.el7.x86_64
首先通过执行下面语句解决前三个报错
yum install libicu-devel
然后下载libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm
rpm -ivh libtool-ltdl-2.4.2-22.el7_3.x86_64.rpm
然后下载unixODBC-2.3.1-11.el7.x86_64.rpm
rpm -ivh unixODBC-2.3.1-11.el7.x86_64.rpm
安装
1
rpm -ivh clickhouse-server-common-1.1.54236-4.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-server-common-1.1.5423################################# [100%]
2
rpm -ivh clickhouse-server-1.1.54236-4.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-server-1.1.54236-4.el7################################# [100%]
3
rpm -ivh clickhouse-debuginfo-1.1.54236-4.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-debuginfo-1.1.54236-4.################################# [100%]
4
rpm -ivh clickhouse-client-1.1.54236-4.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-client-1.1.54236-4.el7################################# [100%]
5
rpm -ivh clickhouse-compressor-1.1.54236-4.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-compressor-1.1.54236-4################################# [100%]
启动
启动服务
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
2018.03.19 17:17:25.113898 [ 1 ] <Warning> Application: Logging to console
2018.03.19 17:17:25.117332 [ 1 ] <Information> : Starting daemon with revision 54236
2018.03.19 17:17:25.117444 [ 1 ] <Information> Application: starting up
2018.03.19 17:17:25.118273 [ 1 ] <Debug> Application: rlimit on number of file descriptors is 1024000
2018.03.19 17:17:25.118299 [ 1 ] <Debug> Application: Initializing DateLUT.
2018.03.19 17:17:25.118307 [ 1 ] <Trace> Application: Initialized DateLUT with time zone `Asia/Shanghai'.
2018.03.19 17:17:25.120309 [ 1 ] <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'xxxx' as replica host.
2018.03.19 17:17:25.120471 [ 1 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/users.xml'
2018.03.19 17:17:25.125606 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.125636 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2018.03.19 17:17:25.126753 [ 1 ] <Information> Application: Loading metadata.
2018.03.19 17:17:25.127259 [ 1 ] <Information> DatabaseOrdinary (default): Total 0 tables.
2018.03.19 17:17:25.127348 [ 1 ] <Information> DatabaseOrdinary (system): Total 0 tables.
2018.03.19 17:17:25.127894 [ 1 ] <Debug> Application: Loaded metadata.
2018.03.19 17:17:25.128699 [ 1 ] <Information> Application: Listening http://[::1]:8123
2018.03.19 17:17:25.128749 [ 1 ] <Information> Application: Listening tcp: [::1]:9000
2018.03.19 17:17:25.128783 [ 1 ] <Information> Application: Listening interserver: [::1]:9009
2018.03.19 17:17:25.128816 [ 1 ] <Information> Application: Listening http://10.xx.xx.136:8123
2018.03.19 17:17:25.128845 [ 1 ] <Information> Application: Listening tcp: 10.xx.xx.136:9000
2018.03.19 17:17:25.128872 [ 1 ] <Information> Application: Listening interserver: 10.xx.xx.136:9009
2018.03.19 17:17:25.129116 [ 1 ] <Information> Application: Ready for connections.
2018.03.19 17:17:27.120687 [ 2 ] <Debug> ConfigReloader: Loading config `/etc/clickhouse-server/config.xml'
2018.03.19 17:17:27.127614 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_remote_servers
2018.03.19 17:17:27.127701 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_compression
客户端连接
clickhouse-client --host=10.xx.xx.136 --port=9000
ClickHouse client version 1.1.54236.
Connecting to 10.xx.xx.136:9000.
Connected to ClickHouse server version 1.1.54236.
:)
:)
:)
:)
:)
:) show tables;
SHOW TABLES
Ok.
0 rows in set. Elapsed: 0.011 sec.
:)
简单操作测试
:) select now()
SELECT now()
┌───────────────now()─┐
│ 2018-03-19 17:22:55 │
└─────────────────────┘
1 rows in set. Elapsed: 0.002 sec.
systemd守护进程服务
/etc/systemd/system/clickhouse.service
[Unit]
Description=clickhouse
After=syslog.target
After=network.target
[Service]
LimitAS=infinity
LimitRSS=infinity
LimitCORE=infinity
LimitNOFILE=65536
User=root
Type=simple
Restart=on-failure
KillMode=control-group
ExecStart=/usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
RestartSec=10s
[Install]
WantedBy=multi-user.target
性能测试
硬件配置:
- CPU Intel Core Processor (Haswell, no TSX) cores = 8, 2.6GHz, x86_64
- Memory 16G
- Drive SSD in software RAID
ClickHouse列数据库的Golang 驱动
kafka的消费者需要将数据写入到clickhouse数据库中,由于我们的技术栈主要为golang,所以需要一个golang版本的ClickHouse的驱动。本章就重点介绍一个开源的驱动。
关键特性
- 使用原生 ClickHouse tcp client-server 协议
- 兼容 database/sql 库
- 实现了轮训算法的负载均衡
DSN
- username/password - auth credentials
- database - select the current default database
- read_timeout/write_timeout - timeout in second
- no_delay - disable/enable the Nagle Algorithm for tcp socket (default
is 'true' - disable) - alt_hosts - comma separated list of single address host for
load-balancing -
connection_open_strategy - random/in_order (default random).
- random - choose random server from set
- in_order - first live server is choosen in specified order
- block_size - maximum rows in block (default is 1000000). If the rows
are larger then the data will be split into several blocks to send
them to the server - debug - enable debug output (boolean value)
SSL/TLS 参数
- secure - 建立安全连接,默认为false
- skip_verify - 跳过安全认证 默认是true
example
tcp://host1:9000?username=user&password=qwerty&database=clicks&read_timeout=10&write_timeout=20&alt_hosts=host2:9000,host3:9000
支持的数据类型
- UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
- Float32, Float64
- String
- FixedString(N)
- Date
- DateTime
- Enum
- UUID
- Nullable(T)
- Array(T) (one-dimensional) godoc
Install
go get -u github.com/kshvakov/clickhouse
示例
package main
import (
"database/sql"
"fmt"
"log"
"time"
"github.com/kshvakov/clickhouse"
)
func main() {
connect, err := sql.Open("clickhouse", "tcp://127.0.0.1:9000?debug=true")
if err != nil {
log.Fatal(err)
}
if err := connect.Ping(); err != nil {
if exception, ok := err.(*clickhouse.Exception); ok {
fmt.Printf("[%d] %s \n%s\n", exception.Code, exception.Message, exception.StackTrace)
} else {
fmt.Println(err)
}
return
}
_, err = connect.Exec(`
CREATE TABLE IF NOT EXISTS example (
country_code FixedString(2),
os_id UInt8,
browser_id UInt8,
categories Array(Int16),
action_day Date,
action_time DateTime
) engine=Memory
`)
if err != nil {
log.Fatal(err)
}
var (
tx, _ = connect.Begin()
stmt, _ = tx.Prepare("INSERT INTO example (country_code, os_id, browser_id, categories, action_day, action_time) VALUES (?, ?, ?, ?, ?, ?)")
)
for i := 0; i < 100; i++ {
if _, err := stmt.Exec(
"RU",
10+i,
100+i,
clickhouse.Array([]int16{1, 2, 3}),
time.Now(),
time.Now(),
); err != nil {
log.Fatal(err)
}
}
if err := tx.Commit(); err != nil {
log.Fatal(err)
}
rows, err := connect.Query("SELECT country_code, os_id, browser_id, categories, action_day, action_time FROM example")
if err != nil {
log.Fatal(err)
}
for rows.Next() {
var (
country string
os, browser uint8
categories []int16
actionDay, actionTime time.Time
)
if err := rows.Scan(&country, &os, &browser, &categories, &actionDay, &actionTime); err != nil {
log.Fatal(err)
}
log.Printf("country: %s, os: %d, browser: %d, categories: %v, action_day: %s, action_time: %s", country, os, browser, categories, actionDay, actionTime)
}
if _, err := connect.Exec("DROP TABLE example"); err != nil {
log.Fatal(err)
}
}
总结
后续会讲解clickhouse的go版本的客户端库以及flow项目中clickhouse的使用心得。
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。