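Today I load-tested the logic that writes to a DolphinDB database distributed table through a stream table subscription. The streamTable--data-->partitionTable path reaches a TPS of only a little over 100. Could you help check whether it can be optimized? So far I can think of two possible optimizations: (1) optimize the udf script; (2) switch to a better-performing approach (for example, replace single-row inserts with batch inserts).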
name             typeString typeInt comment
---------------- ---------- ------- -------
customer_code    SYMBOL     17
reg_code         SYMBOL     17
address          SYMBOL     17
speed            INT        4
order_num        SYMBOL     17
item_code        SYMBOL     17
base_create_time DATETIME   11
// Partitioned table t_key_data_zyg
name             typeString typeInt comment
---------------- ---------- ------- -------
customer_code    SYMBOL     17
reg_code         SYMBOL     17
address          SYMBOL     17
speed            INT        4
production       DOUBLE
order_num        SYMBOL     17
item_code        SYMBOL     17
base_create_time DATETIME   11
def udf(mutable partitionTable, mutable t1, t){
    // Columns of the incoming message; only row 0 is used below
    customer_code = exec customer_code from t
    item_code = exec item_code from t
    regCode = exec reg_code from t
    addr = exec address from t
    nextTime = exec base_create_time from t
    nextSpeed = exec speed from t
    nextOrderNum = exec order_num from t
    // Most recent stored row for the same reg_code/address before the incoming timestamp
    re = select * from partitionTable where reg_code = regCode[0] and address = addr[0] and base_create_time < nextTime[0] order by base_create_time desc limit 1
    if ((exec count(*) from re)==0){
        // No earlier row: production starts at speed*1
        t3 = select customer_code,reg_code,address,speed,speed*1 as production,order_num,item_code,base_create_time from t
    } else {
        curTime = exec base_create_time from re
        curSpeed = exec speed from re
        curOrderNum = exec order_num from re
        preProd = exec production from re
        t5=select customer_code,reg_code,address,speed,0 as production,order_num,item_code,base_create_time from t
        production = preProd[0]
        // Gap between the two rows in minutes (DATETIME differences are in seconds)
        interval = (nextTime[0] - curTime[0]) \ 60
        if (nextOrderNum[0] != curOrderNum[0] && interval <= 3) {
            production = nextSpeed[0] * interval
        }
        if (nextOrderNum[0] == curOrderNum[0] && interval <= 3) {
            production = production + ((nextSpeed[0] + curSpeed[0]) \ 2)*interval
        }
        if (nextOrderNum[0] != curOrderNum[0] && interval > 3) {
            production = nextSpeed[0] * 1
        }
        t2 = table(take(customer_code[0],1) as customer_code, take(regCode[0],1) as reg_code, take(addr[0],1) as address, take(nextSpeed[0],1) as speed, take(production,1) as production, take(nextOrderNum[0],1) as order_num, take(item_code[0],1) as item_code, take(nextTime[0],1) as base_create_time)
    }
}
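
Read as a formula, the per-message rule in udf can be restated roughly as follows. This is only a restatement of the code as posted, not an additional rule. The symbols are shorthand introduced here: P is production, v is speed, the subscript cur marks the latest stored row, next marks the incoming row, and Δ is the gap between them in minutes. The last case (same order, gap over 3 minutes) is what happens when none of the three if conditions matches, so production keeps its previous value.

```latex
\[
\Delta = \frac{t_{\text{next}} - t_{\text{cur}}}{60},
\qquad
P_{\text{next}} =
\begin{cases}
v_{\text{next}}\,\Delta
  & \text{order changed},\ \Delta \le 3 \\[4pt]
P_{\text{cur}} + \dfrac{v_{\text{next}} + v_{\text{cur}}}{2}\,\Delta
  & \text{same order},\ \Delta \le 3 \\[4pt]
v_{\text{next}}
  & \text{order changed},\ \Delta > 3 \\[4pt]
P_{\text{cur}}
  & \text{same order},\ \Delta > 3
\end{cases}
\]
```

When no earlier row exists for the same reg_code and address, the code sets production to speed*1 instead.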
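To improve the performance of the stream table's preprocessing routine, the main idea is to process the data in batches instead of one record at a time. DolphinDB Script is a vectorized programming language, so processing data in batches is far more efficient than processing it row by row. Suggestions: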