
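This post is a walkthrough of a SIGMOD paper. "Automatic Database Management System Tuning Through Large-scale Machine Learning" is an automatic knob-tuning paper published at SIGMOD 2017 by CMU professor Andy Pavlo, his PhD student Dana, and others; the system is called OtterTune. The paper cites an earlier one, "Tuning Database Configuration Parameters with iTuned", published at VLDB 2009. Personally, I don't think the two differ much in their core idea.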

Below I describe OtterTune's workflow and core idea.

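Let's take an example. Suppose I already have historical data for two metrics (innodb_pages_reads, innodb_io_reads), three workloads (TPCC, YCSB, Wikipedia), and four configurations. Each metric gives one matrix, with workloads as rows, configurations as columns, and NULL marking a configuration that was never run for that workload. So I get two matrices: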

Matrix1 (innodb_pages_reads)

         conf1  conf2  conf3  conf4
TPCC     20     30     40     50
YCSB     100    NULL   300    400
WIKI     50     60     NULL   80

Matrix2 (innodb_io_reads)

         conf1  conf2  conf3  conf4
TPCC     200    300    400    500
YCSB     100    NULL   300    400
WIKI     500    600    NULL   800
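
As a rough sketch, the two matrices above could be held in memory as NumPy arrays, one per metric, with NULL cells stored as NaN. The dictionary layout and the helper `missing_cells` are my own illustrative choices, not something taken from the OtterTune code base.

```python
import numpy as np

workloads = ["TPCC", "YCSB", "WIKI"]
configs = ["conf1", "conf2", "conf3", "conf4"]

# One matrix per metric: rows are workloads, columns are configurations.
# NULL cells (a configuration never run for that workload) become NaN.
history = {
    "innodb_pages_reads": np.array([
        [20.0, 30.0, 40.0, 50.0],
        [100.0, np.nan, 300.0, 400.0],
        [50.0, 60.0, np.nan, 80.0],
    ]),
    "innodb_io_reads": np.array([
        [200.0, 300.0, 400.0, 500.0],
        [100.0, np.nan, 300.0, 400.0],
        [500.0, 600.0, np.nan, 800.0],
    ]),
}


def missing_cells(matrix):
    """Return the (workload, config) pairs that have no observation.

    Hypothetical helper, for illustration only.
    """
    rows, cols = np.where(np.isnan(matrix))
    return [(workloads[r], configs[c]) for r, c in zip(rows, cols)]


print(missing_cells(history["innodb_pages_reads"]))
# [('YCSB', 'conf2'), ('WIKI', 'conf3')]
```

Keeping the gaps as NaN rather than 0 matters later: a distance or a model fit can then skip them explicitly instead of treating them as real measurements.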

The recommendation steps for the target workload (Xworkload) would look like this:

  1. Run the target workload (Xworkload) with conf1 (the default configuration) and collect its metrics; say the 5 observed metrics are (11, 20, 31, 40, 51). Comparing these with the metrics recorded for each historical workload under the same configuration, the most similar workload is TPC-C.
  2. Take all of the previous data you have for TPC-C and combine it with all of the data collected so far from the current workload. Use this data to train a GP (Gaussian process) model; again, the configurations are your input matrix and your target objective metric, such as latency, is your output matrix. Then, starting from a set of sample configurations (let's say for now they are randomly generated), use the GP model together with gradient descent to predict the means/variances of the sample points and walk towards the nearest optimum (for latency this is the nearest minimum). Use an exploration/exploitation tradeoff algorithm such as UCB (upper confidence bound; for a metric like latency, where lower is better, use the lower confidence bound) to select the next configuration to run. Let's call this conf2. See https://github.com/cmu-db/ott... and https://github.com/cmu-db/ott... for more details.
  3. Install conf2 on the DBMS and observe the workload for some minutes/hours.
  4. Repeat steps 1 - 3 until satisfied with the improvement. Note that in the next iteration of step 1, you will now predict the metrics for TPC-C, YCSB, & Wiki for both conf1 and conf2, so the workload that you first selected as being the most similar may change over the course of the tuning session.
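
The loop above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under several assumptions of my own, not OtterTune's implementation: the target workload's metrics under conf1 (22 and 210), the single knob scaled to [0, 1], and all latency values are made up; similarity is a plain Euclidean distance without any normalization of the metrics; and the gradient-descent search from step 2 is replaced by scoring a fixed grid of candidate configurations and taking the one with the lowest confidence bound.

```python
import numpy as np

workloads = ["TPCC", "YCSB", "WIKI"]
history = {  # rows: workloads, columns: conf1..conf4, NaN = never run
    "innodb_pages_reads": np.array([[20, 30, 40, 50],
                                    [100, np.nan, 300, 400],
                                    [50, 60, np.nan, 80]], dtype=float),
    "innodb_io_reads": np.array([[200, 300, 400, 500],
                                 [100, np.nan, 300, 400],
                                 [500, 600, np.nan, 800]], dtype=float),
}


def map_workload(history, target_obs):
    """Step 1: return the historical workload closest to the target.

    target_obs maps metric -> {config column index: observed value}.
    """
    scores = np.zeros(len(workloads))
    for metric, matrix in history.items():
        cols = sorted(target_obs[metric])
        target = np.array([target_obs[metric][c] for c in cols])
        diff = matrix[:, cols] - target
        # nanmean skips configurations a historical workload never ran
        scores += np.sqrt(np.nanmean(diff ** 2, axis=1))
    scores /= len(history)  # average distance over metrics
    return workloads[int(np.nanargmin(scores))], scores


def rbf(a, b, length):
    """Squared-exponential kernel between the rows of a and b."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=2) / length ** 2)


def gp_predict(X, y, Xs, noise=1e-2, length=0.3):
    """GP posterior mean and standard deviation at Xs."""
    mu, sigma = y.mean(), y.std()
    ys = (y - mu) / sigma  # standardize the targets
    K = rbf(X, X, length) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, length)
    mean = Ks.T @ np.linalg.solve(K, ys)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu + sigma * mean, sigma * np.sqrt(np.clip(var, 0.0, None))


def next_config(X, y, candidates, kappa=2.0):
    """Step 2: pick the candidate with the lowest confidence bound."""
    mean, std = gp_predict(X, y, candidates)
    lcb = mean - kappa * std  # latency: lower is better
    return candidates[int(np.argmin(lcb))]


# Step 1: hypothetical metrics of the target workload under conf1 (column 0).
target_obs = {"innodb_pages_reads": {0: 22.0}, "innodb_io_reads": {0: 210.0}}
best, scores = map_workload(history, target_obs)
print(best, scores)  # TPCC has the smallest average distance

# Step 2: hypothetical knob values and latencies for conf1..conf4 on the
# mapped workload, plus the target's own run of conf1 as the last row.
X = np.array([[0.1], [0.3], [0.6], [0.9], [0.1]])
y = np.array([90.0, 60.0, 45.0, 70.0, 95.0])
candidates = np.linspace(0.0, 1.0, 21)[:, None]
conf2 = next_config(X, y, candidates)
print(conf2)  # knob value to install in step 3
```

After step 3, the new observation would be appended to `target_obs`, `X`, and `y`, and both functions run again; because the distance in `map_workload` then covers conf1 and conf2, the selected workload can change between iterations, as step 4 notes.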

Michael