
    This article is a brief manual for the tpch benchmark tool for MySQL, presented through concrete steps; hopefully it helps readers learning MySQL.


    tpch is a toolkit provided by the TPC (Transaction Processing Performance Council). It is used for OLAP testing, to evaluate the performance of decision support systems (DSS) in business analytics. It contains a full suite of business-oriented ad-hoc queries plus concurrent data modifications; it stresses database, platform, and I/O performance, with the focus on query capability.

    Official site: http://www.tpc.org/tpch
    Downloads: http://www.tpc.org/tpch/spec/tpch_2_14_3.tgz or http://www.tpc.org/tpch/spec/tpch_2_14_3.zip

    1. Build and install

    Download the source package, unpack it, and then:

    cp makefile.suite makefile

    Edit the CC, DATABASE, MACHINE, and WORKLOAD definitions in the makefile:

    ################
    ## CHANGE NAME OF ANSI COMPILER HERE
    ################
    CC      = gcc
    # Current values for DATABASE are: INFORMIX, DB2, ORACLE,
    #                                  SQLSERVER, SYBASE, TDAT (Teradata)
    # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
    #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
    # Current values for WORKLOAD are: TPCH
    DATABASE = MYSQL
    MACHINE  = LINUX
    WORKLOAD = TPCH

    Edit tpcd.h and add a few macro definitions:

    #ifdef MYSQL
    #define GEN_QUERY_PLAN ""
    #define START_TRAN "START TRANSACTION"
    #define END_TRAN "COMMIT"
    #define SET_OUTPUT ""
    #define SET_ROWCOUNT "limit %d;\n"
    #define SET_DBASE "use %s;\n"
    #endif

    Then run make. The build produces two executables:

    dbgen: the data generator. When testing with the official InfiniDB test scripts, this tool generates the tpch table data.
    qgen: the SQL query generator.

    Generate the initial test data:

    [root@imysql tpch]# time ./dbgen -s 50
    TPC-H Population Generator (Version 2.9.0)
    Copyright Transaction Processing Performance Council 1994 - 2008

    real    192m43.897s
    user    37m45.398s
    sys     19m4.132s

    [root@imysql tpch]# ls -lh *tbl
    -rw-r--r-- 1 root root 1.2G Sep 21 15:23 customer.tbl
    -rw-r--r-- 1 root root 1.4G Sep 21 15:23 lineitem.tbl
    -rw-r--r-- 1 root root 2.2K Sep 21 15:23 nation.tbl
    -rw-r--r-- 1 root root 317M Sep 21 15:23 orders.tbl
    -rw-r--r-- 1 root root 504K Sep 21 15:23 partsupp.tbl
    -rw-r--r-- 1 root root 464K Sep 21 15:23 part.tbl
    -rw-r--r-- 1 root root  389 Sep 21 15:23 region.tbl
    -rw-r--r-- 1 root root  69M Sep 21 15:23 supplier.tbl

    The dbgen option -s sets the scale (number of "warehouses") of the generated test data. A baseline of 100 or more is recommended; in my test environment I usually set it to 1000. The tpch schema-initialization scripts shipped in the source package do not fully work with MySQL, so some of the generated SQL must be changed. First generate the test SQL script:

    [root@imysql tpch]# ./qgen | sed -e 's/\r//' > queries/tpch_queries.sql

    Then open tpch_queries.sql in vim and perform the following global substitutions:

    :%s/;\nlimit/ limit/g
    :%s/limit -1/limit 1/g

    Find every statement like the following and remove the trailing (3):

    l_shipdate <= date '1998-12-01' - interval '106' day (3)
    =>
    l_shipdate <= date '1998-12-01' - interval '106' day

    Then edit around line 369:

    count(o_orderkey)
    =>
    count(o_orderkey) as c_count

    Around line 376:

    ) as c_orders (c_custkey, c_count)
    =>
    ) as c_orders

    Around line 431:

    drop view revenue0 limit 1;
    =>
    drop view revenue0;
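    Taken together, these hand edits (and the two vim substitutions earlier) can be scripted. A minimal Python sketch, where `fix_tpch_queries` is a hypothetical helper; adjust the patterns if your tpch version emits slightly different text:

```python
import re

def fix_tpch_queries(sql_text):
    """Apply the manual edits described above to qgen output (a sketch)."""
    # join ";\nlimit N" onto the previous statement  (:%s/;\nlimit/ limit/g)
    sql_text = sql_text.replace(";\nlimit", " limit")
    # qgen emits "limit -1" for unlimited result sets; MySQL rejects it
    sql_text = sql_text.replace("limit -1", "limit 1")
    # drop the DB2-style "(3)" precision after interval arithmetic
    sql_text = re.sub(r"day \(3\)", "day", sql_text)
    # Q13: give the aggregate an alias instead of a derived column list
    sql_text = sql_text.replace("count(o_orderkey)", "count(o_orderkey) as c_count")
    sql_text = sql_text.replace(") as c_orders (c_custkey, c_count)", ") as c_orders")
    # Q15: qgen appends the rowcount clause even to the DROP VIEW statement
    sql_text = sql_text.replace("drop view revenue0 limit 1;", "drop view revenue0;")
    return sql_text
```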

    Finally, split the big query script into standalone per-query SQL scripts, tpch_01.sql through tpch_22.sql (TPC-H defines 22 queries).
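    The split can be automated by keying on the header comment qgen writes before each query (the "TPC-H/TPC-R ... Query (Qn)" marker used by tpch 2.x); `split_queries` and `write_query_files` are hypothetical names:

```python
import re

QUERY_HEADER = re.compile(r"Query \(Q(\d+)\)")  # qgen marks each query this way

def split_queries(sql_text):
    """Split combined qgen output into (number, text) chunks, one per query."""
    chunks, current, num = [], [], None
    for line in sql_text.splitlines():
        m = QUERY_HEADER.search(line)
        if m:
            if num is not None:
                chunks.append((num, "\n".join(current)))
            current, num = [], int(m.group(1))
        current.append(line)
    if num is not None:
        chunks.append((num, "\n".join(current)))
    return chunks

def write_query_files(sql_text, outdir="queries"):
    # hypothetical output layout: queries/tpch_01.sql, queries/tpch_02.sql, ...
    for num, text in split_queries(sql_text):
        with open("%s/tpch_%02d.sql" % (outdir, num), "w") as f:
            f.write(text + "\n")
```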

    2. Initialize the schema

    The table-initialization scripts shipped with tpch have a few small problems that need fixing:

    dss.ddl – DDL script that initializes the DSS schema
    dss.ri  – script that creates the indexes and foreign keys on the DSS tables

    Add a few lines at the top of dss.ddl:

    drop database tpch;
    create database tpch;
    use tpch;

    dss.ri needs several edits. Around line 4:

    CONNECT TO TPCD;
    =>
    use tpch;

    Lines 6 through 13: add a space after every SQL comment marker "--":

    -- ALTER TABLE TPCD.REGION DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.NATION DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.PART DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.SUPPLIER DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.PARTSUPP DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.ORDERS DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.LINEITEM DROP PRIMARY KEY;
    -- ALTER TABLE TPCD.CUSTOMER DROP PRIMARY KEY;

    Line 25:

    ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references TPCD.REGION;
    =>
    ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references TPCD.REGION(R_REGIONKEY);

    Line 40:

    ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references TPCD.NATION;
    =>
    ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references TPCD.NATION(N_NATIONKEY);

    Line 55:

    ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references TPCD.NATION;
    =>
    ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references TPCD.NATION(N_NATIONKEY);

    Line 73:

    ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references TPCD.SUPPLIER;
    =>
    ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references TPCD.SUPPLIER(S_SUPPKEY);

    Line 78:

    ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references TPCD.PART;
    =>
    ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references TPCD.PART(P_PARTKEY);

    Line 84:

    ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references TPCD.CUSTOMER;
    =>
    ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references TPCD.CUSTOMER(C_CUSTKEY);

    Line 90:

    ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references TPCD.ORDERS;
    =>
    ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references TPCD.ORDERS(O_ORDERKEY);

    Line 96:

    TPCD.PARTSUPP;
    =>
    TPCD.PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
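    All of these dss.ri edits follow one pattern: append the referenced table's primary-key columns to a bare REFERENCES clause. A sketch that applies them line by line; `fix_dss_ri` and the column mapping are assumptions based on the edits listed above:

```python
import re

# Primary-key columns to append to each bare REFERENCES clause
# (the same mapping as the hand edits above).
REF_COLUMNS = {
    "REGION":   "R_REGIONKEY",
    "NATION":   "N_NATIONKEY",
    "SUPPLIER": "S_SUPPKEY",
    "PART":     "P_PARTKEY",
    "CUSTOMER": "C_CUSTKEY",
    "ORDERS":   "O_ORDERKEY",
    "PARTSUPP": "PS_PARTKEY,PS_SUPPKEY",
}

def fix_dss_ri(line):
    """Rewrite one dss.ri line for MySQL: swap the DB2 CONNECT statement,
    ensure '-- ' comments (MySQL requires the space), and append the
    referenced column list to clauses ending in 'TPCD.<table>;'."""
    line = line.replace("CONNECT TO TPCD;", "use tpch;")
    line = re.sub(r"^--(?! )", "-- ", line)
    return re.sub(r"TPCD\.(\w+);",
                  lambda m: "TPCD.%s(%s);" % (m.group(1), REF_COLUMNS[m.group(1)]),
                  line)
```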

    Also, because tpch creates the tables with uppercase names, rename them to lowercase by appending a few more lines:

    use tpch;

    alter table CUSTOMER rename to customer;
    alter table LINEITEM rename to lineitem;
    alter table NATION   rename to nation;
    alter table ORDERS   rename to orders;
    alter table PART     rename to part;
    alter table PARTSUPP rename to partsupp;
    alter table REGION   rename to region;
    alter table SUPPLIER rename to supplier;
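    The rename statements are mechanical, so they can be generated for any table list; a small sketch (`lowercase_renames` is a hypothetical helper):

```python
# the eight TPC-H tables as dbgen creates them
TPCH_TABLES = ("CUSTOMER", "LINEITEM", "NATION", "ORDERS",
               "PART", "PARTSUPP", "REGION", "SUPPLIER")

def lowercase_renames(tables=TPCH_TABLES):
    """Generate the lowercase rename statements shown above."""
    return ["alter table %s rename to %s;" % (t, t.lower()) for t in tables]
```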

    3. Import the data

    With the test data generated and the schema initialized, the data can now be imported. Note: if the binlog is enabled, disable it before importing; otherwise the import may fail with a max_binlog_cache_size exceeded error. If the binlog cannot be disabled, split each data file into several smaller files and import them one by one.

    mysql -e "LOAD DATA INFILE 'path/dbgen/customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/orders.tbl'   INTO TABLE ORDERS   FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/lineitem.tbl' INTO TABLE LINEITEM FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/nation.tbl'   INTO TABLE NATION   FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/partsupp.tbl' INTO TABLE PARTSUPP FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/part.tbl'     INTO TABLE PART     FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/region.tbl'   INTO TABLE REGION   FIELDS TERMINATED BY '|';"
    mysql -e "LOAD DATA INFILE 'path/dbgen/supplier.tbl' INTO TABLE SUPPLIER FIELDS TERMINATED BY '|';"
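    The eight LOAD DATA statements differ only in the table name, so they can be generated. A sketch that keeps the source's 'path/dbgen' placeholder (substitute your real dbgen directory); `load_data_statements` is a hypothetical helper:

```python
# table/file names produced by dbgen, in the order they are loaded above
TBL_FILES = ["customer", "orders", "lineitem", "nation",
             "partsupp", "part", "region", "supplier"]

def load_data_statements(datadir="path/dbgen"):
    """Build the LOAD DATA statements shown above for each .tbl file."""
    stmt = "LOAD DATA INFILE '%s/%s.tbl' INTO TABLE %s FIELDS TERMINATED BY '|';"
    return [stmt % (datadir, t, t.upper()) for t in TBL_FILES]
```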

    4. Run the tpch test

    Now the tpch test itself can be run: execute the 22 query scripts one at a time, restarting the MySQL instance before each one so that every query starts with cold memory buffers. A simple loop script:

    #!/bin/sh
    #
    # run the tpch OLAP test
    # written by yejr (http://imysql.com), 2012/12/14
    #
    PATH=$PATH:/usr/local/bin
    export PATH
    . ~/.bash_profile >/dev/null 2>&1

    exec 3>&1 4>&2 1>> tpch-benchmark-olap-`date +'%Y%m%d%H%M%S'`.log 2>&1

    I=1
    II=3
    while [ $I -le $II ]
    do
        N=1
        T=23
        while [ $N -lt $T ]
        do
            if [ $N -lt 10 ] ; then
                NN='0'$N
            else
                NN=$N
            fi
            echo "query $NN starting"
            /etc/init.d/mysql restart
            time mysql -f tpch < ./queries/tpch_${NN}.sql
            echo "query $NN ended!"
            N=`expr $N + 1`
        done
        I=`expr $I + 1`
    done
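    The inner loop above iterates over tpch_01.sql through tpch_22.sql; its NN='0'$N zero-padding is equivalent to %02d formatting, as this small sketch shows (`query_script_names` is a hypothetical helper):

```python
def query_script_names(count=22):
    """File names the benchmark loop iterates over (tpch_01.sql .. tpch_22.sql);
    %02d reproduces the shell's NN='0'$N zero-padding."""
    return ["tpch_%02d.sql" % n for n in range(1, count + 1)]
```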

    Attachments: a compressed package with the tpch initialization and automated test scripts, plus a Word manual.

    This article was compiled and published by 职坐标; hopefully it helps with learning MySQL. More content is available on the 职坐标 MySQL database channel.

  • TPCH test script

    Posted 2017-05-11 12:47:35
    #!/usr/bin/env python
    # encoding: utf-8
    
    """
    tpch audit run
    
    Options:
      -d             debug mode
      -h, --help     show this help
      -q, --query_program=  program for query streams, defaults to mysql
      -r, --root=    root directory
      -s, --scale=   scale factor (default 1, 1GB)
      -p, --power    power test only
      -t, --throughput  throughput test only
      -u, --update   do update or rf1, rf2
      --test_only     only test
      --db=          database name
    
    """
    
    __author__ = "David Zhang"
    __modifier__ = "tianbaochao"
    __version__ = "$Revision: 1.1 $"
    __date__ = "$Date: 2007/08/08 $"
    __modify_date__ = "$Date: 2016/08/24 $"
    
    import sys
    import getopt
    import re
    import random
    import copy
    from threading import Thread
    import difflib
    import os
    from sets import Set
    import pdb
    import time
    import signal
    import shutil
    import inspect
    import MySQLdb
    import math
    
    # global variables
    # database name (required)
    tpch_db_name = "tpch"
    
    _debug = 1
    # MySQL socket path (required)
    mysock="/var/lib/mysql/mysql.sock"
    mysql_test_dir="/usr/local/mysql/mysql-test"
    mysql_cmd = " cat %s | mysql --sock=" + mysock + " --database=%s --force  -u root -p123456 -h127.0.0.1 -v -v -v" % tpch_db_name
    
    
    # global variables
    script_log_file=""
    run_id=7
    # scale factor, in GB (required)
    scale_factor=1
    # tpch root directory
    audit_dir="/data/test/tpch_2_17_0"
    dbgen_dir="/data/test/tpch_2_17_0/dbgen"
    qgen="/data/test/tpch_2_17_0/dbgen/qgen"
    # important!!!! results output directory; use a fresh one for every test run
    output_dir="/data/test/tpch_2_17_0/output"
    rf_tmp_file_name_template = "/data/test/tpch_2_17_0/dbgen/rf%s_%s.sql" 
    
    rand_seed=8768640832410
    num_streams_scale ={1:2,10:3,30:4,100:5,300:6,1000:7,3000:8,10000:9}
    
    do_power=False
    do_throughput=False
    do_update=False
    do_loader=False
    query_file_template="%s/query_%s_s%s.sql"
    use_mysql_client = False
    use_mysql_client = True
    tpch_conn = None
    test_only=False
    rf_throughput_concurrent=False    # if True, start RF and the throughput streams concurrently; otherwise start the
                                      # throughput streams first, then RF. Concurrent may be better since the queries do
                                      # not use much CPU, but it can cause deadlock.
    num_seqs=1                        # how many iterations
    num_rf_streames=1                 
    update_only = False              # only do update
    no_cleanup = False               # do not cleanup 
    leave_server_up = False          # leave the server up after the run
    gen_udf_call = False              # true if to use UDF instead of profiling
    
    qtime=[]   # query time (seq, stream_id, qnum, start,end, time)
    rftime=[]  # refresh time (seq, stream_id, rf_id, start, end, time)
    streams_time=[]  # streams time  (seq, stream_id, start, end, duration)
    tpch_power_numbers=[]
    tpch_throughput_numbers=[]
    all_queries={}
    
    # for reporting
    harp_run_time_log=""
    master_err_file_handle=None
    master_err_file=None
    
    #====Utilties============
    
    class TimedOutExc(Exception):
        def __init__(self, value = "Timed Out"):
            self.value = value
        def __str__(self):
            return repr(self.value)
    
    
    
    def sys_call(cmd):
    #    dbg_print(os.getcwd())
    #    print (cmd)
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        os.system(cmd)
    
    def sys_pipe_call(cmd):
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        return os.popen(cmd).read()               
    #    return os.popen("echo " + cmd).read() 
    #    return cmd
    
    def sys_pipe_call_1(cmd):
        ''' also return the errors '''
    #    time.sleep(0.1)
    #    return cmd
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        return os.popen4(cmd)[1].read()
        #return cmd
    
    def sys_call_env(cmd):
        cmds=cmd.split()
        dbg_print("cmds= %s " % cmds)
        os.spawnv( os.P_WAIT, cmds[0], cmds[1:])
    
    def usage():
        caller_name = sys._getframe(1).f_code.co_name
        #print "caller_name = %s " % caller_name
        if (caller_name=='main'):
            print __doc__
        else:
            print globals()[caller_name].__doc__
        sys.exit(0)                  
    
    def dbg_print(in_str):
        if _debug:
            print ("== " + sys._getframe(1).f_code.co_name + " == " + str(in_str))
    
    def whoami():
        return sys._getframe(1).f_code.co_name
    
    # tpch_
    def tpch_error(s):
            print ("TPCH Error:" + s)
            sys.exit(1)
    
    def get_time():
        return time.time()
    
    def write_log(s,log_file_name=None,timestamp=True,pstdout=False):
        if not log_file_name: pstdout = True  # print the master
        indent_start=4
        if not log_file_name: log_file_name = script_log_file
        #dbg_print("log_file_name=%s" % log_file_name)
        #dbg_print("script_log_file=%s" % script_log_file)
        #indent = "".join(["  " for i in range(len(inspect.stack()))])
        indent = "".rjust((len(inspect.stack())-indent_start)*3)
        ts = timestamp and "%s :" % time.strftime('%x %X') or ""
        line = ts + indent + s + "\n"
        open(log_file_name,"a+").write(line)
        if pstdout: sys.stdout.write(line)
        if not log_file_name: dbg_print(line)
    
    
    
    import distutils.dir_util
    
    
    def setup_tpch():
        global audit_dir, dbgen_dir, qgen, output_dir, script_log_file, harp_run_time_log
    
        if not audit_dir: raise AssertionError, "AUDIT_DIR is not defined and not in a view\n"
    
        if not dbgen_dir: dbgen_dir="%s/dbgen" % audit_dir
    
        if not output_dir: output_dir = "%s/results/r%s" % (audit_dir, run_id)
    
        try:
            shutil.rmtree(output_dir)
        except: pass
        os.mkdir(output_dir)
    
        qgen = "%s/qgen" % dbgen_dir
        # set this so qgen can find the query template
        os.environ['DSS_QUERY'] = '%s/queries' % (dbgen_dir)
        os.environ['DSS_PATH'] = output_dir    # this is where the update file is stored
        os.environ['DSS_CONFIG'] = dbgen_dir
    
        script_log_file = "%s/tpch.log" % (output_dir)
        harp_run_time_log = "%s/harp_run_time.log" % (output_dir)
    
        # setup_mysql()   # setup mysql related
    
    
    def compute_result(power_run,seq,num_streams,duration,result):
        if power_run:
            run_time = [x[-1] for x in qtime if x[0] == seq and x[1] == 0 ]
            run_time.extend([x[-1] for x in rftime if x[0] == seq and x[1] == 0 ])
        else:
            run_time = [x[-1] for x in qtime if x[0] == seq and x[1] != 0 ]
            run_time.extend([x[-1] for x in rftime if x[0] == seq and x[1] != 0 ])
    
        num_queries_and_rfs = len(run_time)
        write_log("%s Seq %s num queries and rfs = %s" % (power_run and "Power" or "Throughput", seq,num_queries_and_rfs), pstdout=True)
        geo_mean = math.exp(sum([math.log(t) for t in run_time])*1.0/float(num_queries_and_rfs))
        arith_mean = sum([t for t in run_time])/float(num_queries_and_rfs)
        if power_run: result_number = 3600.0*scale_factor/geo_mean   # Power@Size = 3600 * SF / geometric mean
        else: result_number = (num_streams*22.0*3600)/duration*scale_factor
        result.append((result_number, geo_mean, arith_mean))
    
    def print_results():
        ''' print all the results '''
        qtime.sort(lambda x,y: int(float(x[0]*10000+x[1]*100+x[2])-float(y[0]*10000+y[1]*100+y[2])))
    
        write_log("TPCH Detailed Results")
        for i in range(num_seqs):   # seq number
            tmp_qtime = [list(x) for x in qtime if x[0]==i+1]
            for q in tmp_qtime:
                write_log("Seq %s, Streams %s, Query %s: start_time= %s, end_time = %s, duration = %s " % (tuple(q)))
            tmp_qtime=([list(x) for x in rftime if x[0] ==i+1])
            for q in tmp_qtime:
                write_log("Seq %s, Streams %s, RF %s: start_time = %s, end_time=%s, duration = %s " % (tuple(q)))
    
        write_log("TPCH Streams Results")
        streams_time.sort(lambda x,y: int(float(x[0]*100+x[1])-float(y[0]*100+y[1])))
        for x in streams_time:
    #        write_log("Seq %s, Streams %s: start_time = %s , end_time = %s, duration = %s " % x)
            (seq_id, stream_id, start_time, end_time, duration) = x
            stream_rand_seed = rand_seed and long(rand_seed) + stream_id or 'default'
            if stream_id<100: tmp_x = [seq_id, "Query Streams %s" % stream_id, "rand_seed = %s" % stream_rand_seed]  
            else: tmp_x = [seq_id, 'RF throughput streams','']   # RF is 1001 for sorting purpose
            tmp_x.append(conv_to_date(start_time))
            tmp_x.append(conv_to_date(end_time))
            tmp_x.append(duration)
            write_log("Seq %s, %s: %s, start_time = %s , end_time = %s, duration = %s " % tuple(tmp_x))
    
        # aggregated results
        if do_power:
            write_log("TPCH Power Tests Results", pstdout=True)
            for i in range(num_seqs):
                write_log("  Power  Seq %s: (Power Metric, GeoMean, ArithMean) = %s  " % (i+1, tpch_power_numbers[i]), pstdout=True)
        if do_throughput:
            write_log("TPCH Throughput Tests Results",pstdout=True)
            for i in range(num_seqs):
                write_log("  ThroughPut Seq %s: (Throughput Metric, GeoMean, ArithMean) = %s " % (i+1, tpch_throughput_numbers[i]), pstdout=True)
        if do_power and do_throughput:
            write_log("TPCH QPH Results",pstdout=True)
            for i in range(num_seqs):
                qph = math.exp((math.log(tpch_power_numbers[i][0]) + math.log(tpch_throughput_numbers[i][0]))*0.5)
                write_log("  TPCH QPH for Seq %s  = %s " % (i+1, qph), pstdout=True)
    
        if server_is_up():
            if leave_server_up: write_log("  Leave MySQL server up", pstdout=True)
            else: shutdown_server()
        else:
            write_log("  MySQL Server is down during the Run", pstdout=True)
            sys.exit(9);
    
    def run_tpch_all():
        #load_tpch()   # load the data
        write_log("Start TPC-H Benchmark for Run id: %s " % run_id)
        # if do_update:
        #     gen_rfs()
        for i in range(num_seqs):
            if do_power: run_tpch_power_test(i+1)
            if do_throughput: run_tpch_throughput_test(i+1)
            #start_up_server_if_needed()   # restart server
        print_results()
        write_log("End TPC-H Benchmark for Run id: %s " % run_id)
    #    pdb.set_trace()
    
    def parse_query_file(query_file):
        sys_call("sed -i 's/(3)//' %s" % (query_file))
        # sys_call("sed -i 's/limit -1/limit 1/' %s" % (query_file))
        sys_call("sed -i 's/count(o_orderkey)/count(o_orderkey) as c_count/' %s" % (query_file))
        sys_call("sed -i 's/) as c_orders (c_custkey, c_count)/) as c_orders/'  %s" % (query_file))
        sys_call("sed -i 's/drop view revenue0 limit 1;/drop view revenue0;/' %s" % (query_file))
    
    
        lines = [l.rstrip() for l in open(query_file,"r").readlines()]
        query_begin_re = re.compile("TPC-H/TPC-R .* Query \(Q([0-9]+)\)")
        query_end_re = re.compile("limit ([\-0-9]+);")
        # query_end_re = re.compile("^set rowcount ([\-0-9]+)")
        dbg_print("query_file = %s " % query_file)
        first_l=lines[0]
        query_begin = False
        queries=[]
        for l in lines:
            dbg_print("l= %s" % l)
            m = query_begin_re.search(l)
            if m:
                dbg_print("begin l= %s " % l)
                query_begin = True
                cur_qry=[]
                qry_name=m.group(1)
                cur_qry.append(first_l)
            if query_begin:
                m1 = query_end_re.search(l)
                if m1:
                    query_begin = False
                    row_count = int(m1.group(1))
                    if row_count > 0:
                        last_line = cur_qry[-1]
                        cur_qry[-1] =  last_line.split(";")[0] + " limit %s;" % row_count
                    queries.append((int(qry_name),"\n".join(cur_qry)))
                else:  cur_qry.append(l)
    
        if len(queries) != 22:
            print "queries = %s " % queries
            raise AssertionError, "There should be 22 queries,%s" % len(queries)
        return queries
    
    def conv_to_date(t):
        return time.strftime("%X %x",time.localtime(t))
    
    def open_connection():
        if (use_mysql_client): return None
        global tpch_conn
        tpch_conn = MySQLdb.connect (host="localhost", user="root", passwd="",db=tpch_db_name,unix_socket=mysock)
        if not tpch_conn:  raise AssertionError, "Connection failure"
        else: return tpch_conn
    
    def close_connection():
        if (use_mysql_client): return
        tpch_conn.close()
    
    def execute_one_query(seq, stream_id, qnum, q):
        # to test rf, do not run any query since they do not pass
    
        if (test_only):
            time.sleep(0.1)
            return "Fake results\n"
        out_file = "%s/query_%s_s%s_q%s.log" % (output_dir,seq,stream_id,qnum)
        if (use_mysql_client):
            in_file = "%s/query_%s_s%s_q%s.sql" % (output_dir,seq,stream_id,qnum)
            open(in_file,"w").write(q)
            output = sys_pipe_call_1(mysql_cmd % in_file)
            open(out_file,"w").write(output)
        else:
            cursor = tpch_conn.cursor()
            cursor.execute(q)
            result = cursor.fetchall()
            cursor.close()
            output = "\n".join(["|".join([str(f) for f in r]) for r in result])
            open(out_file,"w").write(output)
        return output
    
    
    def run_query(seq, stream_id):
        ''' This is a wrapper. For now use the mysql client; later this could be a C client for query execution. Each query needs to be timed.
            For now, satisfy the requirement to run all queries in a single session. Will try to use a python connection;
            may need to disable logging for the throughput run.
        '''
        local_log = "%s/query_%s_s%s.log" % (output_dir, seq, stream_id)
        dbg_print("local_log = %s" % local_log)
        line_header="Query Stream"
        queries=all_queries["%s_%s" % (seq,stream_id)]
        global qtime
        open_connection()
        for (qnum, q) in queries:
            write_log("%s %s seq %s qnum %s begin" % (line_header, stream_id, seq, qnum))
            write_log("%s %s seq %s qnum %s begin" % (line_header, stream_id, seq, qnum), local_log)
            start_time=get_time()
            output = execute_one_query(seq, stream_id, qnum, q)
            dbg_print("output = %s" % output)
            end_time = get_time()
            duration = end_time - start_time
            qtime.append((seq, stream_id, qnum, start_time, end_time, duration))
            write_log("%s seq %s stream %s qnum %s start time %s" % (line_header, seq, stream_id, qnum, start_time))
            write_log("%s seq %s stream %s qnum %s start time %s" % (line_header, seq, stream_id, qnum, start_time), local_log)
            open(local_log,"a+").write(output)
            write_log("%s seq %s stream %s qnum %s end time %s" % (line_header, seq, stream_id, qnum, end_time))
            write_log("%s seq %s stream %s qnum %s end time %s" % (line_header, seq, stream_id, qnum, end_time), local_log)
            write_log("%s seq %s stream %s qnum %s duration %s" % (line_header, seq, stream_id, qnum, duration))
            write_log("%s seq %s stream %s qnum %s duration %s" % (line_header, seq, stream_id, qnum, duration), local_log)
        close_connection()
    
    
    def run_rf(rf_id, seq, stream_id):
        ''' oracle load the update in a temp table and do insert as select. For now, just get the file and run it.
            insert into orders, lineitems (o_orderkey) '''
        global rftime
        if test_only:
            rftime.append((seq, stream_id, rf_id, 1, 2, 3))
        if not do_update:
            #rftime.append((seq, stream_id, rf_id, 101, 10))  # fake it
            return
        dbg_print("begin run_rf %s " % rf_id)
        tmp_file = rf_tmp_file_name_template % (rf_id,stream_id)
        local_log = "%s/rf%s_%s_s%s.log" % (output_dir, rf_id, seq, stream_id)
        line_header="RF %s" % rf_id
        write_log("%s seq %s stream %s begin" % (line_header, seq, stream_id))
        write_log("%s seq %s stream %s begin" % (line_header, seq, stream_id), local_log)
        start_time=get_time()
    
        output = sys_pipe_call_1(mysql_cmd % tmp_file)
    
        # if not test_only: output = sys_pipe_call_1(mysql_cmd % tmp_file)
        # else: output = "test only"
        #os.remove(tmp_file)
        end_time = get_time()
        duration = end_time - start_time
        write_log("%s seq %s stream %s start time %s" % (line_header, seq, stream_id, start_time), local_log)
        write_log("%s seq %s stream %s start time %s" % (line_header, seq, stream_id, start_time))
        # no need to print out result for rf, 
        open(local_log,"a+").write(output)
        write_log("%s seq %s stream %s end time %s" % (line_header, seq, stream_id, end_time))
        write_log("%s seq %s stream %s end time %s" % (line_header, seq, stream_id, end_time), local_log)
        write_log("%s seq %s stream %s duration %s" % (line_header, seq, stream_id, duration))
        write_log("%s seq %s stream %s duration %s" % (line_header, seq, stream_id, duration), local_log)
        rftime.append((seq, stream_id, rf_id, start_time, end_time, duration))
    
    def run_qgen(stream_id, seq):
        query_parameter_file="%s/qp_%s_s%s.log" % (output_dir, seq, stream_id)
        query_file= query_file_template % (output_dir, seq, stream_id)
        dbg_print("run_qgen begin")
        try:
            saved_cur_dir = os.getcwd()
        except OSError:
            saved_cur_dir=audit_dir
        os.chdir(dbgen_dir)
        if rand_seed:
            stream_rand_seed = long(rand_seed) + stream_id
            write_log("Generates query template file for streams %s with seed: %s " % (stream_id, stream_rand_seed))
            sys_call("%s -c -r %s -p %s -s %s -l %s > %s" % (qgen, stream_rand_seed, stream_id, scale_factor, query_parameter_file, query_file))
        else:
            write_log("Generates query template file for streams %s with seed: %s " % (stream_id, 'default'))
            sys_call("%s -c -d -p %s -s %s -l %s > %s" % (qgen, stream_id, scale_factor, query_parameter_file, query_file))
        line_header="Query Stream"
        queries = parse_query_file(query_file)
        global all_queries
        all_queries["%s_%s" % (seq,stream_id)] = queries
        # result files are in dbgen_dir/orders.tbl.u1 lineitem.tbl.u1. delete.1
        os.chdir(saved_cur_dir)
        dbg_print("run_qgen end")
    
    def run_tpch_power_test(seq=1):
        stream_id = 0   # power_test is stream 0
        global streams_time
        write_log("Start TPC-H Power test Run id: %s Seq: %s" % (run_id, seq))
        run_qgen(stream_id, seq) 
        stream_start_time =get_time()
        run_rf(1,seq, stream_id)
        run_query(seq, stream_id)
        run_rf(2, seq, stream_id)
        stream_end_time = get_time()
        stream_duration = stream_end_time - stream_start_time
        streams_time.append((seq, stream_id, stream_start_time, stream_end_time, stream_duration))
        write_log("End TPC-H Power test Run id: %s Seq: %s" % (run_id, seq))
        #dbg_print("qtime = %s " % qtime)
        global tpch_power_numbers
        compute_result(True, seq, 1, 0, tpch_power_numbers)
    
    class throughput_stream(Thread):
        (rp, wp) = os.pipe()
        def __init__ (self, seq, stream_id, stream_type='query'):
            Thread.__init__(self)
            self.seq = seq
            self.stream_id = stream_id
            self.stream_type=stream_type
        def run(self):
            global streams_time
            if self.stream_type == 'query':
                stream_start_time =get_time()
                run_query(self.seq, self.stream_id)
                stream_end_time = get_time()
                stream_duration = stream_end_time - stream_start_time
                streams_time.append((self.seq, self.stream_id, stream_start_time, stream_end_time, stream_duration))
                os.write(self.wp,"1")
            else:
                for i in range(num_streams_scale[scale_factor]):
                    os.read(self.rp,1)
                write_log("RF unblocking")
                stream_start_time =get_time()
                for i in range(num_streams_scale[scale_factor]):
                    run_rf(1,self.seq, i+1)
                    run_rf(2, self.seq, i+1)
                stream_end_time = get_time()
                stream_duration = stream_end_time - stream_start_time
                streams_time.append((self.seq, self.stream_id, stream_start_time, stream_end_time, stream_duration))
    
    def run_tpch_throughput_test(seq=1):
        ''' A number of query streams run concurrently with one refresh stream; rf1 and rf2 run sequentially for all the streams.
            Oracle waits for all the query streams to finish before starting RF,
            but MS starts RF at the same time as the query streams. '''
        write_log("Start TPC-H Through Put test Run id: %s Seq: %s" % (run_id, seq))
        num_streams = num_streams_scale[scale_factor]
        t_streams=[]
        for s in range(num_streams):
            run_qgen(s+1, seq)
            t_streams.append(throughput_stream(seq, s+1, 'query'))
        #for s in range(num_streams):
        t_streams.append(throughput_stream(seq, 1001, 'rf'))
        #c = raw_input("continue? (Y/N)").upper()
        throughput_start_time=get_time()
        if (rf_throughput_concurrent):
            for t in t_streams: t.start()    
            for t in t_streams: t.join()
        else:
            global streams_time
            stream_start_time =get_time()
            for t in t_streams: t.start()
            for t in t_streams: t.join()
    
            #for t in t_streams[0:num_streams]: t.start()   # start the query first
            #for t in t_streams[0:num_streams]: t.join()
            #for s in range(num_streams):
                #t_streams[num_streams + s].start()   # start the rf last
                #t_streams[num_streams + s].join()
            #stream_end_time = get_time()
            #stream_duration = stream_end_time - stream_start_time
            #streams_time.append((seq, 1001, stream_start_time, stream_end_time, stream_duration))
        throughput_end_time=get_time()
        throughput_duration = throughput_end_time - throughput_start_time
        global tpch_throughput_numbers
        compute_result(False, seq, num_streams, throughput_duration,tpch_throughput_numbers)
        write_log("End TPC-H Through Put test Run id: %s, Seq: %s, Scale: %s, Number of Streams: %s" % (run_id, seq, scale_factor, num_streams))
        write_log("Through Put duration  = %s" % (throughput_duration)) 
    
    #---------------------------
    def server_is_up():
        ret = sys_pipe_call_1("echo exit | mysql --sock=%s" % mysock)
        #print "ret = %s " % ret
    #    return True
        return not re.compile("Can't connect to local MySQL server through socket").search(ret)
    
    
    def start_up_server_if_needed():
        # test if the server is up
        #if test_only: return  # // dzhang_test 
        if not server_is_up():
            write_log ("Cannot connect to MySQL server. Trying to start mysqld!", pstdout=True)
            global master_err_file_handle, master_err_file
            #master_err_file = "%s/var/log/master.err" % mysql_test_dir
            master_err_file = "%s/master.err" % output_dir
            #c2_config_file = "%s/c2_config.cnf" % audit_dir   # enable this in real audit
            c2_config_file = "%s/c2_config.cnf" % mysql_test_dir
            # work around bug 1687, remove log and control files
            #c2_log_dir="/log/databases/tpch"   # this will be defined in c2_config_file
            #sys_call("rm %s/*" % c2_log_dir)
            sys_call ("%s/../sql/mysqld  --core-file --basedir=%s/.. --datadir=%s/var/master-data --language=%s/../sql/share/english --c2db_event_log_level=ERROR --socket=%s --c2db-config-file=%s>& %s &"
                  % (mysql_test_dir,mysql_test_dir,mysql_test_dir, mysql_test_dir, mysock, c2_config_file, master_err_file ))
            max_wait_sec = 10
            for i in range(max_wait_sec+1):
                if server_is_up():
                    write_log("Mysqld started successfully on sock %s" % mysock,pstdout = True)
                    break
                time.sleep(2)
            if (i == max_wait_sec): raise AssertionError, "Time out waiting for mysql server to start\n"
            master_err_file_handle = open(master_err_file)
            if not master_err_file_handle: raise AssertionError, "master.err is not created\n"
        else:
            write_log("MySQL server is up. Use the existing one and continue", pstdout=True)
    
    def shutdown_server():
        if not server_is_up(): return
        dbg_print("Begin shutdown sever")
        sys_call("mysqladmin -u root -p123456 -h127.0.0.1 -S %s shutdown" % mysock)
        max_wait_sec = 20
        for i in range(max_wait_sec+1):
            if not server_is_up():
                write_log("Mysqld is shutdown successfully on sock %s" % mysock, pstdout=True)
                break
            time.sleep(2)
            if (i == max_wait_sec): raise AssertionError, "Time out waiting for mysql server to shutdown\n"
    
    def main(argv):
        ''' may also need to specify seed, mysock. Also may need to take care of loading
        '''
        # default 
        tpch_tables=('customer', 'supplier', 'nation', 'region', 'part', 'partsupp', 'orders', 'lineitem')
        global _debug, scale_factor, do_power, do_throughput, do_update, tpch_db_name, test_only, do_loader, run_id, rand_seed, num_seqs, update_only, no_cleanup, leave_server_up
    
        os.environ['TZ'] = 'US/Pacific'
        time.tzset()
    
        try:                                
           opts, args = getopt.gnu_getopt(argv, "hdq:s:r:ptule:", ["help", "debug","query_program=", "scale=", "root=", "power", "throughput", "update","db=","test_only","loader","run_id=","rand_seed=","iter=","gen_rfs","update_only","no_cleanup","leave_server_up"])
        except getopt.GetoptError:          
            usage()                         
            sys.exit(2)                     
        for opt, arg in opts:               
            if opt in ("-h", "--help"):     
                usage()                     
                sys.exit(0)                  
            elif opt == '-d':               
                _debug = 1                  
            elif opt in ("-q", "--query_program"):
                query_exec = arg
            elif opt in ("-s", "--scale"):
                scale_factor = int(arg)
            elif opt in ("-r", "--root"):
                audit_dir= arg
            elif opt in ("-p", "--power"):
                do_power = True
            elif opt in ("-t", "--throughput"):
                do_throughput = True
            elif opt in ("-u", "--update"):
                do_update = True
            elif opt in ("-l", "--loader"):
                do_loader = True
                #print "do update"
            elif opt in ("-e", "--rand_seed"):
                rand_seed = int(arg)
            elif opt in ("--test_only"):
                test_only = True
            elif opt in ("--db"):
                tpch_db_name = arg
            elif opt in ("--run_id"):
                run_id = arg
            elif opt in ("--iter"):
                num_seqs = int(arg)
            elif opt in ("--update_only"):
                update_only = True
                do_update = True
            elif opt in ("--no_cleanup"):
                no_cleanup = True
            elif opt in ("--leave_server_up"):
                leave_server_up = True
        setup_tpch()
        run_tpch_all()
    
    if __name__ == "__main__":
        main(sys.argv[1:])
    
    
    
tpch is a toolkit provided by the TPC (Transaction Processing Performance Council). It is used for OLAP testing, to evaluate the performance of the decision support system (DSS) side of business analytics. It contains a full set of business-oriented ad-hoc queries plus concurrent data modifications; the benchmark stresses database, platform, and I/O performance, with the emphasis on query capability.

Official site: http://www.tpc.org/tpch

Download: http://www.tpc.org/tpch/spec/tpch_2_14_3.tgz or http://www.tpc.org/tpch/spec/tpch_2_14_3.zip

1. Compile and install

Download the source package, unpack it, and then:

cp makefile.suite makefile

Edit the CC, DATABASE, MACHINE, and WORKLOAD definitions in the makefile:

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, ORACLE,
#                                  SQLSERVER, SYBASE, TDAT (Teradata)
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= MYSQL
MACHINE = LINUX
WORKLOAD = TPCH

Edit tpcd.h and add a few macro definitions:

#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif

Then run make. When the build finishes it produces two executables:

• dbgen: the data generation tool. When testing with InfiniDB's official test scripts, this tool generates the data for the tpch tables.

• qgen: the SQL query generation tool.

Generate the initial test data:

[root@imysql tpch]# time ./dbgen -s 50
TPC-H Population Generator (Version 2.9.0)
Copyright Transaction Processing Performance Council 1994 - 2008

real    192m43.897s
user    37m45.398s
sys     19m4.132s

[root@imysql tpch]# ls -lh *tbl
-rw-r--r-- 1 root root 1.2G Sep 21 15:23 customer.tbl
-rw-r--r-- 1 root root 1.4G Sep 21 15:23 lineitem.tbl
-rw-r--r-- 1 root root 2.2K Sep 21 15:23 nation.tbl
-rw-r--r-- 1 root root 317M Sep 21 15:23 orders.tbl
-rw-r--r-- 1 root root 504K Sep 21 15:23 partsupp.tbl
-rw-r--r-- 1 root root 464K Sep 21 15:23 part.tbl
-rw-r--r-- 1 root root  389 Sep 21 15:23 region.tbl
-rw-r--r-- 1 root root  69M Sep 21 15:23 supplier.tbl

The -s option of dbgen sets the number of warehouses (the scale factor) for the generated data. A baseline of at least 100 is recommended; in my test environment I usually set it to 1000.
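At large scale factors a single dbgen process takes hours (the SF=50 run above already took over three hours). dbgen can split the work across processes via its -C (total chunks) and -S (this chunk) options; the sketch below assumes those flags behave as in dbgen 2.x, and the gen_cmds helper name is my own:

```shell
# Emit one dbgen command line per chunk; running them concurrently
# parallelizes data generation (-C = total chunks, -S = this chunk).
gen_cmds() {
  sf=$1; chunks=$2
  for i in $(seq 1 "$chunks"); do
    echo "./dbgen -s $sf -C $chunks -S $i"
  done
}
# e.g. run 8 generators in parallel:
# gen_cmds 1000 8 | xargs -P8 -I{} sh -c '{}'
```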

The tpch schema initialization scripts shipped in the source package do not fully work with MySQL, so some of them need edits.

First generate the query SQL script:

    [root@imysql tpch]# ./qgen | sed -e 's/\r//' > queries/tpch_queries.sql

Then open tpch_queries.sql in vim and run these global substitutions:

:%s/;\nlimit/ limit/g
:%s/limit -1/limit 1/g

Search for every statement like the one below and remove the trailing (3):

l_shipdate <= date '1998-12-01' - interval '106' day (3)
=>
l_shipdate <= date '1998-12-01' - interval '106' day

Around line 369 of the script, change:

count(o_orderkey)
=>
count(o_orderkey) as c_count

Around line 376:

) as c_orders (c_custkey, c_count)
=>
) as c_orders

Around line 431:

drop view revenue0 limit 1;
=>
drop view revenue0;

Finally, split the big combined query script into 22 standalone SQL scripts, tpch_01.sql through tpch_22.sql (TPC-H defines 22 queries, and the driver script below runs queries 1 through 22).
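Rather than splitting the combined file by hand, qgen can emit one query per invocation when given the query number as an argument; a sketch (gen_query_files is my own helper name, and I assume `./qgen N` prints query N):

```shell
# Write one file per TPC-H query: tpch_01.sql .. tpch_22.sql.
# $1 is the generator command (e.g. ./qgen), $2 the output directory.
gen_query_files() {
  gen_cmd=$1; outdir=$2
  mkdir -p "$outdir"
  for n in $(seq 1 22); do
    nn=$(printf '%02d' "$n")       # zero-pad: 1 -> 01
    "$gen_cmd" "$n" > "$outdir/tpch_${nn}.sql"
  done
}
# gen_query_files ./qgen queries
```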

2. Initialize the schema

The schema initialization scripts provided by tpch have a few small problems that need fixing:

dss.ddl – DDL script that creates the DSS database
dss.ri  – script that adds indexes and foreign keys to the DSS tables

dss.ddl needs a few extra lines at the top:

drop database if exists tpch;
create database tpch;
use tpch;

dss.ri needs changes in several places.

Around line 4:

CONNECT TO TPCD;
=>
use tpch;

Lines 6–13: add a space after each SQL comment marker "--":

-- ALTER TABLE TPCD.REGION DROP PRIMARY KEY;
-- ALTER TABLE TPCD.NATION DROP PRIMARY KEY;
-- ALTER TABLE TPCD.PART DROP PRIMARY KEY;
-- ALTER TABLE TPCD.SUPPLIER DROP PRIMARY KEY;
-- ALTER TABLE TPCD.PARTSUPP DROP PRIMARY KEY;
-- ALTER TABLE TPCD.ORDERS DROP PRIMARY KEY;
-- ALTER TABLE TPCD.LINEITEM DROP PRIMARY KEY;
-- ALTER TABLE TPCD.CUSTOMER DROP PRIMARY KEY;

Line 25:

ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references TPCD.REGION;
=>
ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references TPCD.REGION(R_REGIONKEY);

Line 40:

ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references TPCD.NATION;
=>
ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references TPCD.NATION(N_NATIONKEY);

Line 55:

ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references TPCD.NATION;
=>
ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references TPCD.NATION(N_NATIONKEY);

Line 73:

ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references TPCD.SUPPLIER;
=>
ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references TPCD.SUPPLIER(S_SUPPKEY);

Line 78:

ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references TPCD.PART;
=>
ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references TPCD.PART(P_PARTKEY);

Line 84:

ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references TPCD.CUSTOMER;
=>
ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references TPCD.CUSTOMER(C_CUSTKEY);

Line 90:

ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references TPCD.ORDERS;
=>
ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references TPCD.ORDERS(O_ORDERKEY);

Line 96:

TPCD.PARTSUPP;
=>
TPCD.PARTSUPP(PS_PARTKEY,PS_SUPPKEY);

Also, since dbgen creates the tables with uppercase names, rename them to lowercase by appending a few more lines:

use tpch;
alter table CUSTOMER rename to customer;
alter table LINEITEM rename to lineitem;
alter table NATION   rename to nation;
alter table ORDERS   rename to orders;
alter table PART     rename to part;
alter table PARTSUPP rename to partsupp;
alter table REGION   rename to region;
alter table SUPPLIER rename to supplier;
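The eight rename statements can also be generated instead of typed; a small sketch (gen_renames is a name of my own):

```shell
# Print the lowercase-rename ALTERs for all eight TPC-H tables.
gen_renames() {
  for t in CUSTOMER LINEITEM NATION ORDERS PART PARTSUPP REGION SUPPLIER; do
    printf 'alter table %s rename to %s;\n' "$t" "$(printf '%s' "$t" | tr 'A-Z' 'a-z')"
  done
}
# pipe the statements straight into the tpch database:
# gen_renames | mysql tpch
```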

3. Load the data

With the test data generated and the schema initialized, the data can now be loaded.

One caveat: if the binlog is enabled, it is best to disable it before loading, otherwise the load may fail with a max_binlog_cache_size error. If the binlog cannot be disabled, split the input files into smaller chunks and load them one at a time.

mysql tpch -e "LOAD DATA INFILE 'path/dbgen/customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/orders.tbl'   INTO TABLE orders   FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/nation.tbl'   INTO TABLE nation   FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/part.tbl'     INTO TABLE part     FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/region.tbl'   INTO TABLE region   FIELDS TERMINATED BY '|';"
mysql tpch -e "LOAD DATA INFILE 'path/dbgen/supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';"
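When the binlog cannot be turned off, the big .tbl files can be chunked with split before loading; a sketch (the chunk size and the .part. prefix are arbitrary choices of mine, and -d numeric suffixes assume GNU coreutils):

```shell
# Split a .tbl file into fixed-size chunks so each LOAD DATA stays
# under max_binlog_cache_size.
split_tbl() {
  tbl=$1; lines=${2:-1000000}
  split -l "$lines" -d "$tbl" "${tbl}.part."
}
# split_tbl path/dbgen/lineitem.tbl 1000000
# for f in path/dbgen/lineitem.tbl.part.*; do
#   mysql tpch -e "LOAD DATA INFILE '$PWD/$f' INTO TABLE lineitem FIELDS TERMINATED BY '|';"
# done
```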

4. Run the tpch tests

Now the tpch tests can be run: execute the 22 query scripts one by one, restarting the MySQL instance before each query so that every run starts with a cold buffer pool.

A simple loop driver script:

#!/bin/sh
##
## Run the tpch OLAP tests
##
## written by yejr (http://imysql.com), 2012/12/14
##
PATH=$PATH:/usr/local/bin
export PATH
. ~/.bash_profile > /dev/null 2>&1
exec 3>&1 4>&2 1>> tpch-benchmark-olap-`date +'%Y%m%d%H%M%S'`.log 2>&1

I=1
II=3
while [ $I -le $II ]
do
    N=1
    T=23
    while [ $N -lt $T ]
    do
        if [ $N -lt 10 ] ; then
            NN='0'$N
        else
            NN=$N
        fi
        echo "query $NN starting"
        /etc/init.d/mysql restart
        time mysql -f tpch < ./queries/tpch_${NN}.sql
        echo "query $NN ended!"
        N=`expr $N + 1`
    done
    I=`expr $I + 1`
done
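To collect per-query timings in one results file instead of scraping `time` output from the log afterwards, a generic wrapper can be dropped into the loop; a sketch (timed_run is my own helper, with whole-second resolution):

```shell
# Run a command and append "label,elapsed_seconds" to a CSV file.
timed_run() {
  label=$1; csv=$2; shift 2
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "$label,$((end - start))" >> "$csv"
}
# inside the loop, replace the bare `time mysql ...` call with:
# timed_run "q$NN" results.csv sh -c "mysql -f tpch < ./queries/tpch_${NN}.sql"
```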

Attachment: archive with the tpch initialization and automated test scripts.

Note: parts of this document draw on material compiled by Gu Lei and Wang Hongquan; thanks to both.

Title: MySQL-tpch 测试工具简要手册

Source: http://www.cppcns.com/shujuku/mysql/151562.html
