  • TPC-H test software

    2014-01-21 11:42:08
    A performance-testing program developed against the TPC-H benchmark, written in pure Java. Supports mainstream databases.
  • TPC-H test procedure

    2021-08-13 11:24:50

    TPC-H Test Procedure (on DM8)

    Step 1: Upload the dbgen tool, unpack it, and generate the 1 GB data set
    [root@dm1 tpch_dbgen]# ./dbgen -s 1
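    A quick sanity check on the generated files; a minimal sketch, assuming dbgen was built in the current directory and writes its eight .tbl files there:

    ./dbgen -s 1 -f        # scale factor 1 (~1 GB); -f overwrites existing files
    ls -lh *.tbl           # expect customer/lineitem/nation/orders/part/partsupp/region/supplier
    wc -l lineitem.tbl     # roughly 6 million rows at SF=1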

    Step 2: Check and edit the dmfldr control files
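    For illustration only, a hypothetical customer.ctrl written via a heredoc; it assumes dmfldr's SQL*Loader-style control syntax, '|'-delimited .tbl files, and the /dm8/tpch paths used below (adjust all three to your environment):

    cat > /dm8/tpch/linux/customer.ctrl <<'EOF'
    LOAD DATA
    INFILE '/dm8/tpch/customer.tbl'
    INTO TABLE CUSTOMER
    FIELDS '|'
    EOF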

    Step 3: Create the tables
    CREATE HUGE TABLE CUSTOMER
    (
    C_CUSTKEY INT NOT NULL,
    C_NAME VARCHAR(25) NOT NULL,
    C_ADDRESS VARCHAR(40) NOT NULL,
    C_NATIONKEY INT NOT NULL,
    C_PHONE CHAR(15) NOT NULL,
    C_ACCTBAL FLOAT NOT NULL,
    C_MKTSEGMENT CHAR(10) NOT NULL,
    C_COMMENT VARCHAR(117) NOT NULL,
    PRIMARY KEY (C_CUSTKEY)
    )
    DISTRIBUTED BY (C_CUSTKEY);

    CREATE HUGE TABLE LINEITEM
    (
    L_ORDERKEY BIGINT NOT NULL,
    L_PARTKEY INT NOT NULL,
    L_SUPPKEY INT NOT NULL,
    L_LINENUMBER INT NOT NULL,
    L_QUANTITY FLOAT NOT NULL,
    L_EXTENDEDPRICE FLOAT NOT NULL,
    L_DISCOUNT FLOAT NOT NULL,
    L_TAX FLOAT NOT NULL,
    L_RETURNFLAG CHAR(1) NOT NULL,
    L_LINESTATUS CHAR(1) NOT NULL,
    L_SHIPDATE DATE NOT NULL,
    L_COMMITDATE DATE NOT NULL,
    L_RECEIPTDATE DATE NOT NULL,
    L_SHIPINSTRUCT CHAR(25) NOT NULL,
    L_SHIPMODE CHAR(10) NOT NULL,
    L_COMMENT VARCHAR(44) NOT NULL,
    PRIMARY KEY(L_ORDERKEY, L_LINENUMBER)
    )
    DISTRIBUTED BY(L_ORDERKEY);

    CREATE HUGE TABLE NATION
    (
    N_NATIONKEY INT NOT NULL,
    N_NAME CHAR(25) NOT NULL,
    N_REGIONKEY INT NOT NULL,
    N_COMMENT VARCHAR(152) NOT NULL,
    PRIMARY KEY (N_NATIONKEY)
    )DISTRIBUTED BY (N_NATIONKEY);

    CREATE HUGE TABLE ORDERS
    (
    O_ORDERKEY BIGINT NOT NULL,
    O_CUSTKEY INT NOT NULL,
    O_ORDERSTATUS CHAR(1) NOT NULL,
    O_TOTALPRICE FLOAT NOT NULL,
    O_ORDERDATE DATE NOT NULL,
    O_ORDERPRIORITY CHAR(15) NOT NULL,
    O_CLERK CHAR(15) NOT NULL,
    O_SHIPPRIORITY INTEGER NOT NULL,
    O_COMMENT VARCHAR(79) NOT NULL,
    PRIMARY KEY(O_ORDERKEY)
    )
    DISTRIBUTED BY (O_ORDERKEY);

    CREATE HUGE TABLE PART
    (
    P_PARTKEY INT NOT NULL,
    P_NAME VARCHAR(55) NOT NULL,
    P_MFGR CHAR(25) NOT NULL,
    P_BRAND CHAR(10) NOT NULL,
    P_TYPE VARCHAR(25) NOT NULL,
    P_SIZE INT NOT NULL,
    P_CONTAINER CHAR(10) NOT NULL,
    P_RETAILPRICE FLOAT NOT NULL,
    P_COMMENT VARCHAR(23) NOT NULL,
    PRIMARY KEY (P_PARTKEY)
    )
    DISTRIBUTED BY (P_PARTKEY);

    CREATE HUGE TABLE PARTSUPP
    (
    PS_PARTKEY INT NOT NULL,
    PS_SUPPKEY INT NOT NULL,
    PS_AVAILQTY INT NOT NULL,
    PS_SUPPLYCOST FLOAT NOT NULL,
    PS_COMMENT VARCHAR(199) NOT NULL,
    PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)
    )
    DISTRIBUTED BY (PS_PARTKEY);

    CREATE HUGE TABLE REGION
    (
    R_REGIONKEY INT NOT NULL,
    R_NAME CHAR(25) NOT NULL,
    R_COMMENT VARCHAR(152) NOT NULL,
    PRIMARY KEY (R_REGIONKEY)
    )
    DISTRIBUTED BY (R_REGIONKEY);

    CREATE HUGE TABLE SUPPLIER
    (
    S_SUPPKEY INT NOT NULL,
    S_NAME CHAR(25) NOT NULL,
    S_ADDRESS VARCHAR(40) NOT NULL,
    S_NATIONKEY INT NOT NULL,
    S_PHONE CHAR(15) NOT NULL,
    S_ACCTBAL FLOAT NOT NULL,
    S_COMMENT VARCHAR(101) NOT NULL,
    PRIMARY KEY (S_SUPPKEY)
    )
    DISTRIBUTED BY (S_SUPPKEY);

    Step 4: Load the data

    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/customer.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/lineitem.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/nation.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/orders.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/part.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/partsupp.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/region.ctrl'
    [root@dm1 bin]# ./dmfldr userid=SYSDBA/SYSDBA:5236 control='/dm8/tpch/linux/supplier.ctrl'
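    The eight invocations differ only in the control file, so they can be collapsed into a loop; a sketch, assuming it is run from the same DM bin directory as the commands above:

    for t in customer lineitem nation orders part partsupp region supplier; do
        ./dmfldr userid=SYSDBA/SYSDBA:5236 control="/dm8/tpch/linux/${t}.ctrl"
    done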

    Step 5: Refresh optimizer statistics
    -- drop column-level statistics on every table
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'CUSTOMER');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'LINEITEM');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'NATION');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'ORDERS');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'PART');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'PARTSUPP');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'REGION');
    CALL SP_TAB_COL_STAT_DEINIT('SYSDBA', 'SUPPLIER');
    -- drop table-level statistics
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'CUSTOMER');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'LINEITEM');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'NATION');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'ORDERS');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'PART');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'PARTSUPP');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'REGION');
    CALL SP_TAB_STAT_DEINIT('SYSDBA', 'SUPPLIER');
    -- re-initialize table-level statistics
    SP_TAB_STAT_INIT('SYSDBA','REGION');
    SP_TAB_STAT_INIT('SYSDBA','NATION');
    SP_TAB_STAT_INIT('SYSDBA','PART');
    SP_TAB_STAT_INIT('SYSDBA','PARTSUPP');
    SP_TAB_STAT_INIT('SYSDBA','SUPPLIER');
    SP_TAB_STAT_INIT('SYSDBA','CUSTOMER');
    SP_TAB_STAT_INIT('SYSDBA','ORDERS');
    SP_TAB_STAT_INIT('SYSDBA','LINEITEM');
    -- gather column statistics (100 percent sample)
    STAT 100 ON REGION(R_NAME);
    STAT 100 ON REGION(R_REGIONKEY);
    STAT 100 ON NATION(N_NAME);
    STAT 100 ON NATION(N_NATIONKEY) ;
    STAT 100 ON NATION(N_REGIONKEY) ;
    STAT 100 ON SUPPLIER(S_SUPPKEY) ;
    STAT 100 ON SUPPLIER(S_NATIONKEY);
    STAT 100 ON SUPPLIER(S_COMMENT);

    STAT 100 ON PART(P_SIZE);
    STAT 100 ON PART(P_BRAND);
    STAT 100 ON PART(P_TYPE);
    STAT 100 ON PART(P_NAME);
    STAT 100 ON PART(P_PARTKEY);
    STAT 100 ON PART(P_CONTAINER);

    STAT 100 ON PARTSUPP(PS_SUPPKEY);
    STAT 100 ON PARTSUPP(PS_PARTKEY);

    STAT 100 ON ORDERS(O_ORDERKEY);
    STAT 100 ON ORDERS(O_ORDERDATE);
    STAT 100 ON ORDERS(O_ORDERSTATUS);
    STAT 100 ON ORDERS(O_ORDERPRIORITY);
    STAT 100 ON ORDERS(O_CUSTKEY);
    STAT 100 ON ORDERS(O_COMMENT);

    STAT 100 ON LINEITEM(L_SUPPKEY);
    STAT 100 ON LINEITEM(L_PARTKEY);
    STAT 100 ON LINEITEM(L_ORDERKEY);
    STAT 100 ON LINEITEM(L_SHIPDATE);
    STAT 100 ON LINEITEM(L_SHIPMODE);
    STAT 100 ON LINEITEM(L_COMMITDATE);
    STAT 100 ON LINEITEM(L_RECEIPTDATE);
    STAT 100 ON LINEITEM(L_RETURNFLAG);
    STAT 100 ON LINEITEM(L_LINESTATUS);
    STAT 100 ON LINEITEM(L_QUANTITY);
    STAT 100 ON LINEITEM(L_SHIPINSTRUCT);

    STAT 100 ON CUSTOMER(C_CUSTKEY);
    STAT 100 ON CUSTOMER(C_MKTSEGMENT);
    STAT 100 ON CUSTOMER(C_NATIONKEY);
    STAT 100 ON CUSTOMER(C_ACCTBAL);
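    The statements above can be collected into one script and run in a single session; a hedged one-liner, assuming DM's disql client reads commands from stdin and that the script was saved as the hypothetical /dm8/tpch/stats.sql:

    echo "start /dm8/tpch/stats.sql" | ./disql SYSDBA/SYSDBA:5236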

    Step 6: Measure the execution time of the 22 TPC-H queries

    The 22 queries (a shell timing sketch follows after Q22):
    --Q1 (LINEITEM)
    select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
    from lineitem
    where l_shipdate <= date'1998-12-01' - interval '90' day
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus;
    --Q2
    select TOP 100
    s_acctbal, s_name,
    n_name, p_partkey, p_mfgr,
    s_address, s_phone, s_comment
    from part, supplier, partsupp, nation, region
    where p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 15
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost =
    ( select min(ps_supplycost)
    from partsupp, supplier, nation, region
    where p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE' )
    order by s_acctbal desc, n_name, s_name, p_partkey;
    --Q3
    select TOP 10
    l_orderkey,
    sum(l_extendedprice*(1-l_discount)) as revenue,
    o_orderdate,
    o_shippriority
    from customer, orders, lineitem
    where c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date'1995-03-15'
    and l_shipdate > date'1995-03-15'
    group by l_orderkey, o_orderdate, o_shippriority
    order by revenue desc, o_orderdate;
    --Q4
    select
    o_orderpriority,
    count(*) as order_count
    from orders
    where o_orderdate >= date'1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists
    (select *
    from lineitem
    where l_orderkey = o_orderkey
    and l_commitdate < l_receiptdate )
    group by o_orderpriority
    order by o_orderpriority;
    --Q5
    select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
    from customer, orders, lineitem, supplier, nation, region
    where c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date'1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
    group by n_name
    order by revenue desc;
    --Q6
    select
    sum(l_extendedprice * l_discount) as revenue
    from lineitem
    where l_shipdate >= '1994-01-01'
    and l_shipdate < dateadd(year, 1, '1994-01-01')
    and l_discount between 0.06 - 0.01
    and 0.06 + 0.01
    and l_quantity < 24;
    --Q7
    select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
    from ( select n1.n_name as supp_nation,
    n2.n_name as cust_nation,
    extract(year from l_shipdate) as l_year,
    l_extendedprice * (1 - l_discount) as volume
    from supplier, lineitem, orders, customer, nation n1, nation n2
    where s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') )
    and l_shipdate between date'1995-01-01' and date'1996-12-31' ) as shipping
    group by supp_nation, cust_nation, l_year
    order by supp_nation, cust_nation, l_year;
    --Q8
    select
    o_year,
    sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share
    from ( select extract(year from o_orderdate) as o_year,
    l_extendedprice * (1-l_discount) as volume,
    n2.n_name as nation
    from part, supplier, lineitem, orders, customer, nation n1, nation n2, region
    where p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01'
    and date '1996-12-31'
    and p_type = 'ECONOMY ANODIZED STEEL' ) as all_nations
    group by o_year
    order by o_year;
    --Q9
    select
    nation,
    o_year,
    sum(amount) as sum_profit
    from ( select n_name as nation,
    extract(year from o_orderdate) as o_year,
    l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from part, supplier, lineitem, partsupp, orders, nation
    where s_suppkey = l_suppkey
    and ps_suppkey = l_suppkey
    and ps_partkey = l_partkey
    and p_partkey = l_partkey
    and o_orderkey = l_orderkey
    and s_nationkey = n_nationkey
    and p_name like '%green%' ) as profit
    group by nation, o_year
    order by nation, o_year desc;
    --Q10
    select top 20
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address, c_phone,
    c_comment
    from customer, orders, lineitem, nation
    where c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date'1993-10-01'
    and o_orderdate < date'1993-10-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
    group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
    order by revenue desc;
    --Q11
    select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as valuess
    from partsupp, supplier, nation
    where ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
    group by ps_partkey
    having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001
    from partsupp, supplier, nation
    where ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY' )
    order by valuess desc;
    --Q12
    select
    l_shipmode,
    sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count,
    sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
    from orders, lineitem
    where o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
    group by l_shipmode
    order by l_shipmode;
    --Q13
    select
    c_count,
    count(*) as custdist
    from ( select c_custkey,
    count(o_orderkey)
    from customer left outer join orders on c_custkey = o_custkey
    and o_comment not like '%special%requests%'
    group by c_custkey ) as c_orders (c_custkey, c_count)
    group by c_count
    order by custdist desc, c_count desc;
    --Q14
    select
    100.00 * sum(case when p_type like 'PROMO%'
    then l_extendedprice * (1-l_discount)
    else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
    from lineitem, part
    where l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date'1995-09-01' + interval '1' month;
    --Q15
    create view revenue (supplier_no, total_revenue) as
    select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem
    where l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '3' month group by l_suppkey;
    select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
    from supplier, revenue
    where s_suppkey = supplier_no
    and total_revenue =
    ( select max(total_revenue) from revenue ) order by s_suppkey;
    drop view revenue;
    --Q16
    select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
    from partsupp, part
    where p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in ( select s_suppkey
    from supplier
    where s_comment like '%Customer%Complaints%' )
    group by p_brand, p_type, p_size
    order by supplier_cnt desc, p_brand, p_type, p_size;
    --Q17
    select
    sum(l_extendedprice) / 7.0 as avg_yearly
    from lineitem, part
    where p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX'
    and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
    --Q18
    select top 100
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
    from customer, orders, lineitem
    where o_orderkey in ( select l_orderkey
    from lineitem
    group by l_orderkey
    having sum(l_quantity) > 300 )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
    group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    order by o_totalprice desc, o_orderdate;
    --Q19
    select
    sum(l_extendedprice * (1 - l_discount) ) as revenue
    from lineitem, part
    where ( p_partkey = l_partkey
    and p_brand = 'Brand#12'
    and p_container in ( 'SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    and l_quantity >= 1
    and l_quantity <= 1 + 10
    and p_size between 1 and 5
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON' )
    or ( p_partkey = l_partkey and p_brand = 'Brand#23'
    and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    and l_quantity >= 10
    and l_quantity <= 10 + 10
    and p_size between 1 and 10
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON' )
    or ( p_partkey = l_partkey
    and p_brand = 'Brand#34'
    and p_container in ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    and l_quantity >= 20
    and l_quantity <= 20 + 10
    and p_size between 1 and 15
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON' );
    --Q20
    select
    s_name,
    s_address
    from supplier, nation
    where s_suppkey in ( select ps_suppkey from partsupp
    where ps_partkey in ( select p_partkey from part where p_name like 'forest%' )
    and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey
    and l_suppkey = ps_suppkey
    and l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
    order by s_name;
    --Q21
    select top 100
    s_name,
    count(*) as numwait
    from supplier, lineitem l1, orders, nation
    where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists ( select * from lineitem l2
    where l2.l_orderkey = l1.l_orderkey
    and l2.l_suppkey <> l1.l_suppkey )
    and not exists ( select * from lineitem l3
    where l3.l_orderkey = l1.l_orderkey
    and l3.l_suppkey <> l1.l_suppkey
    and l3.l_receiptdate > l3.l_commitdate )
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
    group by s_name
    order by numwait desc, s_name;
    --Q22
    select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
    from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal
    from customer
    where substring(c_phone from 1 for 2) in ('13','31','23','29','30','18','17')
    and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00
    and substring (c_phone from 1 for 2) in ('13','31','23','29','30','18','17')
    )
    and not exists ( select * from orders where o_custkey = c_custkey )
    ) as custsale
    group by cntrycode
    order by cntrycode;
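    A rough timing loop for the 22 queries; it assumes the statements were saved as the hypothetical files /dm8/tpch/queries/q1.sql ... q22.sql and that disql reads commands from stdin, as in the statistics step:

    for i in $(seq 1 22); do
        t0=$(date +%s.%N)
        echo "start /dm8/tpch/queries/q${i}.sql" | ./disql SYSDBA/SYSDBA:5236 > q${i}.log
        t1=$(date +%s.%N)
        echo "Q${i}: $(echo "$t1 - $t0" | bc) s"
    done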

    Dameng (DM) Cloud Adaptation Technology Community
    https://eco.dameng.com/

  • PostgreSQL/MySQL performance testing with TPC-H: environment setup. A very solid document; a production deployment certainly calls for thorough testing beforehand.
  • tpch is a toolkit provided by the TPC (Transaction Processing Performance Council). It is used for OLAP testing, to evaluate the performance of decision-support systems (DSS) in business analytics. It contains a full suite of business-oriented ad-hoc queries and concurrent data modifications, and emphasizes testing the database...
  • Oracle performance testing with tpch

    2018-01-01 13:41:29
    Uses tpch for Oracle load tests and Power tests, and presents the 22 queries generated by QGEN together with detailed explanations.
  • TPC-H test script

    2017-05-11 12:47:35
    TPC-H test script:
    #!/usr/bin/env python
    # encoding: utf-8
    
    """
    tpch audit run
    
    Options:
      -d             debug mode
      -h, --help     show this help
      -q, --query_program=  program for query streams, default to use mysql
      -r, --root=    root directory
      -s, --scale=   scale factor (default 1, 1GB)
      -p, --power    power test only
      -t, --throughput  throughput test only
      -u, --update   do update or rf1, rf2
      --test_only     only test
      --db=          database name
    
    """
    
    __author__ = "David Zhang"
    __modifier__ = "tianbaochao"
    __version__ = "$Revision: 1.1 $"
    __date__ = "$Date: 2007/08/08 $"
    __modify_date__ = "$Date: 2016/08/24 $"
    
    import sys
    import getopt
    import re
    import random
    import copy
    from threading import Thread
    import difflib
    import os
    from sets import Set
    import pdb
    import time
    import signal
    import shutil
    import inspect
    import MySQLdb
    import math
    
    # global variables
    # database name (required)
    tpch_db_name = "tpch"
    
    _debug = 1
    # MySQL socket path (required; the original comment mislabeled this as the database name)
    mysock="/var/lib/mysql/mysql.sock"
    mysql_test_dir="/usr/local/mysql/mysql-test"
    mysql_cmd = " cat %s | mysql --sock=" + mysock + " --database=%s --force  -u root -p123456 -h127.0.0.1 -v -v -v" % tpch_db_name
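    # note: '%' binds tighter than '+', so tpch_db_name fills the --database
    # placeholder here, while the leading 'cat %s' slot is filled later,
    # once per query file (mysql_cmd % in_file)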
    
    
    # global variables
    script_log_file=""
    run_id=7
    # data scale in GB (required)
    scale_factor=1
    # tpch root directory
    audit_dir="/data/test/tpch_2_17_0"
    dbgen_dir="/data/test/tpch_2_17_0/dbgen"
    qgen="/data/test/tpch_2_17_0/dbgen/qgen"
    # important!!!! results output directory; be sure to use a fresh one for every test run
    output_dir="/data/test/tpch_2_17_0/output"
    rf_tmp_file_name_template = "/data/test/tpch_2_17_0/dbgen/rf%s_%s.sql" 
    
    rand_seed=8768640832410
    num_streams_scale ={1:2,10:3,30:4,100:5,300:6,1000:7,3000:8,10000:9}
    
    do_power=False
    do_throughput=False
    do_update=False
    do_loader=False
    query_file_template="%s/query_%s_s%s.sql"
    use_mysql_client = False
    use_mysql_client = True
    tpch_conn = None
    test_only=False
    rf_throughput_concurrent=False    # if true start RF and throughput concurrently, or start throughput first then RF, concurrent may be better as query will not use CPU a lot
                                      # this will cause deadlock
    num_seqs=1                        # how many iterations
    num_rf_streames=1                 
    update_only = False              # only do update
    no_cleanup = False               # do not cleanup 
    leave_server_up = False          # leave the server up after the run
    gen_udf_call = False              # true if to use UDF instead of profiling
    
    qtime=[]   # query time (seq, stream_id, qnum, start,end, time)
    rftime=[]  # refresh time (seq, stream_id, rf_id, start, end, time)
    streams_time=[]  # streams time  (seq, stream_id, start, end, duration)
    tpch_power_numbers=[]
    tpch_throughput_numbers=[]
    all_queries={}
    
    # for reporting
    harp_run_time_log=""
    master_err_file_hanle=None
    master_err_file=None
    
    #====Utilties============
    
    class TimedOutExc(Exception):
        def __init__(self, value = "Timed Out"):
            self.value = value
        def __str__(self):
            return repr(self.value)
    
    
    
    def sys_call(cmd):
    #    dbg_print(os.getcwd())
    #    print (cmd)
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        os.system(cmd)
    
    def sys_pipe_call(cmd):
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        return os.popen(cmd).read()               
    #    return os.popen("echo " + cmd).read() 
    #    return cmd
    
    def sys_pipe_call_1(cmd):
        ''' also return the errors '''
    #    time.sleep(0.1)
    #    return cmd
        dbg_print("%s:%s" % (os.getcwd(),cmd))
        return os.popen4(cmd)[1].read()
        #return cmd
    
    def sys_call_env(cmd):
        cmds=cmd.split()
        dbg_print("cmds= %s " % cmds)
        os.spawnv( os.P_WAIT, cmds[0], cmds[1:])
    
    def usage():
        caller_name = sys._getframe(1).f_code.co_name
        #print "caller_name = %s " % caller_name
        if (caller_name=='main'):
            print __doc__
        else:
            print globals()[caller_name].__doc__
        sys.exit(0)                  
    
    def dbg_print(in_str):
        if _debug:
            print ("== " + sys._getframe(1).f_code.co_name + " == " + str(in_str))
    
    def whoami():
        return sys._getframe(1).f_code.co_name
    
    # tpch_
    def tpch_error(s):
            print ("TPCH Error:" + s)
            sys.exit(1)
    
    def get_time():
        return time.time()
    
    def write_log(s,log_file_name=None,timestamp=True,pstdout=False):
        if not log_file_name: pstdout = True  # print the master
        indent_start=4
        if not log_file_name: log_file_name = script_log_file
        #dbg_print("log_file_name=%s" % log_file_name)
        #dbg_print("script_log_file=%s" % script_log_file)
        #indent = "".join(["  " for i in range(len(inspect.stack()))])
        indent = "".rjust((len(inspect.stack())-indent_start)*3)
        ts = timestamp and "%s :" % time.strftime('%x %X') or ""
        line = ts + indent + s + "\n"
        open(log_file_name,"a+").write(line)
        if pstdout: sys.stdout.write(line)
        if not log_file_name: dbg_print(line)
    
    
    
    import distutils.dir_util
    
    
    def setup_tpch():
        global audit_dir, dbgen_dir, qgen, output_dir, script_log_file, harp_run_time_log
    
        if not audit_dir: raise AssertionError, "AUDIT_DIR is not defined and not in a view\n"
    
        if not dbgen_dir: dbgen_dir="%s/dbgen" % audit_dir
    
        if not output_dir: output_dir = "%s/results/r%s" % (audit_dir, run_id)
    
        try:
            shutil.rmtree(output_dir)
        except: pass
        os.mkdir(output_dir)
    
        qgen = "%s/qgen" % dbgen_dir
        # set this so qgen can find the query template
        os.environ['DSS_QUERY'] = '%s/queries' % (dbgen_dir)
        os.environ['DSS_PATH'] = output_dir    # this is where the update file is stored
        os.environ['DSS_CONFIG'] = dbgen_dir
    
        script_log_file = "%s/tpch.log" % (output_dir)
        harp_run_time_log = "%s/harp_run_time.log" % (output_dir)
    
        # setup_mysql()   # setup mysql related
    
    
    def compute_result(power_run,seq,num_streams,duration,result):
        if power_run:
            run_time = [x[-1] for x in qtime if x[0] == seq and x[1] == 0 ]
            run_time.extend([x[-1] for x in rftime if x[0] == seq and x[1] == 0 ])
        else:
            run_time = [x[-1] for x in qtime if x[0] == seq and x[1] != 0 ]
            run_time.extend([x[-1] for x in rftime if x[0] == seq and x[1] != 0 ])
    
        num_queries_and_rfs = len(run_time)
        write_log("%s Seq %s num queries and rfs = %s" % (power_run and "Power" or "Throughput", seq,num_queries_and_rfs), pstdout=True)
        geo_mean = math.exp(sum([math.log(t) for t in run_time])*1.0/float(num_queries_and_rfs))
        arith_mean = sum([t for t in run_time])/float(num_queries_and_rfs)
        if power_run: result_number = 3600.0* geo_mean *scale_factor
        else: result_number = (num_streams*22.0*3600)/duration*scale_factor
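        # note: the TPC-H spec defines Power@Size = 3600*SF / geometric_mean of the
        # 24 power-stream timings; the line above multiplies by the geometric mean
        # instead, so treat its output as a relative number only. The throughput
        # formula matches the spec's (S*22*3600/Ts)*SF.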
        result.append((result_number, geo_mean, arith_mean))
    
    def print_results():
        ''' print all the results '''
        qtime.sort(lambda x,y: int(float(x[0]*10000+x[1]*100+x[2])-float(y[0]*10000+y[1]*100+y[2])))
    
        write_log("TPCH Detailed Results")
        for i in range(num_seqs):   # seq number
            tmp_qtime = [list(x) for x in qtime if x[0]==i+1]
            for q in tmp_qtime:
                write_log("Seq %s, Streams %s, Query %s: start_time= %s, end_time = %s, duration = %s " % (tuple(q)))
            tmp_qtime=([list(x) for x in rftime if x[0] ==i+1])
            for q in tmp_qtime:
                write_log("Seq %s, Streams %s, RF %s: start_time = %s, end_time=%s, duration = %s " % (tuple(q)))
    
        write_log("TPCH Streams Results")
        streams_time.sort(lambda x,y: int(float(x[0]*100+x[1])-float(y[0]*100+y[1])))
        for x in streams_time:
    #        write_log("Seq %s, Streams %s: start_time = %s , end_time = %s, duration = %s " % x)
            (seq_id, stream_id, start_time, end_time, duration) = x
            stream_rand_seed = rand_seed and long(rand_seed) + stream_id or 'default'
            if stream_id<100: tmp_x = [seq_id, "Query Streams %s" % stream_id, "rand_seed = %s" % stream_rand_seed]  
            else: tmp_x = [seq_id, 'RF throughput streams','']   # RF is 1001 for sorting purpose
            tmp_x.append(conv_to_date(start_time))
            tmp_x.append(conv_to_date(end_time))
            tmp_x.append(duration)
            write_log("Seq %s, %s: %s, start_time = %s , end_time = %s, duration = %s " % tuple(tmp_x))
    
        # aggregated results
        if do_power:
            write_log("TPCH Power Tests Results", pstdout=True)
            for i in range(num_seqs):
                write_log("  Power  Seq %s: (Power Metric, GeoMean, ArithMean) = %s  " % (i+1, tpch_power_numbers[i]), pstdout=True)
        if do_throughput:
            write_log("TPCH Throughput Tests Results",pstdout=True)
            for i in range(num_seqs):
                write_log("  ThroughPut Seq %s: (Throughput Metric, GeoMean, ArithMean) = %s " % (i+1, tpch_throughput_numbers[i]), pstdout=True)
        if do_power and do_throughput:
            write_log("TPCH QPH Results",pstdout=True)
            for i in range(num_seqs):
                qph = math.exp((math.log(tpch_power_numbers[i][0]) + math.log(tpch_throughput_numbers[i][0]))*0.5)
                write_log("  TPCH QPH for Seq %s  = %s " % (i+1, qph), pstdout=True)
    
        if server_is_up():
            if leave_server_up: write_log("  Leave MySQL server up", pstdout=True)
            else: shutdown_server()
        else:
            write_log("  MySQL Server is down during the Run", pstdout=True)
            sys.exit(9);
    
    def run_tpch_all():
        #load_tpch()   # load the data
        write_log("Start TPC-H Benchmark for Run id: %s " % run_id)
        # if do_update:
        #     gen_rfs()
        for i in range(num_seqs):
            if do_power: run_tpch_power_test(i+1)
            if do_throughput: run_tpch_throughput_test(i+1)
            #start_up_server_if_needed()   # restart server
        print_results()
        write_log("End TPC-H Benchmark for Run id: %s " % run_id)
    #    pdb.set_trace()
    
    def parse_query_file(query_file):
        sys_call("sed -i 's/(3)//' %s" % (query_file))
        # sys_call("sed -i 's/limit -1/limit 1/' %s" % (query_file))
        sys_call("sed -i 's/count(o_orderkey)/count(o_orderkey) as c_count/' %s" % (query_file))
        sys_call("sed -i 's/) as c_orders (c_custkey, c_count)/) as c_orders/'  %s" % (query_file))
        sys_call("sed -i 's/drop view revenue0 limit 1;/) as c_orders/' %s" % (query_file))
    
    
        lines = [l.rstrip() for l in open(query_file,"r").readlines()]
        query_begin_re = re.compile("TPC-H/TPC-R .* Query \(Q([0-9]+)\)")
        query_end_re = re.compile("limit ([\-0-9]+);")
        # query_end_re = re.compile("^set rowcount ([\-0-9]+)")
        dbg_print("query_file = %s " % query_file)
        first_l=lines[0]
        query_begin = False
        queries=[]
        for l in lines:
            dbg_print("l= %s" % l)
            m = query_begin_re.search(l)
            if m:
                dbg_print("begin l= %s " % l)
                query_begin = True
                cur_qry=[]
                qry_name=m.group(1)
                cur_qry.append(first_l)
            if query_begin:
                m1 = query_end_re.search(l)
                if m1:
                    query_begin = False
                    row_count = int(m1.group(1))
                    if row_count > 0:
                        last_line = cur_qry[-1]
                        cur_qry[-1] =  last_line.split(";")[0] + " limit %s;" % row_count
                    queries.append((int(qry_name),"\n".join(cur_qry)))
                else:  cur_qry.append(l)
    
        if len(queries) != 22:
            print "queries = %s " % queries
            raise AssertionError, "There should be 22 queries,%s" % len(queries)
        return queries
    
    def conv_to_date(t):
        return time.strftime("%X %x",time.localtime(t))
    
    def open_connection():
        if (use_mysql_client): return None
        global tpch_conn
        tpch_conn = MySQLdb.connect (host="localhost", user="root", passwd="",db=tpch_db_name,unix_socket=mysock)
        if not tpch_conn:  raise AssertionError, "Connection failure"
        else: return tpch_conn
    
    def close_connection():
        if (use_mysql_client): return
        tpch_conn.close()
    
    def execute_one_query(seq, stream_id, qnum, q):
        # to test rf only, do not run any queries, since they do not pass
    
        if (test_only):
            time.sleep(0.1)
            return "Fake results\n"
        out_file = "%s/query_%s_s%s_q%s.log" % (output_dir,seq,stream_id,qnum)
        if (use_mysql_client):
            in_file = "%s/query_%s_s%s_q%s.sql" % (output_dir,seq,stream_id,qnum)
            open(in_file,"w").write(q)
            output = sys_pipe_call_1(mysql_cmd % in_file)
            open(out_file,"w").write(output)
        else:
            cursor = tpch_conn.cursor()
            cursor.execute(q)
            result = cursor.fetchall()
            cursor.close()
            output = "\n".join(["|".join([str(f) for f in r]) for r in result])
            open(out_file,"w").write(output)
        return output
    
    
    def run_query(seq, stream_id):
        ''' this is a wrapper. For now use mysql client. Later on can be a C client for query execution. Need to time each queries
            For now, satisfy the requirement to run all queries in a single session. Will try to use python connection
            may need to disable logging for throughput run
        '''
        local_log = "%s/query_%s_s%s.log" % (output_dir, seq, stream_id)
        dbg_print("local_log = %s" % local_log)
        line_header="Query Stream"
        queries=all_queries["%s_%s" % (seq,stream_id)]
        global qtime
        open_connection()
        for (qnum, q) in queries:
            write_log("%s %s seq %s qnum %s begin" % (line_header, stream_id, seq, qnum))
            write_log("%s %s seq %s qnum %s begin" % (line_header, stream_id, seq, qnum), local_log)
            start_time=get_time()
            output = execute_one_query(seq, stream_id, qnum, q)
            dbg_print("output = %s" % output)
            end_time = get_time()
            duration = end_time - start_time
            qtime.append((seq, stream_id, qnum, start_time, end_time, duration))
            write_log("%s seq %s stream %s qnum %s start time %s" % (line_header, seq, stream_id, qnum, start_time))
            write_log("%s seq %s stream %s qnum %s start time %s" % (line_header, seq, stream_id, qnum, start_time), local_log)
            open(local_log,"a+").write(output)
            write_log("%s seq %s stream %s qnum %s end time %s" % (line_header, seq, stream_id, qnum, end_time))
            write_log("%s seq %s stream %s qnum %s end time %s" % (line_header, seq, stream_id, qnum, end_time), local_log)
            write_log("%s seq %s stream %s qnum %s duration %s" % (line_header, seq, stream_id, qnum, duration))
            write_log("%s seq %s stream %s qnum %s duration %s" % (line_header, seq, stream_id, qnum, duration), local_log)
        close_connection()
    
    
    def run_rf(rf_id, seq, stream_id):
        ''' oracle load the update in a temp table and do insert as select. For now, just get the file and run it.
            insert into orders, lineitems (o_orderkey) '''
        global rftime
        if test_only:
            rftime.append((seq, stream_id, rf_id, 1, 2, 3))
        if not do_update:
            #rftime.append((seq, stream_id, rf_id, 101, 10))  # fake it
            return
        dbg_print("begin run_rf %s " % rf_id)
        tmp_file = rf_tmp_file_name_template % (rf_id,stream_id)
        local_log = "%s/rf%s_%s_s%s.log" % (output_dir, rf_id, seq, stream_id)
        line_header="RF %s" % rf_id
        write_log("%s seq %s stream %s begin" % (line_header, seq, stream_id))
        write_log("%s seq %s stream %s begin" % (line_header, seq, stream_id), local_log)
        start_time=get_time()
    
        output = sys_pipe_call_1(mysql_cmd % tmp_file)
    
        # if not test_only: output = sys_pipe_call_1(mysql_cmd % tmp_file)
        # else: output = "test only"
        #os.remove(tmp_file)
        end_time = get_time()
        duration = end_time - start_time
        write_log("%s seq %s stream %s start time %s" % (line_header, seq, stream_id, start_time), local_log)
        write_log("%s seq %s stream %s start time %s" % (line_header, seq, stream_id, start_time))
        # no need to print out result for rf, 
        open(local_log,"a+").write(output)
        write_log("%s seq %s stream %s end time %s" % (line_header, seq, stream_id, end_time))
        write_log("%s seq %s stream %s end time %s" % (line_header, seq, stream_id, end_time), local_log)
        write_log("%s seq %s stream %s duration %s" % (line_header, seq, stream_id, duration))
        write_log("%s seq %s stream %s duration %s" % (line_header, seq, stream_id, duration), local_log)
        rftime.append((seq, stream_id, rf_id, start_time, end_time, duration))
    
    def run_qgen(stream_id, seq):
        query_parameter_file="%s/qp_%s_s%s.log" % (output_dir, seq, stream_id)
        query_file= query_file_template % (output_dir, seq, stream_id)
        dbg_print("run_qgen begin")
        try:
            saved_cur_dir = os.getcwd()
        except OSError:
            saved_cur_dir=audit_dir
        os.chdir(dbgen_dir)
        if rand_seed:
            stream_rand_seed = long(rand_seed) + stream_id
            write_log("Generates query template file for streams %s with seed: %s " % (stream_id, stream_rand_seed))
            sys_call("%s -c -r %s -p %s -s %s -l %s > %s" % (qgen, stream_rand_seed, stream_id, scale_factor, query_parameter_file, query_file))
        else:
            write_log("Generates query template file for streams %s with seed: %s " % (stream_id, 'default'))
            sys_call("%s -c -d -p %s -s %s -l %s > %s" % (qgen, stream_id, scale_factor, query_parameter_file, query_file))
        line_header="Query Stream"
        queries = parse_query_file(query_file)
        global all_queries
        all_queries["%s_%s" % (seq,stream_id)] = queries
        # result files are in dbgen_dir/orders.tbl.u1 lineitem.tbl.u1. delete.1
        os.chdir(saved_cur_dir)
        dbg_print("run_qgen end")
    
    def run_tpch_power_test(seq=1):
        stream_id = 0   # power_test is stream 0
        global streams_time
        write_log("Start TPC-H Power test Run id: %s Seq: %s" % (run_id, seq))
        run_qgen(stream_id, seq) 
        stream_start_time =get_time()
        run_rf(1,seq, stream_id)
        run_query(seq, stream_id)
        run_rf(2, seq, stream_id)
        stream_end_time = get_time()
        stream_duration = stream_end_time - stream_start_time
        streams_time.append((seq, stream_id, stream_start_time, stream_end_time, stream_duration))
        write_log("End TPC-H Power test Run id: %s Seq: %s" % (run_id, seq))
        #dbg_print("qtime = %s " % qtime)
        global tpch_power_numbers
        compute_result(True, seq, 1, 0, tpch_power_numbers)
    
    class throughput_stream(Thread):
        (rp, wp) = os.pipe()
        def __init__ (self, seq, stream_id, stream_type='query'):
            Thread.__init__(self)
            self.seq = seq
            self.stream_id = stream_id
            self.stream_type=stream_type
        def run(self):
            global streams_time
            if self.stream_type == 'query':
                stream_start_time =get_time()
                run_query(self.seq, self.stream_id)
                stream_end_time = get_time()
                stream_duration = stream_end_time - stream_start_time
                streams_time.append((self.seq, self.stream_id, stream_start_time, stream_end_time, stream_duration))
                os.write(self.wp,"1")
            else:
                for i in range(num_streams_scale[scale_factor]):
                    os.read(self.rp,1)
                write_log("RF unblocking")
                stream_start_time =get_time()
                for i in range(num_streams_scale[scale_factor]):
                    run_rf(1,self.seq, i+1)
                    run_rf(2, self.seq, i+1)
                stream_end_time = get_time()
                stream_duration = stream_end_time - stream_start_time
                streams_time.append((self.seq, self.stream_id, stream_start_time, stream_end_time, stream_duration))
    
    def run_tpch_throughput_test(seq=1):
        ''' number of query streams run concurrently with one refresh stream. sequential run rf1, rf2 for all the streams
            Oracle waited all the query streams to finish before start rf.
            But MS start rf same time as query streams'''
        write_log("Start TPC-H Through Put test Run id: %s Seq: %s" % (run_id, seq))
        num_streams = num_streams_scale[scale_factor]
        t_streams=[]
        for s in range(num_streams):
            run_qgen(s+1, seq)
            t_streams.append(throughput_stream(seq, s+1, 'query'))
        #for s in range(num_streams):
        t_streams.append(throughput_stream(seq, 1001, 'rf'))
        #c = raw_input("continue? (Y/N)").upper()
        throughput_start_time=get_time()
        if (rf_throughput_concurrent):
            for t in t_streams: t.start()    
            for t in t_streams: t.join()
        else:
            global streams_time
            stream_start_time =get_time()
            for t in t_streams: t.start()
            for t in t_streams: t.join()
    
            #for t in t_streams[0:num_streams]: t.start()   # start the query first
            #for t in t_streams[0:num_streams]: t.join()
            #for s in range(num_streams):
                #t_streams[num_streams + s].start()   # start the rf last
                #t_streams[num_streams + s].join()
            #stream_end_time = get_time()
            #stream_duration = stream_end_time - stream_start_time
            #streams_time.append((seq, 1001, stream_start_time, stream_end_time, stream_duration))
        throughput_end_time=get_time()
        throughput_duration = throughput_end_time - throughput_start_time
        global tpch_throughput_numbers
        compute_result(False, seq, num_streams, throughput_duration,tpch_throughput_numbers)
        write_log("End TPC-H Through Put test Run id: %s, Seq: %s, Scale: %s, Number of Streams: %s" % (run_id, seq, scale_factor, num_streams))
        write_log("Through Put duration  = %s" % (throughput_duration)) 
    
    #---------------------------
    def server_is_up():
        ret = sys_pipe_call_1("echo exit | mysql --sock=%s" % mysock)
        #print "ret = %s " % ret
    #    return True
        return not re.compile("Can't connect to local MySQL server through socket").search(ret)
    
    
    def start_up_server_if_needed():
        # test if the server is up
        #if test_only: return  # // dzhang_test 
        if not server_is_up():
            write_log ("Cannot connect to MySQL server. Trying to start mysqld!", pstdout=True)
            global master_err_file_handle, master_err_file
            #master_err_file = "%s/var/log/master.err" % mysql_test_dir
            master_err_file = "%s/master.err" % output_dir
            #c2_config_file = "%s/c2_config.cnf" % audit_dir   # enable this in real audit
            c2_config_file = "%s/c2_config.cnf" % mysql_test_dir
            # work around bug 1687, remove log and control files
            #c2_log_dir="/log/databases/tpch"   # this will be defined in c2_config_file
            #sys_call("rm %s/*" % c2_log_dir)
            sys_call ("%s/../sql/mysqld  --core-file --basedir=%s/.. --datadir=%s/var/master-data --language=%s/../sql/share/english --c2db_event_log_level=ERROR --socket=%s --c2db-config-file=%s>& %s &"
                  % (mysql_test_dir,mysql_test_dir,mysql_test_dir, mysql_test_dir, mysock, c2_config_file, master_err_file ))
            max_wait_sec = 10
            for i in range(max_wait_sec+1):
                if server_is_up():
                    write_log("Mysqld started successfully on sock %s" % mysock,pstdout = True)
                    break
                time.sleep(2)
            if (i == max_wait_sec): raise AssertionError, "Time out waiting for mysql server to start\n"
            master_err_file_handle = open(master_err_file)
            if not master_err_file_handle: raise AssertionError, "master.err is not created\n"
        else:
            write_log("MySQL server is up. Use the existing one and continue", pstdout=True)
    
    def shutdown_server():
        if not server_is_up(): return
        dbg_print("Begin shutdown sever")
        sys_call("mysqladmin -u root -p123456 -h127.0.0.1 -S %s shutdown" % mysock)
        max_wait_sec = 20
        for i in range(max_wait_sec+1):
            if not server_is_up():
                write_log("Mysqld is shutdown successfully on sock %s" % mysock, pstdout=True)
                break
            time.sleep(2)
            if (i == max_wait_sec): raise AssertionError, "Time out waiting for mysql server to shutdown\n"
    
    def main(argv):
        ''' may also need to specify seed, mysock. Also may need to take care of loading
        '''
        # default 
        tpch_tables=('customer', 'supplier', 'nation', 'region', 'part', 'partsupp', 'orders', 'lineitem')
        global _debug, scale_factor, do_power, do_throughput, do_update, tpch_db_name, test_only, do_loader, run_id, rand_seed, num_seqs, update_only, no_cleanup, leave_server_up
    
        os.environ['TZ'] = 'US/Pacific'
        time.tzset()
    
        try:                                
           opts, args = getopt.gnu_getopt(argv, "hdq:s:r:ptule:", ["help", "debug","query_program=", "scale=", "root=", "power", "throughput", "update","db=","test_only","loader","run_id=","rand_seed=","iter=","gen_rfs","update_only","no_cleanup","leave_server_up"])
        except getopt.GetoptError:          
            usage()                         
            sys.exit(2)                     
        for opt, arg in opts:               
            if opt in ("-h", "--help"):     
                usage()                     
                sys.exit(0)                  
            elif opt == '-d':               
                _debug = 1                  
            elif opt in ("-q", "--query_program"):
                query_exec = arg
            elif opt in ("-s", "--scale"):
                scale_factor = int(arg)
            elif opt in ("-r", "--root"):
                audit_dir= arg
            elif opt in ("-p", "--power"):
                do_power = True
            elif opt in ("-t", "--throughput"):
                do_throughput = True
            elif opt in ("-u", "--update"):
                do_update = True
            elif opt in ("-l", "--loader"):
                do_loader = True
                #print "do update"
            elif opt in ("-e", "--rand_seed"):
                rand_seed = int(arg)
            elif opt in ("--test_only"):
                test_only = True
            elif opt in ("--db"):
                tpch_db_name = arg
            elif opt in ("--run_id"):
                run_id = arg
            elif opt in ("--iter"):
                num_seqs = int(arg)
            elif opt in ("--update_only"):
                update_only = True
                do_update = True
            elif opt in ("--no_cleanup"):
                no_cleanup = True
            elif opt in ("--leave_server_up"):
                leave_server_up = True
        setup_tpch()
        run_tpch_all()
    
    if __name__ == "__main__":
        main(sys.argv[1:])
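    # Example invocation (sketch; the flags come from the getopt table above, and
    # the file name tpch_audit.py is hypothetical):
    #   python tpch_audit.py -p -t -u -s 1 --db=tpch --run_id=7 --iter=1
    # runs one power test plus one throughput test with refresh streams at SF=1
    # (two query streams, per num_streams_scale).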
    
    
    
  • TPC-H test tools

    2019-06-09 22:03:00

    TPC's current benchmarks are TPC-E, TPC-C, TPC-H and TPC-App. Matching these four benchmarks, TPC currently runs four technical subcommittees: the TPC-E, TPC-C, TPC-H and TPC-App technical subcommittees. Benchmarks TPC used in the past but has since retired include TPC-A, TPC-B (a database throughput benchmark), TPC-D, TPC-R (a decision-support benchmark similar to TPC-H) and TPC-W (a web throughput benchmark).

    TPC benchmark categories
    On-line transaction processing (OLTP) benchmarks: TPC-C, TPC-E (newest)
    Decision support / big data (DS) benchmarks: TPC-H, TPC-DS (newest)
    Server virtualization (VMS) benchmark: TPC-VMS
    OLTP benchmarks
    Database workloads generally fall into two classes, OLTP (on-line transaction processing) and data mining, and OLTP is still the mainstream today. TPC-C and TPC-E results therefore offer real reference value when procuring database systems; banking, securities, tax filing, e-commerce sites and telecom services are all typical OLTP applications.

    TPC-H
    TPC-H (the business-intelligence benchmark) is a test suite defined by the TPC (Transaction Processing Performance Council) to model decision-support applications, and it is widely used in both academia and industry to evaluate decision-support performance. This commercial benchmark measures a system's overall business-computing capability end to end, places higher demands on vendors, and has broad practical relevance; today it is widely applied in bank credit and credit-card analysis, telecom operations analysis, tax analysis, and decision analysis in the tobacco industry.

    TPC-H evolved from TPC-D (a decision-support benchmark defined by the TPC in 1994). TPC-H implements a data warehouse in 3NF with 8 base relations, and its data volume can be set anywhere from 1 GB to 3 TB. The benchmark comprises 22 queries (Q1~Q22), and its primary metric is each query's response time, i.e. the time from submitting the query to the return of its result. The benchmark's overall unit of measure is queries per hour, QphH@Size, where H reflects the average number of complex queries the system executes per hour and Size is the database scale; it captures the system's query-processing capability. Because TPC-H is modeled on real production environments, it can evaluate key performance parameters that other benchmarks cannot. In short, the TPC-H standard meets the testing needs of the data-warehouse field and pushes vendors and research institutions to drive the technology to its limits.
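    For reference, the headline metrics can be written out; a sketch of the spec's formulas, where SF is the scale factor, S the number of throughput query streams, $T_s$ the measured throughput interval in seconds, and QI(i,0)/RI(j,0) the timed query and refresh intervals of the power stream:

    $$\mathrm{Power@Size} = \frac{3600 \cdot SF}{\left(\prod_{i=1}^{22} QI(i,0) \cdot \prod_{j=1}^{2} RI(j,0)\right)^{1/24}}$$

    $$\mathrm{Throughput@Size} = \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF, \qquad \mathrm{QphH@Size} = \sqrt{\mathrm{Power@Size} \cdot \mathrm{Throughput@Size}}$$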


    ---------------------
    https://blog.csdn.net/xfg0218/article/details/82785187


    (*) Detailed introduction: https://blog.csdn.net/woshisap/article/details/44427899
    ---------------------

    The tables in TPCH model a parts sales management system; for details see the TPCH Specification.
    The Query definitions are in Section 2.4 of the Specification, which explains the intent of each Query. This post walks through the table definitions; keeping the meaning of these tables in mind helps in understanding the Queries.

    supplier
    Supplier information: key + name, address, phone, nation key.
    The nation key must be joined against the nation table to obtain full country details.

    CREATE TABLE `supplier` (
    `s_suppkey` bigint(20) NOT NULL,
    `s_name` char(25) DEFAULT NULL,
    `s_address` varchar(40) DEFAULT NULL,
    `s_nationkey` bigint(20) DEFAULT NULL,
    `s_phone` char(15) DEFAULT NULL,
    `s_acctbal` bigint(20) DEFAULT NULL,
    `s_comment` varchar(101) DEFAULT NULL,
    PRIMARY KEY (`s_suppkey`)
    ) partition by hash(s_suppkey) partitions 128
    nation
    Nation information: key + nation name, region key.
    The region key must be joined against the region table to obtain full region details.

    CREATE TABLE `nation` (
    `n_nationkey` bigint(20) NOT NULL,
    `n_name` char(25) DEFAULT NULL,
    `n_regionkey` bigint(20) DEFAULT NULL,
    `n_comment` varchar(152) DEFAULT NULL,
    PRIMARY KEY (`n_nationkey`)
    )
    region
    Region information: key + region name.

    CREATE TABLE `region` (
    `r_regionkey` bigint(20) NOT NULL,
    `r_name` char(25) DEFAULT NULL,
    `r_comment` varchar(152) DEFAULT NULL,
    PRIMARY KEY (`r_regionkey`)
    )

    customer
    Customer table: key + name, address, nation key, phone, etc.
    The customer table is hash-partitioned into 64 partitions on its key.

    CREATE TABLE `customer` (
    `c_custkey` bigint(20) NOT NULL,
    `c_name` varchar(25) DEFAULT NULL,
    `c_address` varchar(40) DEFAULT NULL,
    `c_nationkey` bigint(20) DEFAULT NULL,
    `c_phone` char(15) DEFAULT NULL,
    `c_acctbal` decimal(10,2) DEFAULT NULL,
    `c_mktsegment` char(10) DEFAULT NULL,
    `c_comment` varchar(117) DEFAULT NULL,
    PRIMARY KEY (`c_custkey`),
    UNIQUE KEY `i_c_custkey` (`c_custkey`) LOCAL BLOCK_SIZE 16384
    ) partition by hash(c_custkey) partitions 64
    part
    Parts table: key + part name, manufacturer, brand, type, size, container, retail price.

    CREATE TABLE `part` (
    `p_partkey` bigint(20) NOT NULL,
    `p_name` varchar(55) DEFAULT NULL,
    `p_mfgr` char(25) DEFAULT NULL,
    `p_brand` char(10) DEFAULT NULL,
    `p_type` varchar(25) DEFAULT NULL,
    `p_size` bigint(20) DEFAULT NULL,
    `p_container` char(10) DEFAULT NULL,
    `p_retailprice` decimal(10,2) DEFAULT NULL,
    `p_comment` varchar(23) DEFAULT NULL,
    PRIMARY KEY (`p_partkey`)
    ) partition by hash(p_partkey) partitions 64;

    partsupp
    Part-supplier table: part key + supplier key + available quantity, supply (wholesale) cost.

    CREATE TABLE `partsupp` (
    `ps_partkey` bigint(20) NOT NULL,
    `ps_suppkey` bigint(20) NOT NULL,
    `ps_availqty` bigint(20) DEFAULT NULL,
    `ps_supplycost` decimal(10,2) DEFAULT NULL,
    `ps_comment` varchar(199) DEFAULT NULL,
    PRIMARY KEY (`ps_partkey`, `ps_suppkey`),
    UNIQUE KEY `ps_pkey_skey` (`ps_partkey`, `ps_suppkey`) LOCAL BLOCK_SIZE 16384
    ) partition by hash(ps_partkey) partitions 64

    orders
    Retail orders table: order key + customer key + order status, total price, order date, priority, clerk, ship priority.

    CREATE TABLE `orders` (
    `o_orderkey` bigint(20) NOT NULL,
    `o_custkey` bigint(20) NOT NULL,
    `o_orderstatus` char(1) DEFAULT NULL,
    `o_totalprice` decimal(10,2) DEFAULT NULL,
    `o_orderdate` date NOT NULL,
    `o_orderpriority` char(15) DEFAULT NULL,
    `o_clerk` char(15) DEFAULT NULL,
    `o_shippriority` bigint(20) DEFAULT NULL,
    `o_comment` varchar(79) DEFAULT NULL,
    PRIMARY KEY (`o_orderkey`, `o_orderdate`, `o_custkey`),
    KEY `o_orderkey` (`o_orderkey`) LOCAL BLOCK_SIZE 16384
    ) partition by range columns(o_orderdate)
    subpartition by hash(o_custkey) subpartitions 64
    (partition ord1 values less than ('1992-01-01'),
    partition ord2 values less than ('1992-02-01'),
    partition ord3 values less than ('1992-03-01'),
    ...
    ...
    partition ord77 values less than ('1998-05-01'),
    partition ord78 values less than ('1998-06-01'),
    partition ord79 values less than ('1998-07-01'),
    partition ord80 values less than ('1998-08-01'),
    partition ord81 values less than (MAXVALUE))

    lineitem
    Order line-item table: order key + part key + supplier key + line number, quantity, price, discount, tax, line status, ship date, committed (expected) arrival date, actual receipt date, shipping-instruction policy (return to origin? refused and returned? etc.), and ship mode (rail, truck, mail, etc.).

    CREATE TABLE `lineitem` (
    `l_orderkey` bigint(20) NOT NULL,
    `l_partkey` bigint(20) NOT NULL,
    `l_suppkey` bigint(20) NOT NULL,
    `l_linenumber` bigint(20) NOT NULL,
    `l_quantity` bigint(20) NOT NULL,
    `l_extendedprice` decimal(10,2) NOT NULL,
    `l_discount` decimal(10,2) NOT NULL,
    `l_tax` decimal(10,2) NOT NULL,
    `l_returnflag` char(1) DEFAULT NULL,
    `l_linestatus` char(1) DEFAULT NULL,
    `l_shipdate` date NOT NULL,
    `l_commitdate` date DEFAULT NULL,
    `l_receiptdate` date DEFAULT NULL,
    `l_shipinstruct` char(25) DEFAULT NULL,
    `l_shipmode` char(10) DEFAULT NULL,
    `l_comment` varchar(44) DEFAULT NULL,
    PRIMARY KEY (`l_orderkey`, `l_linenumber`, `l_shipdate`, `l_partkey`),
    KEY `i_l_orderkey` (`l_orderkey`) LOCAL BLOCK_SIZE 16384
    ) partition by range columns(l_shipdate)
    subpartition by hash(l_partkey) subpartitions 64
    (partition item1 values less than ('1992-01-01'),
    partition item2 values less than ('1992-02-01'),
    partition item3 values less than ('1992-03-01'),
    partition item4 values less than ('1992-04-01'),
    ...
    ...
    partition item81 values less than ('1998-09-01'),
    partition item82 values less than ('1998-10-01'),
    partition item83 values less than ('1998-11-01'),
    partition item84 values less than (MAXVALUE))
    Overall, the sales data supports all kinds of purchase-sales-inventory queries, and the customer data supports all kinds of sales analysis. For more BI-related material, see http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.3.pdf


    ---------------------
    Author: maray
    Source: CSDN
    Original: https://blog.csdn.net/maray/article/details/80450990
    Copyright notice: this is an original post by the blogger; please include the link when reposting!

    Reposted from: https://www.cnblogs.com/klb561/p/10995086.html

  • MySQL tpch test tool: a brief manual
  • Greenplum TPC-H testing procedure and results

    2018-09-20 13:10:03
    5 Summary of TPC-H test results 5.1 Table row-count summary 5.2 SQL timing summary 6 Analysis of the executed SQL 6.1 SQL execution analysis 6.1.1 Pricing summary report query (Q1) 6.1.2 Minimum-cost supplier query (Q2) 6.1.3 Shipping...
    
    

     

    TPC-H overview

    (The same TPC-H introduction quoted earlier on this page: TPC-H evolved from TPC-D, implements a 3NF warehouse of 8 base relations at 1 GB~3 TB scale, runs 22 queries, and reports QphH@Size.)

    (*) Detailed introduction: https://blog.csdn.net/woshisap/article/details/44427899

    1 Greenplum cluster hardware information

    See the detailed hardware information for the Greenplum master node and data nodes.

    2 Greenplum cluster architecture information

    3 Setting up the TPC-H software environment

    Reference: https://yq.aliyun.com/articles/93?spm=a2c4e.11153940.blogcont98613.10.6e386c3eB5OOZf&commentId=29

    3.1 Downloading the TPC-H software

    Download link: https://pan.baidu.com/s/1RKdgfRBvAzZ-yllXO15fng (extraction code: p781)

    3.2 Building the TPC-H software

    Build reference: https://github.com/digoal/pg_tpch?spm=a2c4e.11153940.blogcont93.5.df104cd2p9viqS

    3.2.1 Add permissions to the source files

    #cd dbgen 

    #chmod  777 *.c *.h

    3.2.2 Create the Makefile

    # cp makefile.suite Makefile

    3.2.3 Edit the Makefile

    CC = gcc
    # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
    # SQLSERVER, SYBASE, ORACLE, VECTORWISE
    # Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
    # SGI, SUN, U2200, VMS, LINUX, WIN32
    # Current values for WORKLOAD are: TPCH

    DATABASE= ORACLE
    MACHINE = LINUX
    WORKLOAD = TPCH

    3.2.4 Build TPC-H

    # make
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
    gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

    3.2.5 View the dbgen help

    # ./dbgen --help
    ./dbgen: invalid option -- '-'
    ERROR: option '-' unknown.
    TPC-H Population Generator (Version 2.14.3 build 0)
    Copyright Transaction Processing Performance Council 1994 - 2010
    USAGE:
    dbgen [-{vf}][-T {pcsoPSOL}]
    [-s <scale>][-C <procs>][-S <step>]
    dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

    Basic Options
    ===========================
    -C <n> -- separate data set into <n> chunks (requires -S, default: 1)
    -f -- force. Overwrite existing files
    -h -- display this message
    -q -- enable QUIET mode
    -s <n> -- set Scale Factor (SF) to <n> (default: 1) 
    -S <n> -- build the <n>th step of the data/update set (used with -C or -U)
    -U <n> -- generate <n> update sets
    -v -- enable VERBOSE mode

    Advanced Options
    ===========================
    -b <s> -- load distributions for <s> (default: dists.dss)
    -d <n> -- split deletes between <n> files (requires -U)
    -i <n> -- split inserts between <n> files (requires -U)
    -T c -- generate cutomers ONLY
    -T l -- generate nation/region ONLY
    -T L -- generate lineitem ONLY
    -T n -- generate nation ONLY
    -T o -- generate orders/lineitem ONLY
    -T O -- generate orders ONLY
    -T p -- generate parts/partsupp ONLY
    -T P -- generate parts ONLY
    -T r -- generate region ONLY
    -T s -- generate suppliers ONLY
    -T S -- generate partsupp ONLY

    To generate the SF=1 (1GB), validation database population, use:
    dbgen -vf -s 1

    To generate updates for a SF=1 (1GB), use:
    dbgen -v -U 1 -s 1

     

    The most commonly used option above:

    -s — the scale factor (SF, roughly the generated data volume in GB); set it according to how much test data you need.
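    For large scale factors, generation can also be split into chunks and run in parallel. A minimal sketch based on the -C/-S options shown in the help text above (the chunk count 4 and SF 100 are arbitrary choices):

    $ for i in 1 2 3 4; do ./dbgen -s 100 -C 4 -S $i -f & done; wait

    Each invocation writes its own chunk files (e.g. lineitem.tbl.1 ... lineitem.tbl.4).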

    4 Greenplum Test Procedure

    4.1 Generate the test data

    4.1.1 Generate the first batch of test data: 50 GB

    $ time ./dbgen -s 50 -f
    TPC-H Population Generator (Version 2.14.3)
    Copyright Transaction Processing Performance Council 1994 - 2010

    real 23m8.674s
    user 22m8.985s
    sys 0m59.560s

     

    The command above generated 50 GB of test data in roughly 23 minutes.

     

    Size distribution of the 50 GB test data:

    $ ll -h *.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 1.2G Sep 19 17:18 customer.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 37G Sep 19 17:18 lineitem.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 2.2K Sep 19 17:18 nation.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 8.3G Sep 19 17:18 orders.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 5.7G Sep 19 17:18 partsupp.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 1.2G Sep 19 17:18 part.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 389 Sep 19 17:18 region.tbl
    -rw-rw-r-- 1 gpadmin gpadmin 69M Sep 19 17:18 supplier.tbl

    4.1.2 Generate the second batch of test data: 200 GB

    $ time ./dbgen -s 200 -f
    TPC-H Population Generator (Version 2.14.3)
    Copyright Transaction Processing Performance Council 1994 - 2010

    real 92m56.905s
    user 88m27.882s
    sys 4m28.157s

    The command above generated 200 GB of test data in 92m56.905s.

    4.2 Inspect the generated test data

    4.2.1 What each table represents

    Each table's meaning is explained in detail at: https://blog.csdn.net/woshisap/article/details/44427899

    4.2.2 View the customer data (customer)

    $ head -n 2 customer.tbl 
    1|Customer#000000001|IVhzIApeRb ot,c,E|15|25-989-741-2988|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
    2|Customer#000000002|XSTf4,NCwDVaWNe6tEgvwfmRchLXak|13|23-768-687-3665|121.65|AUTOMOBILE|l accounts. blithely ironic theodolites integrate boldly: caref|


    4.2.3 View the part data (part)

    $ head -n 2 part.tbl 
    1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PKG|901.00|ly. slyly ironi|
    2|blush thistle blue yellow saddle|Manufacturer#1|Brand#13|LARGE BRUSHED BRASS|1|LG CASE|902.00|lar accounts amo|


    4.2.4 View the supplier data (supplier)

    $ head -n 2 supplier.tbl 
    1|Supplier#000000001| N kD4on9OM Ipw3,gf0JBoQDd7tgrzrddZ|17|27-918-335-1736|5755.94|each slyly above the careful|
    2|Supplier#000000002|89eJ5ksX3ImxJQBvxObC,|5|15-679-861-2259|4032.68| slyly bold instructions. idle dependen|


    4.2.5 View the part-supplier data (partsupp)

    $ head -n 2 partsupp.tbl 
    1|2|3325|771.64|, even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful|
    1|125002|8076|993.49|ven ideas. quickly even packages print. pending multipliers must have to are fluff|


    4.2.6 View the orders data (orders)

    $ head -n 2 orders.tbl 
    1|1845001|O|218611.01|1996-01-02|5-LOW|Clerk#000047528|0|nstructions sleep furiously among |
    2|3900082|O|59659.27|1996-12-01|1-URGENT|Clerk#000043958|0| foxes. pending accounts at the pending, silent asymptot|


    4.2.7 View the line-item data (lineitem)

    $ head -n 2 lineitem.tbl 
    1|7759468|384484|1|17|25960.36|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|
    1|3365454|365455|2|36|54694.44|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|ly final dependencies: slyly bold |


    4.2.8 View the nation data (nation)

    $ head -n 2 nation.tbl 
    0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
    1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold requests alon|

    4.2.9 View the region data (region)

    $ head -n 2 region.tbl 
    0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
    1|AMERICA|hs use ironic, even requests. s|

    4.3 Create the test tables and load the test data (50 GB)

    4.3.1 Download the table-creation and query SQL

    Download the table-creation and query SQL: tpc-h测试SQL.zip

    4.3.2 Review the created tables

    4.3.3 Load the test data

    4.3.3.1 Convert the test data format

    dbgen writes a trailing '|' at the end of every row, which COPY would otherwise read as an extra empty column, so strip it while renaming *.tbl to *.csv:

    #  for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; done

    4.3.3.2 Delete the .tbl files

    # rm -rf *.tbl

    4.3.3.3 Data-loading script

    # cat file-export-greenplum.sh 
    #!/bin/bash

    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.part FROM '/greenplum/****/part.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.region FROM '/greenplum/****/region.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.nation FROM '/greenplum/****/nation.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.supplier FROM '/greenplum/****/supplier.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.customer FROM '/greenplum/****/customer.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.partsupp FROM '/greenplum/****/partsupp.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.orders FROM '/greenplum/****/orders.csv' WITH csv DELIMITER '|'";
    time psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "COPY tpc_h_test.lineitem FROM '/greenplum/****/lineitem.csv' WITH csv DELIMITER '|'";
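    COPY funnels all rows through the master node. For loads of this size, the more common Greenplum approach is gpfdist plus a readable external table, which lets the segments pull data in parallel. A minimal sketch (the port, directory, and external-table name are assumptions to adapt):

    # on the file server, serve the data directory:
    # gpfdist -d /greenplum/data -p 8081 &

    -- then, in psql, define the external table and load from it:
    CREATE EXTERNAL TABLE tpc_h_test.lineitem_ext (LIKE tpc_h_test.lineitem)
    LOCATION ('gpfdist://192.****.11:8081/lineitem.csv')
    FORMAT 'TEXT' (DELIMITER '|');

    INSERT INTO tpc_h_test.lineitem SELECT * FROM tpc_h_test.lineitem_ext;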

    4.3.3.4 Check each table's row count

    An example shell script:

    $ vi table-count.sh

    #!/bin/bash
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) part from tpc_h_test.part;";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) region from tpc_h_test.region";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) nation from tpc_h_test.nation";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) supplier from tpc_h_test.supplier";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) customer from tpc_h_test.customer";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) partsupp from tpc_h_test.partsupp";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) orders from tpc_h_test.orders";
    psql -d stagging -h 192.****.11 -p 5432 -U gpadmin -c "select count(*) lineitem from tpc_h_test.lineitem";

     

    Run the script to see the counts:

    $ sh table-count.sh
    part 
    ----------
    10000000
    (1 row)

    region 
    --------
    5
    (1 row)

    nation 
    --------
    25
    (1 row)

    supplier 
    ----------
    500000
    (1 row)

    customer 
    ----------
    7500000
    (1 row)

    partsupp 
    ----------
    40000000
    (1 row)

    orders 
    ----------
    75000000
    (1 row)

    lineitem 
    -----------
    300005811
    (1 row)
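    These counts line up with the TPC-H cardinality rules: most tables scale linearly with SF (customer = 150,000×SF, supplier = 10,000×SF, part = 200,000×SF, partsupp = 800,000×SF, orders = 1,500,000×SF; nation and region are fixed at 25 and 5 rows). For SF=50 that gives, e.g., orders = 1,500,000 × 50 = 75,000,000, exactly as observed; lineitem is the one table that is only approximately linear, at about 6,000,000×SF (here 300,005,811).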

    4.3.3.5 Check how the tables are distributed across segments

    4.3.3.5.1 customer table distribution

    4.3.3.5.2 part table distribution

    4.3.3.5.3 region table distribution

    4.3.3.5.4 nation table distribution

    4.3.3.5.5 supplier table distribution

    4.3.3.5.6 partsupp table distribution

    4.3.3.5.7 orders table distribution

    4.3.3.5.8 lineitem table distribution
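    The original post shows a screenshot per table; the same check can be reproduced with a query along these lines (a sketch using Greenplum's gp_segment_id pseudo-column; substitute each table name in turn):

    SELECT gp_segment_id, count(*) AS cnt
    FROM tpc_h_test.lineitem
    GROUP BY gp_segment_id
    ORDER BY cnt DESC;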

    4.4 Run the test SQL and review the results (50 GB)

    Query SQL download: tpc-h测试SQL.zip

    4.4.1 Pricing Summary Report (Q1)

    Elapsed: 17.877s

    4.4.2 Minimum Cost Supplier (Q2)

    Elapsed: 4.160s

    4.4.3 Shipping Priority (Q3)

    Elapsed: 9.581s

    4.4.4 Order Priority Checking (Q4)

    This query matched no data; elapsed 0.032s

    4.4.5 Local Supplier Volume (Q5)

    This query matched no data; elapsed 0.049s

    4.4.6 Forecasting Revenue Change (Q6)

    This query matched no data; elapsed 0.026s

    4.4.7 Volume Shipping (Q7)

    Elapsed: 6.882s

    4.4.8 National Market Share (Q8)

    Elapsed: 6.261s

    4.4.9 Product Type Profit Measure (Q9)

    Elapsed: 21.678s

    4.4.10 Returned Item Reporting (Q10)

    This query matched no data; elapsed 0.041s

    4.4.11 Important Stock Identification (Q11)

    This query matched no data; elapsed 1.864s

    4.4.12 Shipping Modes and Order Priority (Q12)

    This query matched no data; elapsed 0.072s

    4.4.13 Customer Distribution (Q13)

    Elapsed: 4.080s

    4.4.14 Promotion Effect (Q14)

    This query matched no data; elapsed 0.041s

    4.4.15 Top Supplier (Q15)

    This query matched no data; elapsed 0.481s

    4.4.16 Parts/Supplier Relationship (Q16)

    Elapsed: 2.676s

    4.4.17 Small-Quantity-Order Revenue (Q17)

    Elapsed: 21.901s

    4.4.18 Large Volume Customer (Q18)

    Elapsed: 29.006s

    4.4.19 Discounted Revenue (Q19)

    Elapsed: 3.385s

    4.4.20 Potential Part Promotion (Q20)

    This query matched no data; elapsed 1.739s

    4.4.21 Suppliers Who Kept Orders Waiting (Q21)

    Elapsed: 30.719s

    4.4.22 Global Sales Opportunity (Q22)

    Elapsed: 4.104s

     

    4.5 Run the test SQL and review the results (200 GB)

    Query SQL download: tpc-h测试SQL.zip

    4.5.1 Pricing Summary Report (Q1)

    Elapsed: 69.827s

    4.5.2 Minimum Cost Supplier (Q2)

    Elapsed: 9.911s

    4.5.3 Shipping Priority (Q3)

    Elapsed: 36.624s

    4.5.4 Order Priority Checking (Q4)

    This query matched no data; elapsed 0.057s

    4.5.5 Local Supplier Volume (Q5)

    This query matched no data; elapsed 0.059s

    4.5.6 Forecasting Revenue Change (Q6)

    This query matched no data; elapsed 0.029s

    4.5.7 Volume Shipping (Q7)

    Elapsed: 22.081s

    4.5.8 National Market Share (Q8)

    Elapsed: 19.905s

    4.5.9 Product Type Profit Measure (Q9)

    Elapsed: 58.924s

    4.5.10 Returned Item Reporting (Q10)

    This query matched no data; elapsed 0.040s

    4.5.11 Important Stock Identification (Q11)

    This query matched no data; elapsed 3.393s

    4.5.12 Shipping Modes and Order Priority (Q12)

    This query matched no data; elapsed 0.072s

    4.5.13 Customer Distribution (Q13)

    Elapsed: 16.432s

    4.5.14 Promotion Effect (Q14)

    This query matched no data; elapsed 0.039s

    4.5.15 Top Supplier (Q15)

    This query matched no data; elapsed 0.303s

    4.5.16 Parts/Supplier Relationship (Q16)

    Elapsed: 10.648s

    4.5.17 Small-Quantity-Order Revenue (Q17)

    Elapsed: 94.435s

    4.5.18 Large Volume Customer (Q18)

    Elapsed: 118.068s

    4.5.19 Discounted Revenue (Q19)

    Elapsed: 8.773s

    4.5.20 Potential Part Promotion (Q20)

    This query matched no data; elapsed 4.005s

    4.5.21 Suppliers Who Kept Orders Waiting (Q21)

    Elapsed: 85.615s

    4.5.22 Global Sales Opportunity (Q22)

    Elapsed: 13.140s

    5 TPC-H Test Result Summary

    5.1 Row-count summary

    As the summary chart in the original post shows, at 50 GB the table with the most rows is lineitem, at 300,005,811, and the smallest is region, at 5 rows.

    At 200 GB the largest table is again lineitem, at 1,200,018,434 rows, and the smallest is again region, at 5 rows.

    For the underlying data, see section 4.2, "Inspect the generated test data".

    5.2 SQL elapsed-time summary

    As the chart shows, at 50 GB the slowest query is Q21 at 30.719s; a time of 0s means the query matched no data.

    At 200 GB the slowest query is Q18 at 118.068s; again, 0s means the query matched no data. (In a standard TPC-H setup every query returns rows, so the no-data results here most likely indicate query parameters that did not match the generated data; those timings should be read with caution.)

    6 Analyzing the Executed SQL

    6.1 SQL execution-plan analysis

    (The original post shows an execution-plan screenshot under each of the following headings.)

    6.1.1 Pricing Summary Report (Q1)

    6.1.2 Minimum Cost Supplier (Q2)

    6.1.3 Shipping Priority (Q3)

    6.1.4 Order Priority Checking (Q4)

    6.1.5 Local Supplier Volume (Q5)

    6.1.6 Forecasting Revenue Change (Q6)

    6.1.7 Volume Shipping (Q7)

    6.1.8 National Market Share (Q8)

    6.1.9 Product Type Profit Measure (Q9)

    6.1.10 Returned Item Reporting (Q10)

    6.1.11 Important Stock Identification (Q11)

    6.1.12 Shipping Modes and Order Priority (Q12)

    6.1.13 Customer Distribution (Q13)

    6.1.14 Promotion Effect (Q14)

    6.1.15 Top Supplier (Q15)

    6.1.16 Parts/Supplier Relationship (Q16)

    6.1.17 Small-Quantity-Order Revenue (Q17)

    6.1.18 Large Volume Customer (Q18)

    6.1.19 Discounted Revenue (Q19)

    6.1.20 Potential Part Promotion (Q20)

    6.1.21 Suppliers Who Kept Orders Waiting (Q21)

    6.1.22 Global Sales Opportunity (Q22)

    6.2 Glossary: execution-plan node types

    Plan node type     | Description                            | Startup cost?
    -------------------+----------------------------------------+--------------
    Seq Scan           | sequential table scan                  | no
    Index Scan         | index scan                             | no
    Bitmap Index Scan  | index scan (bitmap build)              | yes
    Bitmap Heap Scan   | index scan (heap fetch via bitmap)     | yes
    Subquery Scan      | subquery                               | no
    Tid Scan           | ctid = ... condition                   | no
    Function Scan      | function scan                          | no
    Nested Loop        | nested-loop join                       | no
    Merge Join         | merge join                             | yes
    Hash Join          | hash join                              | yes
    Sort               | sort (ORDER BY)                        | yes
    Hash               | hash computation                       | yes
    Result             | computed result, not tied to a table   | no
    Unique             | DISTINCT, UNION                        | yes
    Limit              | LIMIT, OFFSET                          | yes
    Aggregate          | count, sum, avg, stddev aggregates     | yes
    Group              | GROUP BY grouping                      | yes
    Append             | UNION operation                        | no
    Materialize        | subquery materialization               | yes
    SetOp              | INTERSECT, EXCEPT                      | yes

     

    -> Hash (cost=1840.19..1840.19 rows=413438 width=16)
       -> HashAggregate (cost=0.00..1840.19 rows=413438 width=16)
          Group By: orders.o_custkey
          -> Redistribute Motion 48:48 (slice3; segments: 48) (cost=0.00..1786.12 rows=413438 width=16)
             Hash Key: orders.o_custkey
              -> Result (cost=0.00..1765.42 rows=413438 width=16)
                -> HashAggregate (cost=0.00..1765.42 rows=413438 width=16)
                  Group By: orders.o_custkey
                    -> Table Scan on orders (cost=0.00..881.31 rows=6250000 width=8)

     

    How to read this plan:

    1. Read from the bottom up.
    2. For each operation, EXPLAIN reports the startup cost, the total cost, the estimated row count, and the average row width.
    3. The startup cost is the work done before the first output row is produced, e.g. the time spent sorting.
    4. Costs cover disk page fetches and CPU time.
    5. Note that each step's cost includes the cost of the steps beneath it.
    6. Importantly, EXPLAIN does not actually execute the query; it only produces the plan and the estimated cost.
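    To see actual per-node runtimes rather than estimates, the usual next step is EXPLAIN ANALYZE, which really executes the query. A minimal sketch matching the plan fragment above (the Redistribute Motion node there is Greenplum-specific: it reshuffles rows across the 48 segments by the hash key):

    EXPLAIN ANALYZE
    SELECT o_custkey, count(*)
    FROM tpc_h_test.orders
    GROUP BY o_custkey;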
