    Credit Card Fraud Detection

    Credit card fraud detection is a Kaggle project. The data are transactions made in 2013 by European credit card holders; see https://www.kaggle.com/mlg-ulb/creditcardfraud for details.
    The goal is to predict whether a given transaction is fraudulent. What sets it apart from most machine learning projects is the extreme imbalance between positive and negative samples, so that is the first problem the feature engineering must address.
    On the other hand, preprocessing is lighter than usual: there are no missing values to handle, and most of the features have already been mean-normalized.

    Project background and a first look at the data

    # import the basic libraries; model libraries are imported later as needed
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import warnings
    warnings.filterwarnings('ignore')
    
    # widen the display settings so long rows are not truncated
    pd.set_option('display.max_columns', 10000)
    pd.set_option('display.max_colwidth', 10000)
    pd.set_option('display.width', 10000)
    
    # load the data and take a first look
    data = pd.read_csv('D:/数据分析/kaggle/信用卡欺诈/creditcard.csv')
    data.info()
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 284807 entries, 0 to 284806
    Data columns (total 31 columns):
     #   Column  Non-Null Count   Dtype  
    ---  ------  --------------   -----  
     0   Time    284807 non-null  float64
     1   V1      284807 non-null  float64
     2   V2      284807 non-null  float64
     3   V3      284807 non-null  float64
     4   V4      284807 non-null  float64
     5   V5      284807 non-null  float64
     6   V6      284807 non-null  float64
     7   V7      284807 non-null  float64
     8   V8      284807 non-null  float64
     9   V9      284807 non-null  float64
     10  V10     284807 non-null  float64
     11  V11     284807 non-null  float64
     12  V12     284807 non-null  float64
     13  V13     284807 non-null  float64
     14  V14     284807 non-null  float64
     15  V15     284807 non-null  float64
     16  V16     284807 non-null  float64
     17  V17     284807 non-null  float64
     18  V18     284807 non-null  float64
     19  V19     284807 non-null  float64
     20  V20     284807 non-null  float64
     21  V21     284807 non-null  float64
     22  V22     284807 non-null  float64
     23  V23     284807 non-null  float64
     24  V24     284807 non-null  float64
     25  V25     284807 non-null  float64
     26  V26     284807 non-null  float64
     27  V27     284807 non-null  float64
     28  V28     284807 non-null  float64
     29  Amount  284807 non-null  float64
     30  Class   284807 non-null  int64  
    dtypes: float64(30), int64(1)
    memory usage: 67.4 MB
    
    data.shape
    
    (284807, 31)
    
    data.describe()
    
    [output: data.describe(), count/mean/std/min/25%/50%/75%/max for all 31 columns. V1-V28 have means on the order of 1e-15 and standard deviations between about 0.33 and 1.96; Time spans 0 to 172792; Amount has mean 88.35, median 22.00 and max 25691.16; Class has mean 0.001727]
    data.head().append(data.tail())
    
    [output: data.head().append(data.tail()), the first and last five rows of Time, V1-V28, Amount, Class]
    data.Class.value_counts()
    
    0    284315
    1       492
    Name: Class, dtype: int64
    
    • The data contain 284,807 samples, with 30 feature columns plus a class label, and no missing values, so no imputation is needed. The non-anonymized features are the time, the transaction amount and the class label; the 28 anonymized features V1-V28 have means of essentially 0 and standard deviations around 1, so they have already been normalized. (The Kaggle description says the anonymized features were desensitized with PCA; Time holds the seconds elapsed between each transaction and the first one in the dataset, which spans two days; 172792 is almost exactly 48 hours.)
    • Class imbalance: in the descriptive statistics the mean of the Class column is 0.001727, so the overwhelming majority of samples are 0. data.Class.value_counts() confirms it: 284,315 normal transactions versus only 492 fraudulent ones. Handling this extreme imbalance will be the main task of the feature engineering.

    Exploratory Data Analysis (EDA)

    * Univariate analysis

    All features are numeric, so there is no need to separate numeric from categorical features or to re-encode categorical ones. We analyze each feature on its own, starting with the prediction target.

    The class distribution plot, together with the kurtosis and skewness, makes the imbalance plain to see.
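
    For reference, pandas' skew() and kurt() compute the sample skewness and the excess kurtosis (the standard definitions below; pandas additionally applies small-sample bias corrections):

    $$\text{skew}(x) \approx \frac{\frac{1}{n}\sum_i (x_i-\bar{x})^3}{s^3}, \qquad \text{kurt}(x) \approx \frac{\frac{1}{n}\sum_i (x_i-\bar{x})^4}{s^4} - 3$$

    A balanced 0/1 column would have skewness 0 and excess kurtosis -2; the huge values computed below are another signature of the imbalance.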

    # histogram of the fraud vs non-fraud class distribution
    sns.countplot('Class', data=data, color='blue')
    plt.xlabel('values')
    plt.ylabel('Counts')
    plt.title('Class Distributions \n (0: No Fraud || 1: Fraud)')
    
    Text(0.5, 1.0, 'Class Distributions \n (0: No Fraud || 1: Fraud)')
    

    [figure: countplot of Class (0: No Fraud, 1: Fraud)]

    print('Kurtosis:', data.Class.kurt())
    print('Skewness:', data.Class.skew())
    
    Kurtosis: 573.887842782971
    Skewness: 23.99757931064749
    

    Next we examine the two features that were not standardized: Time and Amount.

    # distributions of transaction time and amount; the amount is fitted with both a normal and a log-normal distribution
    import scipy.stats as st
    fig, ax = plt.subplots(1, 3, figsize=(18, 4))
    print(ax)
    sns.distplot(data.Amount, color='blue', ax=ax[0],kde = False,fit=st.norm)
    ax[0].set_title('Distribution of transaction amount_normal')
    
    sns.distplot(data.Amount,color='blue',ax=ax[1],fit=st.lognorm)
    ax[1].set_title('Distribution of transaction amount_lognorm')
    
    sns.distplot(data.Time, color='r', ax=ax[2])
    ax[2].set_title('Distribution of transaction time')
    

    [figure: Amount with normal fit; Amount with log-normal fit; Time distribution]

    print(data.Amount.value_counts())
    
    1.00       13688
    1.98        6044
    0.89        4872
    9.99        4747
    15.00       3280
               ...  
    192.63         1
    218.84         1
    195.52         1
    793.50         1
    1080.06        1
    Name: Amount, Length: 32767, dtype: int64
    
    # share of transactions below each amount threshold (and above 5000)
    for threshold in [5, 10, 20, 30, 50, 100]:
        print(f'the ratio of Amount<{threshold}:', (data.Amount < threshold).mean())
    print('the ratio of Amount>5000:', (data.Amount > 5000).mean())
    
    the ratio of Amount<5: 0.2368726892246328
    the ratio of Amount<10: 0.3416840175978821
    the ratio of Amount<20: 0.481476929991187
    the ratio of Amount<30: 0.562022703093674
    the ratio of Amount<50: 0.6660791342909409
    the ratio of Amount<100: 0.7985126770058321
    the ratio of Amount>5000: 0.00019311323106524768
    

    On the amount feature, the vast majority of transactions are small, under 50 dollars, with a few large values such as 1080 dollars. On the time axis there are two troughs, near 15000 s and 100000 s, i.e. about 4 h and 27 h after data collection began; these are presumably around 3 or 4 a.m., which matches reality, since few people shop in the middle of the night. Because the anonymized features are already mean-normalized, and the normal fit to Amount is better than the logarithmic one, we standardize Amount as well; likewise, Time is converted to hours and then standardized.

    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    data['Amount'] = sc.fit_transform(data.Amount.values.reshape(-1, 1))
    # reshape(-1, 1): -1 lets NumPy infer the number of rows, 1 forces a single column
    data['Hour'] = data.Time.apply(lambda x: divmod(x, 3600)[0])
    data['Hour'] = data.Hour.apply(lambda x: divmod(x, 24)[1])
    # convert further to a 24-hour clock, since transaction density should be periodic over the day
    data['Hour'] = sc.fit_transform(data['Hour'].values.reshape(-1, 1))
    data.drop(columns='Time', inplace=True)
    data.head().append(data.tail())
    
    [output: first and last five rows after preprocessing, with V1-V28, standardized Amount, Class, standardized Hour]
    # shuffle the samples, because Hour is ordered; this prepares for the later train/test split
    data = data.sample(frac=1)
    data.head().append(data.tail())
    
    [output: first and last five rows of the shuffled DataFrame]
    # Distributions of the anonymized features
    # skewness and kurtosis of each anonymized feature
    numerical_columns = data.columns.drop(['Class','Hour','Amount'])
    for num_col in numerical_columns:
        print('{:10}'.format(num_col), 'Skewness:', '{:8.2f}'.format(
            data[num_col].skew()), '         Kurtosis:', '{:8.2f}'.format(data[num_col].kurt()))
    
    V1         Skewness:    -3.28          Kurtosis:    32.49
    V2         Skewness:    -4.62          Kurtosis:    95.77
    V3         Skewness:    -2.24          Kurtosis:    26.62
    V4         Skewness:     0.68          Kurtosis:     2.64
    V5         Skewness:    -2.43          Kurtosis:   206.90
    V6         Skewness:     1.83          Kurtosis:    42.64
    V7         Skewness:     2.55          Kurtosis:   405.61
    V8         Skewness:    -8.52          Kurtosis:   220.59
    V9         Skewness:     0.55          Kurtosis:     3.73
    V10        Skewness:     1.19          Kurtosis:    31.99
    V11        Skewness:     0.36          Kurtosis:     1.63
    V12        Skewness:    -2.28          Kurtosis:    20.24
    V13        Skewness:     0.07          Kurtosis:     0.20
    V14        Skewness:    -2.00          Kurtosis:    23.88
    V15        Skewness:    -0.31          Kurtosis:     0.28
    V16        Skewness:    -1.10          Kurtosis:    10.42
    V17        Skewness:    -3.84          Kurtosis:    94.80
    V18        Skewness:    -0.26          Kurtosis:     2.58
    V19        Skewness:     0.11          Kurtosis:     1.72
    V20        Skewness:    -2.04          Kurtosis:   271.02
    V21        Skewness:     3.59          Kurtosis:   207.29
    V22        Skewness:    -0.21          Kurtosis:     2.83
    V23        Skewness:    -5.88          Kurtosis:   440.09
    V24        Skewness:    -0.55          Kurtosis:     0.62
    V25        Skewness:    -0.42          Kurtosis:     4.29
    V26        Skewness:     0.58          Kurtosis:     0.92
    V27        Skewness:    -1.17          Kurtosis:   244.99
    V28        Skewness:    11.19          Kurtosis:   933.40
    
    f = pd.melt(data, value_vars=numerical_columns)
    g = sns.FacetGrid(f, col='variable', col_wrap=4, sharex=False, sharey=False)
    g = g.map(sns.distplot, 'value')
    

    [figure: distribution plots of V1-V28]

    Because the data have been standardized, the anonymized features show relatively small skewness but large kurtosis. V5, V7, V8, V20, V21, V23, V27 and V28 have especially high kurtosis, meaning their distributions are very concentrated; the remaining features are spread more evenly.

    * Pairwise relationships
    # Correlation analysis
    plt.figure(figsize=(8, 6))
    sns.heatmap(data.corr(), square=True,
                cmap='coolwarm_r', annot_kws={'size': 20})
    plt.show()
    data.corr()
    

    [figure: correlation heatmap of all features]

    [output: 31 x 31 correlation matrix. Pairwise correlations among V1-V28 are on the order of 1e-16, i.e. essentially zero; the only sizeable entries involve Amount (e.g. -0.53 with V2, 0.40 with V7, 0.34 with V20) and Class (e.g. -0.33 with V17, -0.30 with V14, -0.26 with V12)]

    Numerically, the variables show no obvious correlation with one another, and because of the class imbalance their correlation with the target class is weak as well. That is not what we want, so we first balance the positive and negative samples.

    Before balancing, however, the data must be split into training and test sets. This keeps the evaluation fair: testing has to happen on original data, since a test set built from synthetic samples cannot validate a model trained on synthetic samples.
    To keep the class ratio identical across the two sets, we split with StratifiedKFold.

    from sklearn.model_selection import StratifiedKFold
    X_original = data.drop(columns='Class')
    y_original = data['Class']
    sss = StratifiedKFold(n_splits=5,random_state=None,shuffle=False)
    # note: each iteration overwrites X_train/X_test, so only the last fold's split is kept
    for train_index,test_index in sss.split(X_original,y_original):
        print('Train:',train_index,'test:',test_index)
        X_train,X_test = X_original.iloc[train_index],X_original.iloc[test_index]
        y_train,y_test = y_original.iloc[train_index],y_original.iloc[test_index]
    
    print(X_train.shape)
    print(X_test.shape)
    
    Train: [ 52587  52905  53232 ... 284804 284805 284806] test: [    0     1     2 ... 56966 56967 56968]
    Train: [     0      1      2 ... 284804 284805 284806] test: [ 52587  52905  53232 ... 113937 113938 113939]
    Train: [     0      1      2 ... 284804 284805 284806] test: [104762 104905 105953 ... 170897 170898 170899]
    Train: [     0      1      2 ... 284804 284805 284806] test: [162214 162834 162995 ... 227854 227855 227856]
    Train: [     0      1      2 ... 227854 227855 227856] test: [222268 222591 222837 ... 284804 284805 284806]
    (227846, 30)
    (56961, 30)
    
    # check that the class distributions of the training and test sets match
    print('Train:',[y_train.value_counts()/y_train.value_counts().sum()])
    print('Test:',[y_test.value_counts()/y_test.value_counts().sum()])
    
    Train: [0    0.998271
    1    0.001729
    Name: Class, dtype: float64]
    Test: [0    0.99828
    1    0.00172
    Name: Class, dtype: float64]
    

    There are two main strategies for handling extreme class imbalance: undersampling and oversampling.

    • Undersampling randomly draws from the majority class as many samples as the minority class contains. For extremely imbalanced data this risks underfitting, because the resulting dataset is very small. Usage:
     from imblearn.under_sampling import RandomUnderSampler
     rus = RandomUnderSampler(random_state=1)
     X_undersampled,y_undersampled = rus.fit_resample(X,y)
    
    • Oversampling generates from the minority class as many samples as the majority class contains. There are two methods: random oversampling and SMOTE.
      • Random oversampling repeatedly draws existing samples at random from the minority class until it matches the majority class in size; this easily overfits.
      from imblearn.over_sampling import RandomOverSampler
      ros = RandomOverSampler(random_state = 1)
      X_oversampled,y_oversampled = ros.fit_resample(X,y)
      
      • SMOTE, the Synthetic Minority Over-sampling Technique, improves on random oversampling's tendency to overfit. Split the data into the large and the small class; randomly pick a sample from the small class, find its nearest neighbors within the small class, and generate new samples at points on the line segments between the picked sample and those neighbors; repeat until the small class reaches the size of the large one (a minimal sketch follows after this list).
      from imblearn.over_sampling import SMOTE
      smote = SMOTE(random_state = 1)
      X_smotesampled,y_smotesampled = smote.fit_resample(X,y)
      

    Counter(y_smotesampled) can be used to check that the two classes now have the same count.
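
    To make the SMOTE step concrete, here is a minimal sketch of its interpolation rule (a simplified illustration, not imblearn's actual implementation; X_minority is assumed to be a NumPy array of minority-class rows):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    
    def smote_sketch(X_minority, n_new, k=5, rng=np.random.default_rng(1)):
        # nearest neighbors are searched within the minority class only
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
        _, idx = nn.kneighbors(X_minority)  # idx[:, 0] is each point itself
        new_samples = []
        for _ in range(n_new):
            i = rng.integers(len(X_minority))  # pick a minority sample
            j = rng.choice(idx[i, 1:])         # pick one of its k neighbors
            lam = rng.random()                 # random position on the segment
            # the synthetic point lies on the line between sample i and neighbor j
            new_samples.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
        return np.array(new_samples)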

    Below we process the samples with two schemes, plain random undersampling and SMOTE combined with random undersampling, and compare the resulting distributions.

    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import RandomOverSampler
    from imblearn.over_sampling import SMOTE
    from collections import Counter
    
    X = X_train.copy()
    y = y_train.copy()
    print('Imbalanced samples: ', Counter(y))
    
    rus = RandomUnderSampler(random_state=1)
    X_rus, y_rus = rus.fit_resample(X, y)
    print('Random under sample: ', Counter(y_rus))
    ros = RandomOverSampler(random_state=1)
    X_ros, y_ros = ros.fit_resample(X, y)
    print('Random over sample: ', Counter(y_ros))
    smote = SMOTE(random_state=1,sampling_strategy=0.5)
    X_smote, y_smote = smote.fit_resample(X, y)
    print('SMOTE: ', Counter(y_smote))
    under = RandomUnderSampler(sampling_strategy=1)
    X_smote, y_smote = under.fit_resample(X_smote,y_smote)
    print('SMOTE: ', Counter(y_smote))
    
    Imbalanced samples:  Counter({0: 227452, 1: 394})
    Random under sample:  Counter({0: 394, 1: 394})
    Random over sample:  Counter({0: 227452, 1: 227452})
    SMOTE:  Counter({0: 227452, 1: 113726})
    SMOTE:  Counter({0: 113726, 1: 113726})
    

    As the results show, undersampling reduces both classes to 394 samples, plain oversampling raises both to 227,452, and the SMOTE-plus-undersampling combination lands at 113,726 each. Next we analyze the balanced samples.

    Random undersampling

    data_rus = pd.concat([X_rus,y_rus],axis=1)
    
    # distribution of each individual numeric feature
    f = pd.melt(data_rus, value_vars=X_train.columns)
    g = sns.FacetGrid(f, col='variable', col_wrap=3, sharex=False, sharey=False)
    g = g.map(sns.distplot, 'value')
    

    [figure: distribution plots of the undersampled features]

    # boxplot of each individual numeric feature
    f = pd.melt(data_rus, value_vars=X_train.columns)
    g = sns.FacetGrid(f,col='variable', col_wrap=3, sharex=False, sharey=False,size=5)
    g = g.map(sns.boxplot, 'value', color='lightskyblue')
    

    [figure: boxplots of the undersampled features]

    # violin plot of each individual numeric feature
    f = pd.melt(data_rus, value_vars=X_train.columns)
    g = sns.FacetGrid(f, col='variable', col_wrap=3, sharex=False, sharey=False,size=5)
    g = g.map(sns.violinplot, 'value',color='lightskyblue')
    

    [figure: violin plots of the undersampled features]

    The distribution plots show that most values are concentrated, but outliers exist (the boxplots and violin plots make them more obvious). Outliers can mislead the model, so we remove them. There are two common criteria: the normal-distribution rule (3σ), and the quartile rule, which flags values lying beyond a fixed multiple of the interquartile range above the upper or below the lower quartile.
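
    The quartile-based criterion is implemented below; for comparison, a minimal sketch of the 3σ rule (sensible only for roughly normal features):

    def outlier_3sigma(data, column):
        mu, sigma = data[column].mean(), data[column].std()
        # keep only the rows within mean ± 3 standard deviations
        return data[(data[column] >= mu - 3 * sigma) & (data[column] <= mu + 3 * sigma)]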

    def outlier_process(data,column):
        Q1 = data[column].quantile(q=0.25)
        Q3 = data[column].quantile(q=0.75)
        low_whisker = Q1-3*(Q3-Q1)
        high_whisker = Q3+3*(Q3-Q1)
        # drop the outliers
        data_drop = data[(data[column]>=low_whisker) & (data[column]<=high_whisker)]
        # plot the before/after comparison
        fig,(ax1,ax2) = plt.subplots(1,2,figsize=(12,5))
        sns.boxplot(y=data[column],ax=ax1,color='lightskyblue')
        ax1.set_title('before deleting outlier'+' '+column)
        sns.boxplot(y=data_drop[column],ax=ax2,color='lightskyblue')
        ax2.set_title('after deleting outlier'+' '+column)
        return data_drop
    
    numerical_columns = data_rus.columns.drop('Class')
    for col_name in numerical_columns:
        data_rus = outlier_process(data_rus,col_name)
    

    [figures: before/after outlier-removal boxplots, one pair per feature]

    Some features with concentrated distributions, such as Hour, barely change.

    With the dirty data cleaned up, we probe the correlations between variables again.

    # heatmaps before and after resampling
    fig,(ax1,ax2) = plt.subplots(2,1,figsize=(10,10))
    sns.heatmap(data.corr(),cmap = 'coolwarm_r',ax=ax1,vmax=0.8)
    ax1.set_title('the relationship on imbalanced samples')
    sns.heatmap(data_rus.corr(),cmap = 'coolwarm_r',ax=ax2,vmax=0.8)
    ax2.set_title('the relationship on random under samples')
    
    Text(0.5, 1.0, 'the relationship on random under samples')
    

    [figure: correlation heatmaps, imbalanced data (top) vs undersampled data (bottom)]

    # correlation between each numeric feature and Class
    data_rus.corr()['Class'].sort_values(ascending=False)
    
    Class     1.000000
    V4        0.722609
    V11       0.694975
    V2        0.481190
    V19       0.245209
    V20       0.151424
    V21       0.126220
    Amount    0.100853
    V26       0.090654
    V27       0.074491
    V8        0.059647
    V28       0.052788
    V25       0.040578
    V22       0.016379
    V23      -0.003117
    V15      -0.009488
    V13      -0.055253
    V24      -0.070806
    Hour     -0.196789
    V5       -0.383632
    V6       -0.407577
    V1       -0.423665
    V18      -0.462499
    V7       -0.468273
    V17      -0.561113
    V3       -0.561767
    V9       -0.562542
    V16      -0.592382
    V10      -0.629362
    V12      -0.691652
    V14      -0.751142
    Name: Class, dtype: float64
    

    Comparing the two, the correlations between the numeric features and the class label are clearly stronger after the processing. By rank, the features most positively correlated with Class are V4, V11 and V2, and those most negatively correlated are V14, V10, V12, V3, V7 and V9; we plot these features against the target below.

    # distributions of the positively correlated features vs Class
    fig,(ax1,ax2,ax3) = plt.subplots(1,3,figsize=(24,6))
    sns.violinplot(x='Class',y='V4',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax1)
    ax1.set_title('V4 vs Class Positive Correlation')
    
    sns.violinplot(x='Class',y='V11',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax2)
    ax2.set_title('V11 vs Class Positive Correlation')
    
    sns.violinplot(x='Class',y='V2',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax3)
    ax3.set_title('V2 vs Class Positive Correlation')
    
    Text(0.5, 1.0, 'V2 vs Class Positive Correlation')
    

    [figure: violin plots of V4, V11, V2 vs Class]

    # distributions of the negatively correlated features vs Class
    fig,((ax1,ax2,ax3),(ax4,ax5,ax6)) = plt.subplots(2,3,figsize=(24,12))
    sns.violinplot(x='Class',y='V14',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax1)
    ax1.set_title('V14 vs Class Negative Correlation')
    
    sns.violinplot(x='Class',y='V10',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax2)
    ax2.set_title('V10 vs Class Negative Correlation')
    
    sns.violinplot(x='Class',y='V12',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax3)
    ax3.set_title('V12 vs Class Negative Correlation')
    
    sns.violinplot(x='Class',y='V3',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax4)
    ax4.set_title('V3 vs Class Negative Correlation')
    
    sns.violinplot(x='Class',y='V7',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax5)
    ax5.set_title('V7 vs Class Negative Correlation')
    
    sns.violinplot(x='Class',y='V9',data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax6)
    ax6.set_title('V9 vs Class Negative Correlation')
    
    
    Text(0.5, 1.0, 'V9 vs Class Negative Correlation')
    

    [figure: violin plots of V14, V10, V12, V3, V7, V9 vs Class]

    # distributions of the remaining features vs Class
    other_fea = list(data_rus.columns.drop(['V11','V4','V2','V17','V14','V12','V10','V7','V3','V9','Class']))
    fig,ax = plt.subplots(5,4,figsize=(24,36))
    for fea in other_fea:
        sns.violinplot(x='Class',y= fea,data=data_rus,palette=['lightsalmon','lightskyblue'],ax=ax[divmod(other_fea.index(fea),4)[0],divmod(other_fea.index(fea),4)[1]])
        ax[divmod(other_fea.index(fea),4)[0],divmod(other_fea.index(fea),4)[1]].set_title(fea)
    

    [figure: violin plots of the remaining features vs Class]

    The violin plots show that the positive and negative classes genuinely differ in the value distributions of these features, while the remaining features differ relatively little between the classes.

    As for how amount and time relate to Class: normal transactions involve smaller, more concentrated amounts, while fraudulent amounts are larger and more spread out; and the time span of normal transactions is narrower than that of fraudulent ones, so fraud is more likely to happen while people are asleep.
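
    A quick way to check this on the undersampled data (a sketch; Amount and Hour were standardized earlier, so the units are standard deviations):

    # compare amount and hour statistics across the two classes
    print(data_rus.groupby('Class')[['Amount', 'Hour']].agg(['mean', 'std', 'min', 'max']))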

    Having looked at the feature-class relationships, we now analyze the relationships among the features themselves. The heatmap already suggests some correlation between features; let us check concretely for multicollinearity.

    sns.set()
    sns.pairplot(data[list(data_rus.columns)],kind='scatter',diag_kind = 'kde')
    plt.show()
    

    [figure: pairplot of the selected features]

    The pairplot shows these features are not completely independent of one another; some pairs are strongly linearly related. We check further with the variance inflation factor (VIF).

    from statsmodels.stats.outliers_influence import variance_inflation_factor
    vif = [variance_inflation_factor(data_rus.values, data_rus.columns.get_loc(
        i)) for i in data_rus.columns]
    vif
    
    [12.662373162589994,
     20.3132501576979,
     26.78027354608725,
     9.970255022795625,
     23.531563683157597,
     3.4386660732204946,
     67.84989394913013,
     5.76519495696649,
     7.129002458395831,
     23.226754020950764,
     11.753104213590975,
     29.49673779700361,
     1.3365891898690718,
     21.57973674600878,
     1.2669840488461022,
     27.61485162786757,
     31.081940593780782,
     14.088642210869459,
     2.502857511412321,
     4.96077803555917,
     5.169599871511768,
     3.1235143157354583,
     2.828445638986856,
     1.1937601054384332,
     1.628451339236206,
     1.1966413137632343,
     1.959903999050125,
     1.4573293665681395,
     6.314999796714301,
     2.0990707198901117,
     4.802392100187543]
    

    A common rule of thumb: when $VIF < 10$, a variable has no multicollinearity with the others; $10 < VIF < 100$ indicates fairly strong multicollinearity; $VIF > 100$ indicates severe multicollinearity. By the numbers above, the variables do exhibit multicollinearity, i.e. redundant information, so feature extraction or feature selection is needed next.
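
    For reference, the VIF of variable $i$ is computed from the $R_i^2$ of regressing that variable on all the others:

    $$VIF_i = \frac{1}{1 - R_i^2}$$

    so $VIF_i = 10$ corresponds to $R_i^2 = 0.9$, i.e. 90% of that variable's variance is explained by the remaining variables.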

    Feature extraction and feature selection are the two approaches to dimensionality reduction: extraction produces new features that are mappings of the originals, while selection keeps a subset of the originals. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the two classic extraction methods.
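
    As a quick illustration of feature extraction (illustrative only; this article proceeds with feature selection instead), a minimal PCA sketch on the undersampled features:

    from sklearn.decomposition import PCA
    
    # project the 30 features onto the top 10 principal components
    pca = PCA(n_components=10, random_state=1)
    X_rus_pca = pca.fit_transform(X_rus)
    print(pca.explained_variance_ratio_.sum())  # share of variance retained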

    Feature selection: once the data are processed, we choose meaningful features to feed into the learning algorithms, considering mainly two aspects:

    • Whether the feature varies. If a feature barely varies, e.g. its variance is close to 0, the samples are essentially identical on it, so it is useless for telling them apart.

    • Correlation with the target. Features highly correlated with the target should be preferred.

      By mechanism, feature selection methods fall into three families:
      1. Filter methods: score every feature by variance or correlation, then keep those above a threshold or the top k (variance threshold, correlation coefficient, chi-squared test, mutual information).
      2. Wrapper methods: guided by an objective function (usually predictive performance), repeatedly add or remove subsets of features (recursive feature elimination).
      3. Embedded methods: first train a model to obtain a weight for each feature, then select features by weight from largest to smallest; similar to filter methods, except the scores come from training (penalty-based selection, tree-based selection).

    Ridge regression and Lasso are two feature-selection techniques for linear models; both add a regularization term to prevent overfitting. Ridge adds an L2 (squared-norm) penalty, while Lasso adds an L1 penalty; Lasso can train the coefficients of low-impact features exactly to 0, producing a sparse solution, so dimensionality reduction happens during model training.
    For tree models, a random forest classifier can rank features by importance, which likewise serves as a filter.
    Below we run both selection methods and compare the features they pick.
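
    For reference, the two regularized objectives mentioned above ($\lambda$ is the regularization strength):

    $$\text{Ridge:}\ \min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 \qquad \text{Lasso:}\ \min_\beta \|y - X\beta\|_2^2 + \lambda\|\beta\|_1$$

    The L1 ball has corners on the coordinate axes, which is why the Lasso optimum can land exactly on zero coefficients.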

    # Feature selection with Lasso
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import cross_val_score
    # fit LassoCV with 5-fold cross-validation over a small alpha grid
    model_lasso = LassoCV(alphas=[0.1,0.01,0.005,1],random_state=1,cv=5).fit(X_rus,y_rus)
    # features the fitted model keeps (nonzero coefficients)
    coef = pd.Series(model_lasso.coef_,index=X_rus.columns)
    print(coef[coef != 0].abs().sort_values(ascending = False))
    
    V4        0.062065
    V14       0.045851
    Amount    0.040011
    V26       0.038201
    V13       0.031702
    V7        0.028889
    V22       0.028509
    V18       0.028171
    V6        0.019226
    V1        0.018757
    V21       0.016032
    V10       0.014742
    V28       0.012483
    V8        0.011273
    V20       0.010726
    V9        0.010358
    V24       0.010227
    V17       0.007217
    V2        0.006838
    Hour      0.004757
    V15       0.003393
    V27       0.002588
    V19       0.000275
    dtype: float64
    
    # rank feature importances with a random forest
    from sklearn.ensemble import RandomForestClassifier
    rfc_fea_model = RandomForestClassifier(random_state=1)
    rfc_fea_model.fit(X_rus,y_rus)
    fea = X_rus.columns
    importance = rfc_fea_model.feature_importances_
    a = pd.DataFrame()
    a['feature'] = fea
    a['importance'] = importance
    a = a.sort_values('importance',ascending = False)
    plt.figure(figsize=(20,10))
    plt.bar(a['feature'],a['importance'])
    plt.title('the importance orders sorted by random forest')
    plt.show()
    

    [figure: random-forest feature importances, sorted in descending order]

    a.cumsum()  # note: cumsum over the string column concatenates names; the importance column gives the running total
    
    [output: cumulative sums of the sorted importance table; the feature column concatenates names cumulatively (V12, V12V10, V12V10V17, ...) while the importance column accumulates, passing 0.95 at V6 (0.952952) and reaching 1.0 with V25]

    The two methods rank feature importance very differently. To retain as much of the data's information as possible, we take a combination of the two: ['V14','V12','V10','V17','V4','V11','V3','V2','V7','V16','V18','Amount','V19','V20','V23','V21','V15','V9','V6','V27','V25','V5','V13','V22','Hour','V28','V1','V8','V26']. The criterion: the features covering 95%+ of the cumulative random-forest importance, plus any Lasso-selected feature not already in that set, giving 29 features in total (only V24 is left out).

    # Train and test with the selected features
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import plot_confusion_matrix
    from sklearn.metrics import classification_report
    from sklearn.linear_model import LogisticRegression  # logistic regression
    from sklearn.neighbors import KNeighborsClassifier  # KNN
    from sklearn.naive_bayes import GaussianNB  # naive Bayes
    from sklearn.svm import SVC  # support vector classification
    from sklearn.tree import DecisionTreeClassifier  # decision tree
    from sklearn.ensemble import RandomForestClassifier  # random forest
    from sklearn.ensemble import AdaBoostClassifier  # AdaBoost
    from sklearn.ensemble import GradientBoostingClassifier  # GBDT
    from xgboost import XGBClassifier  # XGBoost
    from lightgbm import LGBMClassifier  # LightGBM
    from sklearn.metrics import roc_curve  # for plotting the ROC curve
    Classifiers = {
        'LG': LogisticRegression(random_state=1),
        'KNN': KNeighborsClassifier(),
        'Bayes': GaussianNB(),
        'SVC': SVC(random_state=1, probability=True),
        'DecisionTree': DecisionTreeClassifier(random_state=1),
        'RandomForest': RandomForestClassifier(random_state=1),
        'Adaboost': AdaBoostClassifier(random_state=1),
        'GBDT': GradientBoostingClassifier(random_state=1),
        'XGboost': XGBClassifier(random_state=1),
        'LightGBM': LGBMClassifier(random_state=1)
    }
    def train_test(Classifiers, X_train, y_train, X_test, y_test):
        y_pred = pd.DataFrame()
        Accuracy_Score = pd.DataFrame()
    #     score.model_name = Classifiers.keys
        for model_name, model in Classifiers.items():
            model.fit(X_train, y_train)
            y_pred[model_name] = model.predict(X_test)
            y_pred_pra = model.predict_proba(X_test)
            Accuracy_Score[model_name] = pd.Series(model.score(X_test, y_test))
            # print per-class precision, recall and f1
            print(model_name, '\n', classification_report(
                y_test, y_pred[model_name]))
    #         confu_mat = confusion_matrix(y_test,y_pred[model_name])
    #         plt.matshow(confu_mat,cmap = plt.cm.Blues)
    #         plt.title(model_name)
    #         plt.colorbar()
            # plot the confusion matrix
            fig, ax = plt.subplots(1, 1)
            plot_confusion_matrix(model, X_test, y_test, labels=[
                                  0, 1], cmap='Blues', ax=ax)
            ax.set_title(model_name)
            # plot the ROC curve
            plt.figure()
            fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,4))
            fpr, tpr, thres = roc_curve(y_test, y_pred_pra[:, -1])
            ax1.plot(fpr, tpr)
            ax1.set_title(model_name+' ROC')
            ax1.set_xlabel('fpr')
            ax1.set_ylabel('tpr')
            # plot the KS curve
            ax2.plot(thres[1:],tpr[1:])
            ax2.plot(thres[1:],fpr[1:])
            ax2.plot(thres[1:],tpr[1:]-fpr[1:])
            ax2.set_xlabel('threshold')
            ax2.legend(['tpr','fpr','tpr-fpr'])
            plt.sca(ax2)
            plt.gca().invert_xaxis()
    #         ax2.gca().invert_xaxis()
            ax2.set_title(model_name+' KS')
    
        return y_pred,Accuracy_Score
    
    
    # test_cols = ['V12', 'V14', 'V10', 'V17', 'V11', 'V4', 'V2', 'V16', 'V7', 'V3',
    #                     'V18', 'Amount', 'V19', 'V21', 'V20', 'V8', 'V15', 'V6', 'V27', 'V26', 'V1','V9','V13','V22','Hour','V23','V28']
    test_cols = X_rus.columns.drop('V24')
    Y_pred,Accuracy_score = train_test(
        Classifiers, X_rus[test_cols], y_rus, X_test[test_cols], y_test)
    Accuracy_score
    
    LG
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.94      0.07        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.95      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    KNN
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.99     56863
               1       0.06      0.93      0.11        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.95      0.55     56961
    weighted avg       1.00      0.97      0.99     56961
    
    Bayes
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.88      0.08        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.92      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    SVC
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.08      0.91      0.14        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.94      0.57     56961
    weighted avg       1.00      0.98      0.99     56961
    
    DecisionTree
                   precision    recall  f1-score   support
    
               0       1.00      0.88      0.93     56863
               1       0.01      0.95      0.03        98
    
        accuracy                           0.88     56961
       macro avg       0.51      0.91      0.48     56961
    weighted avg       1.00      0.88      0.93     56961
    
    RandomForest
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.98     56863
               1       0.05      0.93      0.09        98
    
        accuracy                           0.97     56961
       macro avg       0.52      0.95      0.54     56961
    weighted avg       1.00      0.97      0.98     56961
    
    Adaboost
                   precision    recall  f1-score   support
    
               0       1.00      0.95      0.98     56863
               1       0.03      0.95      0.07        98
    
        accuracy                           0.95     56961
       macro avg       0.52      0.95      0.52     56961
    weighted avg       1.00      0.95      0.97     56961
    
    GBDT
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.92      0.07        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.94      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    [15:05:57] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    XGboost
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.98     56863
               1       0.04      0.93      0.09        98
    
        accuracy                           0.97     56961
       macro avg       0.52      0.95      0.53     56961
    weighted avg       1.00      0.97      0.98     56961
    
    LightGBM
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.98     56863
               1       0.05      0.94      0.10        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.95      0.54     56961
    weighted avg       1.00      0.97      0.98     56961
    
         LG       KNN     Bayes       SVC  DecisionTree  RandomForest  Adaboost      GBDT   XGboost  LightGBM
0  0.959446  0.973561  0.962887  0.981479      0.877899      0.967451  0.953407  0.960482  0.965678  0.969611

[Figures: per-model confusion matrices and ROC curves]

# Ensemble learning: based on the AUC and recall results above, combine the
# three strongest models, KNN, SVC and LightGBM, into a voting ensemble
from sklearn.ensemble import VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('KNN', KNeighborsClassifier()),
    ('SVC', SVC(random_state=1, probability=True)),
    ('LightGBM', LGBMClassifier(random_state=1))
])
voting_clf.fit(X_rus[test_cols], y_rus)
y_final_pred = voting_clf.predict(X_test[test_cols])
print(classification_report(y_test, y_final_pred))
fig, ax = plt.subplots(1, 1)
plot_confusion_matrix(voting_clf, X_test[test_cols], y_test,
                      labels=[0, 1], cmap='Blues', ax=ax)
    
                  precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.08      0.94      0.14        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.96      0.57     56961
    weighted avg       1.00      0.98      0.99     56961
    
    
    
    
    
    
[Figure: confusion matrix of the voting classifier]

From the results above, every model reaches a fairly high accuracy, with SVC the best at 98.1479%. For a dataset this imbalanced, however, recall matters more than accuracy: SVC catches 91% of the fraudulent transactions and 98% of the legitimate ones, misclassifying 1046 legitimate and 9 fraudulent samples respectively.

Before any tuning, I also tried an ensemble of the three most accurate models with default parameters: KNN, SVC and LightGBM. The voting ensemble reaches 98% accuracy, with 6 fraudulent and 1114 legitimate samples misclassified. Fraud recall improves, but at the cost of flagging more legitimate transactions as fraud.
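Since SVC was built with probability=True, the same three models can also be combined with soft voting, which averages predicted probabilities instead of counting hard votes; a minimal sketch reusing the variables above (not a cell from the original run):

# Soft voting averages each member's predict_proba output; with reasonably
# calibrated members it often edges out hard majority voting
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from lightgbm import LGBMClassifier

soft_clf = VotingClassifier(estimators=[
    ('KNN', KNeighborsClassifier()),
    ('SVC', SVC(random_state=1, probability=True)),
    ('LightGBM', LGBMClassifier(random_state=1))
], voting='soft')  # the default voting='hard' takes a majority vote instead
soft_clf.fit(X_rus[test_cols], y_rus)
y_soft_pred = soft_clf.predict(X_test[test_cols])
print(classification_report(y_test, y_soft_pred))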

Since all of the models so far use default parameters, the next step is to search for better hyperparameters with a grid search, although a grid search is not guaranteed to find the true optimum.

With the baseline models in place, we move on to hyperparameter tuning. Three common approaches are random search, grid search and Bayesian optimization. Random search and grid search become extremely slow when there are many hyperparameters, because they evaluate parameter combinations blindly; Bayesian optimization instead uses the results of earlier trials to keep updating a prior over the objective, so it typically needs far fewer iterations and runs much faster.
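As an illustration of the Bayesian route, here is a minimal sketch using scikit-optimize's BayesSearchCV (this assumes the skopt package is installed; the notebook itself sticks to grid search below):

# Bayesian optimization over an SVC search space: each trial's score updates
# a surrogate model of the objective, so far fewer fits are needed than a grid
from skopt import BayesSearchCV
from skopt.space import Real, Categorical
from sklearn.svm import SVC

bayes_search = BayesSearchCV(
    SVC(probability=True, random_state=1),
    search_spaces={'C': Real(1e-2, 1e2, prior='log-uniform'),
                   'kernel': Categorical(['rbf', 'poly', 'sigmoid'])},
    n_iter=25, cv=5, random_state=1)
bayes_search.fit(X_rus[test_cols], y_rus)
print(bayes_search.best_estimator_)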

# Grid search for the best hyperparameters of each model
from sklearn.model_selection import GridSearchCV
def reg_best(X_train, y_train):
    # NOTE: with the default lbfgs solver the 'l1' candidates fail and are
    # scored NaN (warnings are suppressed), so effectively only l2 is
    # searched; pass solver='liblinear' to make both penalties valid
    log_reg_params = {'penalty': ['l1', 'l2'],
                      'C': [0.001, 0.01, 0.05, 0.1, 1, 10]}
    grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
    grid_log_reg.fit(X_train, y_train)
    log_reg_best = grid_log_reg.best_estimator_
    print(log_reg_best)
    return log_reg_best
    
    
def KNN_best(X_train, y_train):
    KNN_params = {'n_neighbors': [3, 5, 7, 9, 11, 15],
                  'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']}
    grid_KNN = GridSearchCV(KNeighborsClassifier(), KNN_params)
    grid_KNN.fit(X_train, y_train)
    KNN_best_ = grid_KNN.best_estimator_
    print(KNN_best_)
    return KNN_best_
    
    
def SVC_best(X_train, y_train):
    SVC_params = {'C': [0.5, 0.7, 0.9, 1],
                  'kernel': ['rbf', 'poly', 'sigmoid', 'linear'],
                  'probability': [True]}
    grid_SVC = GridSearchCV(SVC(), SVC_params)
    grid_SVC.fit(X_train, y_train)
    SVC_best_ = grid_SVC.best_estimator_
    print(SVC_best_)
    return SVC_best_
    
    
def DecisionTree_best(X_train, y_train):
    DT_params = {'criterion': ['gini', 'entropy'],
                 'max_depth': list(range(2, 4, 1)),
                 'min_samples_leaf': list(range(5, 7, 1))}
    grid_DT = GridSearchCV(DecisionTreeClassifier(), DT_params)
    grid_DT.fit(X_train, y_train)
    DT_best = grid_DT.best_estimator_
    print(DT_best)
    return DT_best
    
    
    def RandomForest_best(X_train, y_train):
        RF_params = {'n_estimators': [10, 50, 100, 150, 200], 'criterion': [
            'gini', 'entropy'], "min_samples_leaf": list(range(5, 7, 1))}
        grid_RF = GridSearchCV(RandomForestClassifier(), RF_params)
        grid_RF.fit(X_train, y_train)
        RT_best = grid_RF.best_estimator_
        print(RT_best)
        return RT_best
    
    
    def Adaboost_best(X_train, y_train):
        Adaboost_params = {'n_estimators': [10, 50, 100, 150, 200], 'learning_rate': [
            0.01, 0.05, 0.1, 0.5, 1], 'algorithm': ['SAMME', 'SAMME.R']}
        grid_Adaboost = GridSearchCV(AdaBoostClassifier(), Adaboost_params)
        grid_Adaboost.fit(X_train, y_train)
        Adaboost_best_ = grid_Adaboost.best_estimator_
        print(Adaboost_best_)
        return Adaboost_best_
    
    
    def GBDT_best(X_train, y_train):
        GBDT_params = {'n_estimators': [10, 50, 100, 150], 'loss': ['deviance', 'exponential'], 'learning_rate': [
            0.01, 0.05, 0.1], 'criterion': ['friedman_mse', 'mse']}
        grid_GBDT = GridSearchCV(GradientBoostingClassifier(), GBDT_params)
        grid_GBDT.fit(X_train, y_train)
        GBDT_best_ = grid_GBDT.best_estimator_
        print(GBDT_best_)
        return GBDT_best_
    
    
    def XGboost_best(X_train, y_train):
        XGB_params = {'n_estimators': [10, 50, 100, 150, 200], 'max_depth': [5, 10, 15, 20], 'learning_rate': [
            0.01, 0.05, 0.1, 0.5, 1]}
        grid_XGB = GridSearchCV(XGBClassifier(), XGB_params)
        grid_XGB.fit(X_train, y_train)
        XGB_best_ = grid_XGB.best_estimator_
        print(XGB_best_)
        return XGB_best_
    
    
    def LGBM_best(X_train, y_train):
        LGBM_params = {'boosting_type': ['gbdt', 'dart', 'goss', 'rf'], 'num_leaves': [21, 31, 51], 'n_estimators': [10, 50, 100, 150, 200], 'max_depth': [5, 10, 15, 20], 'learning_rate': [
            0.01, 0.05, 0.1, 0.5, 1]}
        grid_LGBM = GridSearchCV(LGBMClassifier(), LGBM_params)
        grid_LGBM.fit(X_train, y_train)
        LGBM_best_ = grid_LGBM.best_estimator_
        print(LGBM_best_)
        return LGBM_best_
    
Classifiers = {'LG': reg_best(X_rus[test_cols], y_rus),
               'KNN': KNN_best(X_rus[test_cols], y_rus),
               'Bayes': GaussianNB(),
               'SVC': SVC_best(X_rus[test_cols], y_rus),
               'DecisionTree': DecisionTree_best(X_rus[test_cols], y_rus),
               'RandomForest': RandomForest_best(X_rus[test_cols], y_rus),
               'Adaboost': Adaboost_best(X_rus[test_cols], y_rus),
               'GBDT': GBDT_best(X_rus[test_cols], y_rus),
               'XGboost': XGboost_best(X_rus[test_cols], y_rus),
               'LightGBM': LGBM_best(X_rus[test_cols], y_rus)}
    
    
    LogisticRegression(C=0.05)
    KNeighborsClassifier(n_neighbors=3)
    SVC(C=0.7, probability=True)
    DecisionTreeClassifier(criterion='entropy', max_depth=2, min_samples_leaf=5)
    RandomForestClassifier(min_samples_leaf=5)
    AdaBoostClassifier(algorithm='SAMME', learning_rate=0.5, n_estimators=100)
    GradientBoostingClassifier(criterion='mse', loss='exponential')
    XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                  colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
                  importance_type='gain', interaction_constraints='',
                  learning_rate=0.5, max_delta_step=0, max_depth=5,
                  min_child_weight=1, missing=nan, monotone_constraints='()',
                  n_estimators=10, n_jobs=8, num_parallel_tree=1, random_state=0,
                  reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                  tree_method='exact', validate_parameters=1, verbosity=None)
    LGBMClassifier(boosting_type='dart', learning_rate=1, max_depth=5,
                   n_estimators=150, num_leaves=21)
    
# Retrain and evaluate with the tuned parameters
    Y_pred,Accuracy_score = train_test(
        Classifiers, X_rus[test_cols], y_rus, X_test[test_cols], y_test)
    Accuracy_score
    
    LG
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.99     56863
               1       0.05      0.92      0.10        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.95      0.54     56961
    weighted avg       1.00      0.97      0.98     56961
    
    KNN
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.93      0.08        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.95      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    Bayes
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.88      0.08        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.92      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    SVC
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.09      0.91      0.16        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.95      0.57     56961
    weighted avg       1.00      0.98      0.99     56961
    
    DecisionTree
                   precision    recall  f1-score   support
    
               0       1.00      0.90      0.95     56863
               1       0.02      0.95      0.03        98
    
        accuracy                           0.90     56961
       macro avg       0.51      0.92      0.49     56961
    weighted avg       1.00      0.90      0.94     56961
    
    RandomForest
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.99     56863
               1       0.06      0.93      0.11        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.95      0.55     56961
    weighted avg       1.00      0.97      0.99     56961
    
    Adaboost
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.94      0.08        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.95      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    GBDT
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.98     56863
               1       0.05      0.93      0.09        98
    
        accuracy                           0.97     56961
       macro avg       0.52      0.95      0.53     56961
    weighted avg       1.00      0.97      0.98     56961
    
    [15:36:45] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    XGboost
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.93      0.07        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.94      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    LightGBM
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.98     56863
               1       0.05      0.93      0.10        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.95      0.54     56961
    weighted avg       1.00      0.97      0.98     56961
    
         LG       KNN     Bayes       SVC  DecisionTree  RandomForest  Adaboost      GBDT   XGboost  LightGBM
0  0.972508  0.963747  0.962887  0.983129      0.897087      0.974474  0.962185  0.966152  0.960043  0.969769

[Figures: per-model confusion matrices and ROC curves after tuning]
# Ensemble learning: based on the tuned results above, combine LG, SVC and
# RandomForest into a voting ensemble
from sklearn.ensemble import VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('LG', LogisticRegression(random_state=1, C=0.05)),
    ('SVC', SVC(random_state=1, probability=True, C=0.7)),
    ('RandomForest', RandomForestClassifier(random_state=1, min_samples_leaf=5))
])
voting_clf.fit(X_rus[test_cols], y_rus)
y_final_pred = voting_clf.predict(X_test[test_cols])
print(classification_report(y_test, y_final_pred))
fig, ax = plt.subplots(1, 1)
plot_confusion_matrix(voting_clf, X_test[test_cols], y_test,
                      labels=[0, 1], cmap='Blues', ax=ax)
    
                  precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.07      0.92      0.14        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.95      0.56     56961
    weighted avg       1.00      0.98      0.99     56961
    
    
    
    
    
    
[Figure: confusion matrix of the tuned voting classifier]

After tuning, most models gain a little accuracy while a few regress. Recall does not clearly improve, and for some models it degrades noticeably, so fraud detection is still unsatisfactory. A likely cause is that the undersampled training set is far smaller than the test set, which limits how well the models generalize.

Combining SMOTE oversampling with random undersampling

As before, we first analyze the resampled training set and then retrain the models.
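The resampled set X_smote/y_smote was produced earlier in the notebook; for reference, a minimal sketch of how SMOTE oversampling and random undersampling are typically chained with imbalanced-learn (the sampling_strategy ratios here are illustrative, not necessarily the values actually used):

# SMOTE first synthesizes minority samples up to 10% of the majority class,
# then RandomUnderSampler trims the majority class down to a 2:1 ratio
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

resampler = Pipeline(steps=[
    ('over', SMOTE(sampling_strategy=0.1, random_state=1)),
    ('under', RandomUnderSampler(sampling_strategy=0.5, random_state=1))
])
X_res, y_res = resampler.fit_resample(X_train, y_train)
print(Counter(y_res))  # class counts after the combined resampling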

    data_smote = pd.concat([X_smote,y_smote],axis=1)
    
# Distribution of each numeric feature
    f = pd.melt(data_smote, value_vars=X_train.columns)
    g = sns.FacetGrid(f, col='variable', col_wrap=3, sharex=False, sharey=False)
    g = g.map(sns.distplot, 'value')
    

[Figure: distribution of each numeric feature]

# Box plot of each numeric feature
    f = pd.melt(data_smote, value_vars=X_train.columns)
    g = sns.FacetGrid(f,col='variable', col_wrap=3, sharex=False, sharey=False,size=5)
    g = g.map(sns.boxplot, 'value', color='lightskyblue')
    

[Figure: box plot of each numeric feature]

# Violin plot of each numeric feature
    f = pd.melt(data_smote, value_vars=X_train.columns)
    g = sns.FacetGrid(f, col='variable', col_wrap=3, sharex=False, sharey=False,size=5)
    g = g.map(sns.violinplot, 'value',color='lightskyblue')
    

[Figure: violin plot of each numeric feature]

    numerical_columns = data_smote.columns.drop('Class')
    for col_name in numerical_columns:
        data_smote = outlier_process(data_smote,col_name)
        print(data_smote.shape)
    
    (214362, 31)
    (213002, 31)
    (212497, 31)
    (212497, 31)
    (204320, 31)
    (202425, 31)
    (198165, 31)
    (189719, 31)
    (189492, 31)
    (189427, 31)
    (189020, 31)
    (189019, 31)
    (189019, 31)
    (189019, 31)
    (189016, 31)
    (188944, 31)
    (184836, 31)
    (184556, 31)
    (184552, 31)
    (180426, 31)
    (178559, 31)
    (178559, 31)
    (174526, 31)
    (174457, 31)
    (174354, 31)
    (174098, 31)
    (169488, 31)
    (166755, 31)
    (156423, 31)
    (156423, 31)
    
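outlier_process was defined earlier in the notebook; for reference, an IQR-based sketch consistent with the shrinking shapes above might look like this (an assumption, not necessarily the author's exact implementation):

# Drop rows whose col_name value falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
def outlier_process(df, col_name, k=1.5):
    q1, q3 = df[col_name].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[(df[col_name] >= lower) & (df[col_name] <= upper)]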


# Correlation heat maps before and after resampling
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
sns.heatmap(data.corr(), cmap='coolwarm_r', ax=ax1, vmax=0.8)
ax1.set_title('the relationship on imbalanced samples')
sns.heatmap(data_smote.corr(), cmap='coolwarm_r', ax=ax2, vmax=0.8)
ax2.set_title('the relationship on SMOTE + undersampled data')
    
[Figure: correlation heatmaps before and after resampling]

# Correlation of each numeric feature with Class
    data_smote.corr()['Class'].sort_values(ascending=False)
    
    Class     1.000000
    V11       0.713013
    V4        0.705057
    V2        0.641477
    V21       0.466069
    V27       0.459954
    V28       0.383395
    V20       0.381292
    V8        0.253905
    V25       0.147601
    V26       0.052486
    V19       0.008546
    Amount   -0.001365
    V15      -0.023988
    V22      -0.028250
    Hour     -0.048206
    V5       -0.098754
    V13      -0.142258
    V24      -0.154369
    V23      -0.157138
    V18      -0.170994
    V1       -0.295297
    V17      -0.445826
    V6       -0.465282
    V16      -0.500016
    V7       -0.547495
    V9       -0.567131
    V3       -0.658294
    V12      -0.713942
    V10      -0.748285
    V14      -0.790148
    Name: Class, dtype: float64
    

According to this ranking, V4, V11 and V2 have the strongest positive correlation with Class, while V14, V10, V12, V3, V7 and V9 have the strongest negative correlation. The plots below show how these features relate to the target.

# Features most positively correlated with Class
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 6))
sns.violinplot(x='Class', y='V4', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax1)
ax1.set_title('V4 vs Class Positive Correlation')

sns.violinplot(x='Class', y='V11', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax2)
ax2.set_title('V11 vs Class Positive Correlation')

sns.violinplot(x='Class', y='V2', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax3)
ax3.set_title('V2 vs Class Positive Correlation')
    
[Figure: violin plots of the positively correlated features]

# Features most negatively correlated with Class
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3, figsize=(24, 14))
sns.violinplot(x='Class', y='V14', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax1)
ax1.set_title('V14 vs Class Negative Correlation')

sns.violinplot(x='Class', y='V10', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax2)
ax2.set_title('V10 vs Class Negative Correlation')

sns.violinplot(x='Class', y='V12', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax3)
ax3.set_title('V12 vs Class Negative Correlation')

sns.violinplot(x='Class', y='V3', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax4)
ax4.set_title('V3 vs Class Negative Correlation')

sns.violinplot(x='Class', y='V7', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax5)
ax5.set_title('V7 vs Class Negative Correlation')

sns.violinplot(x='Class', y='V9', data=data_smote, palette=['lightsalmon', 'lightskyblue'], ax=ax6)
ax6.set_title('V9 vs Class Negative Correlation')
    
    
[Figure: violin plots of the negatively correlated features]

    vif = [variance_inflation_factor(data_smote.values, data_smote.columns.get_loc(
        i)) for i in data_smote.columns]
    vif
    
    [8.91488639686892,
     41.95138589208644,
     29.439659166987383,
     9.321076032190051,
     18.073107065112527,
     7.88968653431388,
     38.13243240821064,
     2.61807436913295,
     4.202219415722627,
     20.898417802753006,
     5.976908659263689,
     10.856930462897152,
     1.2514060420970867,
     20.23958581764367,
     1.176425772463202,
     6.444784613229281,
     6.980222815257359,
     2.7742773520511372,
     2.4906782119059176,
     4.348667463801223,
     3.409678717638936,
     1.9626453781659197,
     2.1167419900555884,
     1.1352046295467655,
     1.9935984979230046,
     1.1029041559046275,
     3.084861887885401,
     1.9565486505075638,
     13.535498930988794,
     1.7451075607624895,
     4.64505815138509]
    

Compared with plain random undersampling, multicollinearity has improved somewhat.
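The bare list above is hard to map back to columns; pairing each VIF value with its column name makes it easier to see which features drive the multicollinearity (a small convenience sketch, not a cell from the original run):

# Label each VIF value with its column name and sort descending
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif_series = pd.Series(
    [variance_inflation_factor(data_smote.values, i)
     for i in range(data_smote.shape[1])],
    index=data_smote.columns).sort_values(ascending=False)
print(vif_series.head(10))  # the ten most collinear columns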

# Feature selection with Lasso
# Run LassoCV with 5-fold cross-validation over a small alpha grid
model_lasso = LassoCV(alphas=[0.1, 0.01, 0.005, 1], random_state=1, cv=5).fit(X_smote, y_smote)
# Show the features the model keeps (nonzero coefficients)
coef = pd.Series(model_lasso.coef_, index=X_smote.columns)
print(coef[coef != 0])
    
    V1    -0.019851
    V2     0.004587
    V3     0.000523
    V4     0.052236
    V5    -0.000597
    V6    -0.012474
    V7     0.030960
    V8    -0.012043
    V9     0.007895
    V10   -0.023509
    V11    0.005633
    V12    0.009648
    V13   -0.036565
    V14   -0.053919
    V15    0.012297
    V17   -0.009149
    V18    0.030941
    V20    0.010266
    V21    0.013880
    V22    0.019031
    V23   -0.009253
    V26   -0.068311
    V27   -0.003680
    V28    0.008911
    dtype: float64
    
# Feature importance ranking with a random forest
    from sklearn.ensemble import RandomForestClassifier
    rfc_fea_model = RandomForestClassifier(random_state=1)
    rfc_fea_model.fit(X_smote,y_smote)
    fea = X_smote.columns
    importance = rfc_fea_model.feature_importances_
    a = pd.DataFrame()
    a['feature'] = fea
    a['importance'] = importance
    a = a.sort_values('importance',ascending = False)
    plt.figure(figsize=(20,10))
    plt.bar(a['feature'],a['importance'])
    plt.title('the importance orders sorted by random forest')
    plt.show()
    

[Figure: feature importances ranked by the random forest]

a.set_index('feature')['importance'].cumsum()

feature
V14       0.140330
V12       0.274028
V10       0.392515
V17       0.501863
V4        0.592110
V11       0.680300
V2        0.728448
V3        0.770334
V16       0.808083
V7        0.842341
V18       0.857924
V8        0.869777
V21       0.880431
Amount    0.890861
V1        0.900571
V9        0.909978
V13       0.918623
V19       0.926978
V27       0.935072
V5        0.942688
Hour      0.950275
V20       0.957287
V15       0.963873
V6        0.970225
V26       0.976519
V28       0.982202
V23       0.987770
V22       0.992335
V25       0.996295
V24       1.000000
Name: importance, dtype: float64

The two rankings differ considerably. To preserve as much information as possible, we combine them: take the random-forest features whose cumulative importance reaches 95%, then add any feature that Lasso kept but that list misses. This selects 28 of the 30 features, dropping only V24 and V25.
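A sketch of computing this union programmatically, instead of dropping V24 and V25 by hand as the next cell does; a is the importance table built above, coef holds the Lasso coefficients, and the 0.95 cutoff is the threshold described in the text:

# Random-forest features covering 95% of the importance mass ...
rf_keep = a.loc[a['importance'].cumsum() <= 0.95, 'feature']
# ... plus anything Lasso kept that the forest ranked lower
lasso_keep = coef[coef != 0].index
selected = sorted(set(rf_keep) | set(lasso_keep))
print(len(selected), selected)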

    # test_cols = ['V12', 'V14', 'V10', 'V17', 'V4', 'V11', 'V2', 'V7', 'V16', 'V3', 'V18',
    #              'V8', 'Amount', 'V19', 'V21', 'V1', 'V5', 'V13', 'V27','V6','V15','V26']
    test_cols = X_smote.columns.drop(['V24','V25'])
    Classifiers = {
        'LG': LogisticRegression(random_state=1),
        'KNN': KNeighborsClassifier(),
        'Bayes': GaussianNB(),
        'SVC': SVC(random_state=1, probability=True),
        'DecisionTree': DecisionTreeClassifier(random_state=1),
        'RandomForest': RandomForestClassifier(random_state=1),
        'Adaboost': AdaBoostClassifier(random_state=1),
        'GBDT': GradientBoostingClassifier(random_state=1),
        'XGboost': XGBClassifier(random_state=1),
        'LightGBM': LGBMClassifier(random_state=1)
    }
    Y_pred, Accuracy_score = train_test(
        Classifiers, X_smote[test_cols], y_smote, X_test[test_cols], y_test)
    print(Accuracy_score)
    Y_pred.head()
    
    LG
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.06      0.86      0.12        98
    
        accuracy                           0.98     56961
       macro avg       0.53      0.92      0.55     56961
    weighted avg       1.00      0.98      0.99     56961
    
    KNN
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.29      0.83      0.42        98
    
        accuracy                           1.00     56961
       macro avg       0.64      0.91      0.71     56961
    weighted avg       1.00      1.00      1.00     56961
    
    Bayes
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.06      0.84      0.11        98
    
        accuracy                           0.98     56961
       macro avg       0.53      0.91      0.55     56961
    weighted avg       1.00      0.98      0.99     56961
    
    SVC
                   precision    recall  f1-score   support
    
               0       1.00      0.99      0.99     56863
               1       0.09      0.86      0.17        98
    
        accuracy                           0.99     56961
       macro avg       0.55      0.92      0.58     56961
    weighted avg       1.00      0.99      0.99     56961
    
    DecisionTree
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.27      0.80      0.40        98
    
        accuracy                           1.00     56961
       macro avg       0.63      0.90      0.70     56961
    weighted avg       1.00      1.00      1.00     56961
    
    RandomForest
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.81      0.82      0.81        98
    
        accuracy                           1.00     56961
       macro avg       0.90      0.91      0.91     56961
    weighted avg       1.00      1.00      1.00     56961
    
    Adaboost
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.06      0.89      0.12        98
    
        accuracy                           0.98     56961
       macro avg       0.53      0.93      0.55     56961
    weighted avg       1.00      0.98      0.99     56961
    
    GBDT
                   precision    recall  f1-score   support
    
               0       1.00      0.99      0.99     56863
               1       0.13      0.85      0.22        98
    
        accuracy                           0.99     56961
       macro avg       0.56      0.92      0.61     56961
    weighted avg       1.00      0.99      0.99     56961
    
    [15:24:00] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    XGboost
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.69      0.84      0.76        98
    
        accuracy                           1.00     56961
       macro avg       0.84      0.92      0.88     56961
    weighted avg       1.00      1.00      1.00     56961
    
    LightGBM
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.54      0.83      0.66        98
    
        accuracy                           1.00     56961
       macro avg       0.77      0.91      0.83     56961
    weighted avg       1.00      1.00      1.00     56961
    
             LG       KNN     Bayes       SVC  DecisionTree  RandomForest  Adaboost    GBDT  XGboost  LightGBM
    0  0.977862  0.996138  0.976335  0.985113      0.995874       0.99935  0.977054  0.9898  0.99907  0.998508
    
   LG  KNN  Bayes  SVC  DecisionTree  RandomForest  Adaboost  GBDT  XGboost  LightGBM
0   1    1      1    0             0             1         1     1        1         1
1   1    1      1    0             0             0         1     1        0         1
2   1    1      1    1             1             1         1     1        1         1
3   1    1      1    1             1             1         1     1        1         1
4   1    1      1    1             1             1         1     1        1         1

[Figures: per-model confusion matrices and ROC curves on the resampled data]

The accuracy scores show that SMOTE oversampling combined with random undersampling brings a clear improvement: RandomForest reaches 99.94% accuracy, failing to identify 13 fraudulent samples and wrongly flagging 18 legitimate ones, while XGboost, LightGBM, KNN and others also exceed 99%. Next, the most accurate models are combined into an ensemble.
Because the resampled training set is large, grid tuning is time-consuming, and the default results are already good, the tuning step could in principle be skipped; with enough time it can be run exactly as before, which is what the next cell does.

Classifiers = {'LG': reg_best(X_smote[test_cols], y_smote),
               'KNN': KNN_best(X_smote[test_cols], y_smote),
               'Bayes': GaussianNB(),
               'SVC': SVC_best(X_smote[test_cols], y_smote),
               'DecisionTree': DecisionTree_best(X_smote[test_cols], y_smote),
               'RandomForest': RandomForest_best(X_smote[test_cols], y_smote),
               'Adaboost': Adaboost_best(X_smote[test_cols], y_smote),
               'GBDT': GBDT_best(X_smote[test_cols], y_smote),
               'XGboost': XGboost_best(X_smote[test_cols], y_smote),
               'LightGBM': LGBM_best(X_smote[test_cols], y_smote)}
    
    LogisticRegression(C=1)
    KNeighborsClassifier(n_neighbors=3)
    SVC(C=1, probability=True)
    DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
    RandomForestClassifier(criterion='entropy', min_samples_leaf=5)
    AdaBoostClassifier(learning_rate=1, n_estimators=200)
    GradientBoostingClassifier(n_estimators=150)
    XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                  colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
                  importance_type='gain', interaction_constraints='',
                  learning_rate=0.5, max_delta_step=0, max_depth=5,
                  min_child_weight=1, missing=nan, monotone_constraints='()',
                  n_estimators=200, n_jobs=8, num_parallel_tree=1, random_state=0,
                  reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                  tree_method='exact', validate_parameters=1, verbosity=None)
    LGBMClassifier(learning_rate=0.5, max_depth=15, num_leaves=51)
    
    Y_pred,Accuracy_score = train_test(
        Classifiers, X_smote[test_cols], y_smote, X_test[test_cols], y_test)
    print(Accuracy_score)
    Y_pred.head().append(Y_pred.tail())
    
    LG
                   precision    recall  f1-score   support
    
               0       1.00      0.97      0.99     56863
               1       0.06      0.91      0.11        98
    
        accuracy                           0.97     56961
       macro avg       0.53      0.94      0.55     56961
    weighted avg       1.00      0.97      0.99     56961
    
    KNN
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.09      0.92      0.17        98
    
        accuracy                           0.98     56961
       macro avg       0.55      0.95      0.58     56961
    weighted avg       1.00      0.98      0.99     56961
    
    Bayes
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.07      0.88      0.12        98
    
        accuracy                           0.98     56961
       macro avg       0.53      0.93      0.56     56961
    weighted avg       1.00      0.98      0.99     56961
    
    SVC
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.08      0.90      0.15        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.94      0.57     56961
    weighted avg       1.00      0.98      0.99     56961
    
    DecisionTree
                   precision    recall  f1-score   support
    
               0       1.00      0.96      0.98     56863
               1       0.04      0.90      0.08        98
    
        accuracy                           0.96     56961
       macro avg       0.52      0.93      0.53     56961
    weighted avg       1.00      0.96      0.98     56961
    
    RandomForest
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.33      0.90      0.48        98
    
        accuracy                           1.00     56961
       macro avg       0.66      0.95      0.74     56961
    weighted avg       1.00      1.00      1.00     56961
    
    Adaboost
                   precision    recall  f1-score   support
    
               0       1.00      0.98      0.99     56863
               1       0.08      0.90      0.15        98
    
        accuracy                           0.98     56961
       macro avg       0.54      0.94      0.57     56961
    weighted avg       1.00      0.98      0.99     56961
    
    GBDT
                   precision    recall  f1-score   support
    
               0       1.00      0.99      0.99     56863
               1       0.11      0.89      0.19        98
    
        accuracy                           0.99     56961
       macro avg       0.55      0.94      0.59     56961
    weighted avg       1.00      0.99      0.99     56961
    
    [17:58:32] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    XGboost
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.24      0.90      0.38        98
    
        accuracy                           0.99     56961
       macro avg       0.62      0.95      0.69     56961
    weighted avg       1.00      0.99      1.00     56961
    
    LightGBM
                   precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.30      0.90      0.45        98
    
        accuracy                           1.00     56961
       macro avg       0.65      0.95      0.72     56961
    weighted avg       1.00      1.00      1.00     56961
    
            LG       KNN     Bayes       SVC  DecisionTree  RandomForest  Adaboost      GBDT   XGboost  LightGBM
    0  0.97393  0.984463  0.978477  0.982936      0.963273      0.996629  0.982093  0.987026  0.994944  0.996208
    
    LGKNNBayesSVCDecisionTreeRandomForestAdaboostGBDTXGboostLightGBM
    00000000000
    10000000000
    20000000000
    30000000000
    40010000000
    569560000000000
    569570000000000
    569580000000000
    569590010000000
    569600000000000
# Ensemble learning: based on the accuracy results above, combine KNN,
# DecisionTree, RandomForest, XGboost and LightGBM into a voting ensemble
from sklearn.ensemble import VotingClassifier
voting_clf = VotingClassifier(estimators=[
    ('KNN', KNeighborsClassifier(n_neighbors=3)),
    ('DecisionTree', DecisionTreeClassifier(random_state=1)),
    ('RandomForest', RandomForestClassifier(random_state=1)),
    ('XGboost', XGBClassifier(random_state=1)),
    ('LightGBM', LGBMClassifier(random_state=1))
])
voting_clf.fit(X_smote[test_cols], y_smote)
y_final_pred = voting_clf.predict(X_test[test_cols])
print(classification_report(y_test, y_final_pred))
fig, ax = plt.subplots(1, 1)
plot_confusion_matrix(voting_clf, X_test[test_cols], y_test,
                      labels=[0, 1], cmap='Blues', ax=ax)
    
                  precision    recall  f1-score   support
    
               0       1.00      1.00      1.00     56863
               1       0.71      0.83      0.76        98
    
        accuracy                           1.00     56961
       macro avg       0.86      0.91      0.88     56961
    weighted avg       1.00      1.00      1.00     56961
    
    
[Figure: confusion matrix of the final voting classifier]

With the final ensemble we reach almost 100% accuracy: 17 fraudulent transactions go undetected, and only 33 legitimate transactions are flagged as fraud, a huge improvement in accuracy over plain random undersampling. There is still room to improve the detection of fraudulent samples!
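One further lever for fraud recall is the decision threshold: instead of the default 0.5 cut on predicted probability, a lower threshold trades some precision for more recall. A sketch, assuming the ensemble above is rebuilt with voting='soft' so that predict_proba is available:

import numpy as np
from sklearn.metrics import precision_recall_curve, classification_report

# probability of the fraud class for every test transaction
proba = voting_clf.predict_proba(X_test[test_cols])[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba)
ok = np.where(recall[:-1] >= 0.90)[0]  # thresholds still reaching 90% recall
chosen = thresholds[ok[-1]]            # the largest (most precise) of them
y_thr = (proba >= chosen).astype(int)
print(classification_report(y_test, y_thr))

Sweeping this threshold lets us push fraud recall above any chosen target while monitoring how much precision is given up in return.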
