  • Classifying the iris data after PCA dimensionality reduction

    1,000+ reads 2020-07-23 23:41:56
    1. Build a KNN model on the iris data for classification
    • Standardize the data and pick one feature to visualize the effect of the processing
    • Run PCA with as many components as the original data and inspect each principal component's variance ratio
    • Keep a suitable number of principal components and visualize the reduced data
    • Build a KNN model on the reduced data and compare its performance against the original data
    import pandas as pd
    import numpy as np
    data = pd.read_csv('iris_data.csv')
    print(data.head())
    X = data.drop(['target','label'],axis=1)
    y = data.loc[:,'label']
    print(X.shape,y.shape)  # print the shapes of X and y
    


    # build the model, predict, and compute the accuracy
    from sklearn.neighbors import KNeighborsClassifier
    KNN = KNeighborsClassifier(n_neighbors=3)
    KNN.fit(X,y)
    y_predict = KNN.predict(X)
    from sklearn.metrics import accuracy_score
    accuracy = accuracy_score(y,y_predict)
    print(accuracy)
    

    0.96

    # standardize the data
    from sklearn.preprocessing import StandardScaler
    X_norm = StandardScaler().fit_transform(X)
    X1_mean = X.loc[:,'sepal length'].mean()
    X1_norm_mean = X_norm[:,0].mean()
    X1_sigma = X.loc[:,'sepal length'].std()
    X1_norm_sigma = X_norm[:,0].std()
    print(X1_mean,X1_norm_mean,X1_sigma,X1_norm_sigma)
    # the standardized column has mean ≈ 0 and std ≈ 1 (pandas .std() uses ddof=1 while StandardScaler uses ddof=0, so the two sigmas differ slightly)
    


    %matplotlib inline
    from matplotlib import pyplot as plt
    fig1 = plt.figure(figsize=(10,5))
    plt.subplot(121)
    plt.hist(X.loc[:,'sepal length'],bins=100)  # X has four features; plot just this one
    plt.subplot(122)
    plt.hist(X_norm[:,0],bins=100)
    plt.show()
    


    print(X.shape)
    # X is 4-dimensional
    
    from sklearn.decomposition import PCA 
    pca = PCA(n_components=4)
    X_pca = pca.fit_transform(X_norm)
    
    var_ratio = pca.explained_variance_ratio_  # variance ratio of each principal component
    print(var_ratio)
    
    # bar chart of the per-component variance ratios
    fig2 = plt.figure()
    plt.bar([1,2,3,4],var_ratio)
    plt.xticks([1,2,3,4],['PC1','PC2','PC3','PC4'])
    plt.ylabel("variance ratio")
    plt.show()
    


    from sklearn.decomposition import PCA 
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_norm)
    print(type(X_pca))
    
    fig3 = plt.figure()
    s1= plt.scatter(X_pca[:,0][y==0],X_pca[:,1][y==0])
    s2 = plt.scatter(X_pca[:,0][y==1],X_pca[:,1][y==1])
    s3 = plt.scatter(X_pca[:,0][y==2],X_pca[:,1][y==2])
    plt.legend((s1,s2,s3),('setosa','versicolor','virginica'))  # the three iris species
    plt.show()
    


    # accuracy of KNN on the PCA-reduced data
    # from sklearn.neighbors import KNeighborsClassifier
    KNN = KNeighborsClassifier(n_neighbors=3)
    KNN.fit(X_pca,y)
    y_predict_pca = KNN.predict(X_pca)
    from sklearn.metrics import accuracy_score
    accuracy = accuracy_score(y,y_predict_pca)
    print(accuracy)
    

    The dimensionality went down while the information needed for classification was retained.
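
    Both accuracies above are computed on the same data the models were trained on. A fairer comparison uses a held-out split; a minimal sketch (reusing the X and y loaded above, with illustrative split settings):

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    # fit the scaler and PCA on the training split only, then apply them to the test split
    scaler = StandardScaler().fit(X_train)
    pca2 = PCA(n_components=2).fit(scaler.transform(X_train))
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(pca2.transform(scaler.transform(X_train)), y_train)
    print(accuracy_score(y_test, knn.predict(pca2.transform(scaler.transform(X_test)))))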

  • A PCA, LDA and SVM classification example

    2020-11-19 18:06:12

    PCA dimensionality reduction

    PCA is a well-known dimensionality-reduction technique, so we won't belabor the basics. Hyperspectral data is high-dimensional; reducing its dimensionality cuts computation, so classification pipelines usually start with a dimensionality-reduction step.

    Introduction to sklearn.decomposition.PCA

    Class definition:

    class sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)
    

    · n_components : int, float, None or str

    1. An integer greater than 1 is the number of dimensions to keep after reduction.
    2. A float in (0, 1) tells PCA to choose the number of dimensions automatically from the sample feature variances; in that case n_components is the minimum fraction of the total variance that the kept principal components must explain.
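
    A quick sketch of both forms (X_demo is hypothetical random data, purely for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    X_demo = np.random.RandomState(0).rand(100, 6)
    print(PCA(n_components=2).fit(X_demo).n_components_)     # exactly 2 components kept
    print(PCA(n_components=0.95).fit(X_demo).n_components_)  # chosen automatically to reach 95% variance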

    · copy : bool, default=True

    Whether to copy the original training data before running the algorithm. If True, the original training data is left untouched, because PCA operates on a copy; if False, the original data may be overwritten, because the reduction is computed in place.

    PCA methods and attributes

    · fit(X, y=None)
    fit() is the universal training method in scikit-learn; every trainable estimator has one, and it is the "training" step of the algorithm. Because PCA is unsupervised, y is simply None here.
    fit(X) trains the PCA model on the data X.
    Return value: the fitted object itself, so pca.fit(X) trains the pca object on X.
    · explained_variance_
    The variance of each principal component after reduction. The larger the variance, the more important the component.
    · explained_variance_ratio_
    Each component's variance as a fraction of the total variance; the larger the fraction, the more important the component.
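
    A minimal sketch of the two attributes, using the bundled iris data purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_iris().data)
    pca = PCA(n_components=4).fit(X)
    print(pca.explained_variance_)        # variance captured by each principal component
    print(pca.explained_variance_ratio_)  # the same values as fractions of the total variance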

    Imports (everything used later is imported up front)

    import matplotlib.pyplot as plt
    import joblib
    from sklearn.model_selection import KFold
    from sklearn.model_selection import train_test_split
    import numpy as np
    from sklearn.svm import SVC
    from sklearn import metrics
    from sklearn import preprocessing
    from sklearn.decomposition import PCA
    import pandas as pd
    #from sklearn.grid_search import GridSearchCV
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from time import time
    from scipy.io import loadmat
    import spectral
    from functools import reduce
    import sklearn
    from sklearn.preprocessing import StandardScaler
    

    A simple PCA example

    X = loadmat(r'E:\Indian_pines.mat')['indian_pines']
    print ('x.shape: ',X.shape)
    newX = np.reshape(X,(-1,X.shape[2]))   # flatten the cube to (pixels, bands)
    print ('newX.shape: ',newX.shape)
    y = loadmat(r'E:\Indian_pines_gt.mat')['indian_pines_gt']
    #pca=PCA( )
    pca=PCA(n_components=0.9999)  # n_components takes an int or a float; here we keep 99.99% of the variance
    pca.fit(newX)                 # PCA is unsupervised, so no y is needed
    ratio=pca.explained_variance_ratio_
    print("pca.components_.shape: ",pca.components_.shape)
    print("pca_var_ratio.shape:",pca.explained_variance_ratio_.shape)
    # find how many leading components reach 99.9% cumulative variance
    for i in range(X.shape[2]):
        total = np.sum(ratio[0:i+1])
        if total >= 0.999:
            print('the first %d principal components already reach 99.9%%' % (i+1))  # i+1 components are summed here
            break
    # plot the cumulative explained-variance curve
    plt.plot([i for i in range(X.shape[2])],
             [np.sum(ratio[:i+1]) for i in range(X.shape[2])])
    plt.xticks(np.arange(X.shape[2],step=20))
    plt.yticks(np.arange(0,1.01,0.10))
    plt.grid()
    plt.show()
    

    PCA.fit() result: cumulative variance explained by the leading principal components

    [plot: cumulative explained-variance curve over the number of components]
    As the plot shows, the first 69 principal components already capture 99.9% of the information.
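
    The component-counting loop above can also be done as a vectorized lookup; a sketch (assuming ratio is the fitted explained_variance_ratio_ array and the 99.9% threshold is actually reachable):

    import numpy as np

    cum = np.cumsum(ratio)
    k = int(np.searchsorted(cum, 0.999)) + 1   # smallest k whose cumulative ratio reaches 99.9%
    print('the first %d principal components reach 99.9%%' % k)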

    SVM classification code

    The comments spell out each step.

    # loadmat returns a dict, so the data must be pulled out by key
    # raw data cube
    # the key names differ between .mat files; print the dict keys and adjust the three keys below to match
    input_image = loadmat(r'E:\Indian_pines.mat')['indian_pines']
    # ground-truth labels
    output_image = loadmat(r'E:\Indian_pines_gt.mat')['indian_pines_gt']
    # corrected data cube
    corrected_image = loadmat(r'E:\Indian_pines_corrected.mat')['indian_pines_corrected']
    print(len(list(input_image[3][6])))   # number of bands at pixel (3, 6)
    # plt.imshow(output_image,'gray')
    # plt.show()
    # to print the dict keys, first drop the ['...'] indexing from the two loadmat calls above
    # print(input_image.keys())
    # print(output_image.keys())
    print('input size : ',input_image.shape)
    print('input type : ',type(input_image))
    print('output size: ',output_image.shape)
    label = np.unique(output_image)
    row,col = output_image.shape
    labelNum = {}
    for i in range(row):
        for j in range(col):
            if output_image[i][j] in labelNum:
                labelNum[output_image[i][j]] += 1
            else:
                labelNum[output_image[i][j]] = 1   # first occurrence counts as one
    #print(labelNum)
    # total = lambda x,y:x+y,labelNum.values()
    # #view = spectral.imshow(corrected_image,classes = output_image,bands=(1,2,3))
    # ground_truth = spectral.imshow(input_image,classes = output_image.astype(int))
    # plt.show()
    need_label = np.zeros([output_image.shape[0],output_image.shape[1]])   
    for i in range(output_image.shape[0]):
        for j in range(output_image.shape[1]):
            if output_image[i][j] != 0:
                need_label[i][j] = output_image[i][j]
    # plt.imshow(need_label,'gray')
    # plt.show()
    new_datawithlabel_list = []
    for i in range(output_image.shape[0]):
        for j in range(output_image.shape[1]):
            if need_label[i][j] != 0:
                c2l = list(input_image[i][j])
                c2l.append(need_label[i][j])
                new_datawithlabel_list.append(c2l)
    
    new_datawithlabel_array = np.array(new_datawithlabel_list)
    print('new_datawithlabel_array shape: ',new_datawithlabel_array.shape)
    # standardization
    
    data_D = preprocessing.StandardScaler().fit_transform(new_datawithlabel_array[:,:-1])  # [:,:-1] drops the last (label) column
    print('data_D.shape:',data_D.shape)
    #data_D = preprocessing.MinMaxScaler().fit_transform(new_datawithlabel_array[:,:-1])
    
    
    # data_L is the label column (a 1-D array)
    data_L = new_datawithlabel_array[:,-1]
    print(data_L.shape)
    
    # save the result to CSV for later processing
    
    # new: the standardized features with the label column appended
    new = np.column_stack((data_D,data_L))
    new_ = pd.DataFrame(new)
    new_.to_csv('E:/Indian_pine.csv',header=False,index=False)
    # once the csv exists, later runs can work directly from the file
    print('Done')
    
    # read back the csv saved above
    data = pd.read_csv('E:/Indian_pine.csv',header=None)
    data = data.values
    # every column except the last
    data_D = data[:,:-1]
    # the last column (the labels)
    data_L = data[:,-1]
    # split the data into training and test sets
    data_train, data_test, label_train, label_test = train_test_split(data_D,data_L,test_size=0.5)
    
    
    # model training and fitting (kernel options: linear, rbf, poly)
    t0 = time()
    # fit PCA; note that fitting on all of data_D (rather than data_train alone) leaks test information
    pca = PCA(n_components = 70, whiten=True).fit(data_D)
    # reduce dimensionality with the fitted model
    X_train_pca = pca.transform(data_train)
    X_test_pca = pca.transform(data_test)
    
    
    lda = LinearDiscriminantAnalysis(n_components = 14).fit(X_train_pca,label_train)
    X_train_lda = lda.transform(X_train_pca)
    X_test_lda = lda.transform(X_test_pca)
    
    print('OK')
    
    # param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
    #               'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
    # param_grid = {'C': [10, 20, 100, 500, 1e3],
    #               'gamma': [0.001, 0.005, 0.01, 0.05, 0.1, 0.125], }
    # clf = GridSearchCV(SVC(kernel='linear', class_weight='balanced'), param_grid)
    
    clf = SVC(kernel = 'rbf',gamma=0.1,C=20)
    clf.fit(X_train_lda,label_train)
    # clf.fit(data_train,label_train)
    
    # get the training hyperparameters (C, gamma, kernel, ...) as a dict
    k = clf.get_params()
    print('clf.param: ', k)
    pred = clf.predict(X_test_lda)
    # print('pred.shape',label_test[0:10])
    
    accuracy = metrics.accuracy_score(label_test, pred)*100
    print('accuracy: ',accuracy)    # 95.60687234435618
    # print(clf.best_estimator_)
    print("done in %0.3fs" % (time() - t0))  # done in 13.843s
    # persist the trained model for later reuse
    joblib.dump(clf, "salinas_MODEL.m")
    
    # load the saved model
    #clf = joblib.load("salinas_MODEL.m")
    

    Notice that before the SVM classifier is called, two dimensionality reductions are applied: PCA first, whose output is fed into LDA, and the LDA output is what finally gets classified.
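
    The same PCA -> LDA -> SVM chain can be written as a sklearn Pipeline, which also keeps PCA from being fitted on the test data; a sketch using the same hyperparameters as above:

    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.svm import SVC

    clf = Pipeline([
        ('pca', PCA(n_components=70, whiten=True)),
        ('lda', LinearDiscriminantAnalysis(n_components=14)),
        ('svm', SVC(kernel='rbf', gamma=0.1, C=20)),
    ])
    clf.fit(data_train, label_train)   # every step is fitted on the training split only
    print(clf.score(data_test, label_test))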

  • Using PCA (principal component analysis) to reduce the four iris measurements to three, so the data can be drawn in a 3D scatter plot; PCA keeps enough information to characterize each data point, and the newly generated dimensions are called principal components. ...
    # coding: utf-8
    
    # Use PCA (principal component analysis) to reduce the dimensionality of the data
    # (once the four measurements above are reduced to three, a 3D scatter plot describes them better).
    # PCA keeps enough information to characterize each data point; the newly generated dimensions are called principal components.
    # scikit-learn's fit_transform() performs the reduction in one call.
    # Intro to the PCA object: http://blog.csdn.net/u012102306/article/details/52294726
    import matplotlib.pyplot as plt
    # 3D scatter plots
    from mpl_toolkits.mplot3d import Axes3D
    # datasets
    from sklearn import datasets
    # the PCA object
    from sklearn.decomposition import PCA
    
    # CJK font support
    import matplotlib.font_manager as fm
    # on macOS font issues, see: https://zhidao.baidu.com/question/161361596.html
    myfont = fm.FontProperties(fname='/Library/Fonts/Xingkai.ttc')
    # load the data
    iris=datasets.load_iris()
    # the flower species
    species=iris.target
    # run the PCA reduction
    x_reduced=PCA(n_components=3).fit_transform(iris.data)
    # draw the 3D scatter plot (add_subplot is the modern way to create 3D axes)
    fig=plt.figure()
    ax=fig.add_subplot(projection='3d')
    ax.set_title(u"Data after PCA reduction",size=14,fontproperties=myfont)
    ax.scatter(x_reduced[:,0],x_reduced[:,1],x_reduced[:,2],c=species)
    ax.set_xlabel(u"First eigenvector",fontproperties=myfont)
    ax.set_ylabel(u"Second eigenvector",fontproperties=myfont)
    ax.set_zlabel(u"Third eigenvector",fontproperties=myfont)
    ax.set_xticklabels(())
    ax.set_yticklabels(())
    ax.set_zticklabels(())
    
    # save the figure as a PNG; note this must come after the plotting calls, or the image will be blank
    plt.savefig('python_8_3_鸢尾花分类预测_PCA降维后显示.png')
    
    plt.show()
    

  • ML kNNC: classification prediction on the iris dataset with the kNN algorithm (PCA preprocessing + 3D scatter-plot visualization)

    Contents: design approach, output, core code.

    Output

    (149, 5) 
        5.1  3.5  1.4  0.2  Iris-setosa
    0  4.9  3.0  1.4  0.2  Iris-setosa
    1  4.7  3.2  1.3  0.2  Iris-setosa
    2  4.6  3.1  1.5  0.2  Iris-setosa
    3  5.0  3.6  1.4  0.2  Iris-setosa
    4  5.4  3.9  1.7  0.4  Iris-setosa
    (149, 5) 
        Sepal_Length  Sepal_Width  Petal_Length  Petal_Width            type
    0           4.5          2.3           1.3          0.3     Iris-setosa
    1           6.3          2.5           5.0          1.9  Iris-virginica
    2           5.1          3.4           1.5          0.2     Iris-setosa
    3           6.3          3.3           6.0          2.5  Iris-virginica
    4           6.8          3.2           5.9          2.3  Iris-virginica
    split point: 29
    label_classes: ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
    DIY kNN model prediction, on the original data: 0.95
    kNN model prediction, on the original data: [0.96666667 1.         0.93333333 1.         0.93103448]
    kNN model prediction, after PCA on the original data: [1.         0.96       0.95918367]

    Core code


    class KNeighborsClassifier Found at: sklearn.neighbors._classification
    
    class KNeighborsClassifier(NeighborsBase, KNeighborsMixin, 
        SupervisedIntegerMixin, ClassifierMixin):
        """Classifier implementing the k-nearest neighbors vote.
        
        Read more in the :ref:`User Guide <classification>`.
        
        Parameters
        ----------
        n_neighbors : int, default=5
        Number of neighbors to use by default for :meth:`kneighbors` queries.
        
        weights : {'uniform', 'distance'} or callable, default='uniform'
        weight function used in prediction.  Possible values:
        
        - 'uniform' : uniform weights.  All points in each neighborhood
        are weighted equally.
        - 'distance' : weight points by the inverse of their distance.
        in this case, closer neighbors of a query point will have a
        greater influence than neighbors which are further away.
        - [callable] : a user-defined function which accepts an
        array of distances, and returns an array of the same shape
        containing the weights.
        
        algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'
        Algorithm used to compute the nearest neighbors:
        
        - 'ball_tree' will use :class:`BallTree`
        - 'kd_tree' will use :class:`KDTree`
        - 'brute' will use a brute-force search.
        - 'auto' will attempt to decide the most appropriate algorithm
        based on the values passed to :meth:`fit` method.
        
        Note: fitting on sparse input will override the setting of
        this parameter, using brute force.
        
        leaf_size : int, default=30
        Leaf size passed to BallTree or KDTree.  This can affect the
        speed of the construction and query, as well as the memory
        required to store the tree.  The optimal value depends on the
        nature of the problem.
        
        p : int, default=2
        Power parameter for the Minkowski metric. When p = 1, this is
        equivalent to using manhattan_distance (l1), and euclidean_distance
        (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
        
        metric : str or callable, default='minkowski'
        the distance metric to use for the tree.  The default metric is
        minkowski, and with p=2 is equivalent to the standard Euclidean
        metric. See the documentation of :class:`DistanceMetric` for a
        list of available metrics.
        If metric is "precomputed", X is assumed to be a distance matrix and
        must be square during fit. X may be a :term:`sparse graph`,
        in which case only "nonzero" elements may be considered neighbors.
        
        metric_params : dict, default=None
        Additional keyword arguments for the metric function.
        
        n_jobs : int, default=None
        The number of parallel jobs to run for neighbors search.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.
        Doesn't affect :meth:`fit` method.
        
        Attributes
        ----------
        classes_ : array of shape (n_classes,)
        Class labels known to the classifier
        
        effective_metric_ : str or callable
        The distance metric used. It will be same as the `metric` parameter
        or a synonym of it, e.g. 'euclidean' if the `metric` parameter set to
        'minkowski' and `p` parameter set to 2.
        
        effective_metric_params_ : dict
        Additional keyword arguments for the metric function. For most metrics
        will be same with `metric_params` parameter, but may also contain the
        `p` parameter value if the `effective_metric_` attribute is set to
        'minkowski'.
        
        outputs_2d_ : bool
        False when `y`'s shape is (n_samples, ) or (n_samples, 1) during fit
        otherwise True.
        
        Examples
        --------
        >>> X = [[0], [1], [2], [3]]
        >>> y = [0, 0, 1, 1]
        >>> from sklearn.neighbors import KNeighborsClassifier
        >>> neigh = KNeighborsClassifier(n_neighbors=3)
        >>> neigh.fit(X, y)
        KNeighborsClassifier(...)
        >>> print(neigh.predict([[1.1]]))
        [0]
        >>> print(neigh.predict_proba([[0.9]]))
        [[0.66666667 0.33333333]]
        
        See also
        --------
        RadiusNeighborsClassifier
        KNeighborsRegressor
        RadiusNeighborsRegressor
        NearestNeighbors
        
        Notes
        -----
        See :ref:`Nearest Neighbors <neighbors>` in the online documentation
        for a discussion of the choice of ``algorithm`` and ``leaf_size``.
        
        .. warning::
        
        Regarding the Nearest Neighbors algorithms, if it is found that two
        neighbors, neighbor `k+1` and `k`, have identical distances
        but different labels, the results will depend on the ordering of the
        training data.
        
        https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
        """
        @_deprecate_positional_args
        def __init__(self, n_neighbors=5,
                     *, weights='uniform', algorithm='auto', leaf_size=30,
                     p=2, metric='minkowski', metric_params=None,
                     n_jobs=None, **kwargs):
            super().__init__(n_neighbors=n_neighbors, algorithm=algorithm,
                             leaf_size=leaf_size, metric=metric, p=p,
                             metric_params=metric_params, n_jobs=n_jobs,
                             **kwargs)
            self.weights = _check_weights(weights)
        
        def predict(self, X):
            """Predict the class labels for the provided data.
    
            Parameters
            ----------
            X : array-like of shape (n_queries, n_features), \
                    or (n_queries, n_indexed) if metric == 'precomputed'
                Test samples.
    
            Returns
            -------
            y : ndarray of shape (n_queries,) or (n_queries, n_outputs)
                Class labels for each data sample.
            """
            X = check_array(X, accept_sparse='csr')
            neigh_dist, neigh_ind = self.kneighbors(X)
            classes_ = self.classes_
            _y = self._y
            if not self.outputs_2d_:
                _y = self._y.reshape((-1, 1))
                classes_ = [self.classes_]
            n_outputs = len(classes_)
            n_queries = _num_samples(X)
            weights = _get_weights(neigh_dist, self.weights)
            y_pred = np.empty((n_queries, n_outputs), dtype=classes_[0].dtype)
            for k, classes_k in enumerate(classes_):
                if weights is None:
                    mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
                else:
                    mode, _ = weighted_mode(_y[neigh_ind, k], weights, axis=1)
                mode = np.asarray(mode.ravel(), dtype=np.intp)
                y_pred[:, k] = classes_k.take(mode)
            
            if not self.outputs_2d_:
                y_pred = y_pred.ravel()
            return y_pred
        
        def predict_proba(self, X):
            """Return probability estimates for the test data X.
    
            Parameters
            ----------
            X : array-like of shape (n_queries, n_features), \
                    or (n_queries, n_indexed) if metric == 'precomputed'
                Test samples.
    
            Returns
            -------
            p : ndarray of shape (n_queries, n_classes), or a list of n_outputs
                of such arrays if n_outputs > 1.
                The class probabilities of the input samples. Classes are ordered
                by lexicographic order.
            """
            X = check_array(X, accept_sparse='csr')
            neigh_dist, neigh_ind = self.kneighbors(X)
            classes_ = self.classes_
            _y = self._y
            if not self.outputs_2d_:
                _y = self._y.reshape((-1, 1))
                classes_ = [self.classes_]
            n_queries = _num_samples(X)
            weights = _get_weights(neigh_dist, self.weights)
            if weights is None:
                weights = np.ones_like(neigh_ind)
            all_rows = np.arange(X.shape[0])
            probabilities = []
            for k, classes_k in enumerate(classes_):
                pred_labels = _y[:, k][neigh_ind]
                proba_k = np.zeros((n_queries, classes_k.size))
                # a simple ':' index doesn't work right
                for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
                    proba_k[all_rows, idx] += weights[:, i]
                
                # normalize 'votes' into real [0,1] probabilities
                normalizer = proba_k.sum(axis=1)[:, np.newaxis]
                normalizer[normalizer == 0.0] = 1.0
                proba_k /= normalizer
                probabilities.append(proba_k)
            
            if not self.outputs_2d_:
                probabilities = probabilities[0]
            return probabilities
  • Dimensionality reduction keeps only the most important features of high-dimensional data and removes noise and unimportant features, thereby speeding up data processing. PCA (Principal Component Analysis) is the most widely used dimensionality-reduction algorithm. The main idea of PCA is ...
  • Classification after PCA reduction. What is PCA? Principal Component Analysis (PCA) is a common feature extraction technique in data science that employs matrix factorization to reduce the dimensionality of ...
  • KNN classification # run KNN on the PCA-reduced mnist data: from sklearn.neighbors import KNeighborsClassifier knn = KNeighborsClassifier(n_neighbors=3) knn.fit(train_data_pca, train_labels) # compute the test score: knn.score(test_...
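
    A self-contained sketch of the same idea, using sklearn's bundled digits dataset as a stand-in for MNIST:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    pca = PCA(n_components=0.95).fit(X_tr)   # fit the reduction on training data only
    knn = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X_tr), y_tr)
    print(knn.score(pca.transform(X_te), y_te))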
  • Eight classes of heartbeat ECG data from the MIT arrhythmia database were analyzed with both SVM and PCA-SVM pattern classifiers, and the final classification accuracies compared. With a linear kernel, the SVM reaches 97.8125% accuracy, while the PCA-SVM classification ...
  • PCA+LDA feature reduction with an SNN classifier: a synergetic neural network performs the pattern recognition, PCA reduces the whole raw dataset, and then random-subspace SNN classifiers are built for classification, with good results
  • Q: A training set is reduced with PCA and a base classifier is trained on it to get a classification model; what should be done with real, live data at prediction time? Surely it cannot go straight into the trained model? A: Of course not; you must apply the reduction matrix obtained from the training data to ...
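
    In sklearn terms the answer is simply to reuse the fitted PCA object; a sketch with stand-in arrays:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X_train = rng.rand(200, 50)   # stand-in training features
    X_new = rng.rand(5, 50)       # stand-in live samples arriving at prediction time

    pca = PCA(n_components=10).fit(X_train)   # the projection learned from training data only
    X_train_low = pca.transform(X_train)      # the classifier is trained on this
    X_new_low = pca.transform(X_new)          # apply the SAME fitted projection, then clf.predict(X_new_low)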
  • ML DR PCA: using PCA to reduce the dimensionality of the handwritten-digit image recognition dataset (understanding PCA). Contents: a first look at PCA, output, core code. A first look at PCA: # understanding PCA, a worked example of computing the rank of a linearly correlated matrix: import numpy as np M = np....
  • PCA

    10,000+ reads 2016-11-08 09:45:59
    PCA: principal component analysis, a common data-analysis method used in both machine learning and data mining. PCA finds the eigenvectors corresponding to the largest eigenvalues of the dataset's covariance matrix, and thus the directions of greatest variance in the data; projecting onto them achieves the reduction, turning a ...
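
    That description maps directly onto a few lines of numpy; a from-scratch sketch (pca_eig is a hypothetical helper, not a library function):

    import numpy as np

    def pca_eig(X, k):
        """Project X onto the k covariance eigenvectors with the largest eigenvalues."""
        Xc = X - X.mean(axis=0)                 # center the data
        cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues for symmetric matrices
        top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # k directions of greatest variance
        return Xc @ top

    X = np.random.RandomState(0).rand(100, 5)   # stand-in data
    print(pca_eig(X, 2).shape)                  # (100, 2)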
  • PCA feature reduction, followed by random partitioning of the feature space to build multiple SVM classifiers that run in parallel and classify the samples: feature-space-based multi-class SVM
  • Unsupervised learning: PCA reduction and K-means clustering

    1,000+ reads 2020-02-26 18:37:06
    Unsupervised learning: algorithms without a target variable. Common unsupervised algorithms: 1. Dimensionality reduction: principal component analysis (PCA). 2. Clustering: K-means (see the sketch below).
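
    A minimal sketch combining the two, with iris as stand-in data:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    X = load_iris().data
    X2 = PCA(n_components=2).fit_transform(X)   # unsupervised: no labels involved
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)
    print(clusters[:10])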
  • Purpose: hyperspectral data's high spectral resolution and strong inter-band correlation complicate classification. To improve accuracy, a hyperspectral decision-fusion classification algorithm combining PCA with a moving-window wavelet transform is proposed. Method: first, the bands are grouped using the correlation-coefficient matrix of the raw hyperspectral data; ...
  • Machine learning: PCA and Bayesian classification on radiology X-ray images. X-ray image processing and classification in Python (from scratch): principal component analysis (for dimensionality reduction and feature extraction), a Bayesian classifier (multivariate Gaussian), and a histogram classifier. Steps: download the data from Kaggle; resize the images to ...
  • Python image processing: PCA (principal component analysis)

    1,000+ reads 2020-10-02 10:35:52
    A recommended PCA tutorial: A tutorial on Principal Components Analysis, by Lindsay I Smith. 1. Covariance. The covariance of variables X and Y is given by the formula below; covariance describes the relationship between variables, and a covariance > 0 means X and Y are positively correlated ...
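
    For reference, the sample covariance formula the excerpt refers to (standard notation; the tutorial's own symbols may differ):

    cov(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})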
  • Extracting the principal components of a PCA-transformed image (MATLAB code)

    10,000+ reads, highly upvoted 2018-04-01 10:18:58
    This article mainly applies PCA to a single image, then extracts and displays each principal component of the PCA transform. The code is given directly here; PCA theory has been written up many times, and without a concrete goal to practice on, some concepts never become quite clear ...
  • Why reduce dimensionality: (1) data in low dimensions is easier to process and use; (2) relevant features, especially important ones, show up more clearly in the data ... Finally, a dataset example demonstrates the PCA workflow: after PCA processing, the dataset drops from 590 features to 6. ...
  • PCA (Principal Components Analysis) is applied to point-cloud preprocessing, plane detection, normal estimation, dimensionality reduction, classification, and decompression (dimension lifting). With PCA, the points in a cloud can be classified into ground points, wall points, points on objects, and so on before further processing (see the sketch below). PCA maps three-dimensional ...
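
    For the point-cloud use case, the normal of a locally planar patch is the covariance eigenvector with the smallest eigenvalue (the direction of least spread); a sketch with a hypothetical helper:

    import numpy as np

    def normal_of_patch(points):
        """Estimate the normal of a roughly planar 3D patch via PCA/SVD."""
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[-1]   # right-singular vector of the smallest singular value

    patch = np.random.RandomState(0).rand(50, 3) * [1, 1, 0.01]  # nearly flat stand-in cloud
    print(normal_of_patch(patch))   # approximately ±[0, 0, 1]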
  • Hyperspectral image classification is an important part of hyperspectral remote sensing for Earth observation, with significant military and civilian applications. However, the high dimensionality of hyperspectral images, the strong correlation between bands, and spectral mixing make classification very challenging. On the one hand, adjacent bands of a hyperspectral image ...
  • An automatic weighting strategy is also proposed to weight and stack the K feature maps produced by the convolution step; finally, the feature maps are sparsified with block-wise histogram statistics and classified with a sparse-representation classification algorithm, on the public face datasets AR, CMU Multi-PIE, and ORL as well as handwritten-digit data ...
  • Written because digital image processing turns out to be quite practical. First we need to build a face library: for, say, a company or a school, crop out each person's head shot with imcrop into small rectangles, like the standard-face ID photos used at Nanjing University. (:) flattens the matrix into ...
  • PCA code

    1,000+ reads 2019-03-18 21:34:40
    A MATLAB implementation of the PCA algorithm: the function interface; standardizing the training data (zero mean, unit variance); standardizing the test data; eigendecomposition of the covariance matrix; choosing the number of principal components; the score and residual subspaces; computing T(i) and SPE(i) for the training data; the T2 and SPE control limits; and computing the test ...
