## Classification

2018-09-14 15:19:53
CLASSIFICATION

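This article introduces five classification algorithms used in data mining: the first three are decision trees (ID3, C4.5, and CART), followed by KNN, and finally naive Bayes.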

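Classification is everywhere in daily life. A doctor diagnosing a patient is a typical example: no doctor can observe the illness directly, so the diagnosis must be inferred from the symptoms the patient shows and from laboratory test results. The doctor acts as a classifier, and the accuracy of the diagnosis depends heavily on how the doctor was trained (the construction method), how pronounced the patient's symptoms are (the characteristics of the data to be classified), and how much experience the doctor has (the number of training samples).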

ID3 Decision Tree

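The smaller the information entropy, the purer the sample set. Given information entropy, when we split a sample set D on some attribute a, we can compute the "information gain" of that split: the entropy of D minus the weighted entropy of the subsets the split produces. ID3 splits on the attribute with the largest information gain. (For the formulas, see [1].)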
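The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the code from [1]: the function names `entropy` and `information_gain` and the toy weather data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of `labels` minus the weighted entropy after splitting on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: each row maps an attribute name to its value.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# ID3 picks the attribute with the largest information gain.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)  # outlook
```

Here `outlook` separates the two classes perfectly (gain of 1 bit), while `windy` leaves both subsets as mixed as the original set (gain of 0), so `outlook` is chosen.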

C4.5 Decision Tree

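ID3 has a weakness: when an attribute can take many distinct values, each value may cover only one or a handful of samples. The resulting subsets are then almost pure, so the attribute's information gain is very high and ID3 considers it an excellent split. A tree that splits on such many-valued attributes generalizes poorly and cannot predict new samples effectively.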

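The gain ratio was introduced to address this. Because the gain ratio in turn favors attributes with few distinct values, C4.5 does not simply pick the attribute with the highest gain ratio: it first selects the candidate attributes whose information gain is above average, and then chooses the one with the highest gain ratio among them. (For the formulas, see [1].)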
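A rough sketch of this two-step heuristic follows. It assumes the usual definition of gain ratio (information gain divided by the entropy of the attribute's own value distribution); the names `info_gain`, `gain_ratio`, and `choose_attribute` and the data are hypothetical. The filter uses "at least average" rather than "strictly above average" so that the candidate list is never empty.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def info_gain(column, labels):
    """Information gain of splitting `labels` by the values in `column`."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / len(labels) * entropy(g)
                                 for g in groups.values())

def gain_ratio(column, labels):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy(column)
    return info_gain(column, labels) / split_info if split_info > 0 else 0.0

def choose_attribute(columns, labels):
    """Keep attributes with at least average gain, then take the best gain ratio."""
    gains = {name: info_gain(col, labels) for name, col in columns.items()}
    average = sum(gains.values()) / len(gains)
    candidates = [name for name, g in gains.items() if g >= average]
    return max(candidates, key=lambda name: gain_ratio(columns[name], labels))

# Hypothetical data: "id" is unique per sample, the many-valued case ID3 falls for.
columns = {
    "id": [1, 2, 3, 4],
    "outlook": ["sunny", "sunny", "rainy", "rainy"],
    "windy": ["no", "yes", "no", "yes"],
}
labels = ["play", "play", "stay", "stay"]
chosen = choose_attribute(columns, labels)
print(chosen)  # outlook
```

Both `id` and `outlook` reach the maximum information gain of 1 bit, but `id` spreads the samples over four values, so its gain ratio is 0.5 against 1.0 for `outlook`.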

CART Decision Tree (Classification and Regression Tree)

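CART selects splits using the Gini index: among the candidate attributes, it splits on the one with the smallest Gini index. CART can handle both classification and regression problems.
Unlike the previous two trees, CART always produces a binary tree. This means that, having chosen an attribute, CART must also find the best split value for it: once a node has selected its splitting attribute, it must still decide which value to use for the two-way split.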
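The search for a split value can be illustrated for a single numeric attribute. This sketch assumes candidate thresholds are the midpoints between consecutive distinct values, a common convention that the article does not specify; `gini`, `best_binary_split`, and the data are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best `value <= threshold` split."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer age and whether the customer bought a product.
ages = [22, 25, 30, 41, 52, 60]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_binary_split(ages, buys)
print(threshold, score)  # 35.5 0.0
```

To choose between attributes, the same search would be run for each candidate attribute and the attribute whose best split has the smallest weighted Gini index would win.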

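After the tree has been grown, CART prunes it. Concretely, take a fully grown tree T0 and any non-leaf node t in it, with Tt denoting the subtree rooted at t; pruning asks whether Tt is worth keeping. To decide, compute the error gain rate of every non-leaf node (it can be read as the error cost of pruning there) and prune at the node where this cost is smallest. (For the formulas, see [3].)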
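One way to make the pruning step concrete is the standard cost-complexity measure. The sketch assumes that the error gain rate of node t is (R(t) - R(Tt)) / (number of leaves of Tt - 1), where R(t) is the error rate if t were collapsed into a leaf and R(Tt) is the total error rate of the leaves under t; check [3] for the exact formula the article intends. The `Node` class and the error counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    errors: int  # training samples misclassified if this node were a leaf
    children: List["Node"] = field(default_factory=list)

def leaves(node):
    """All leaf nodes of the subtree rooted at `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def internal_nodes(node):
    """All non-leaf nodes of the subtree rooted at `node`."""
    if not node.children:
        return []
    return [node] + [n for child in node.children for n in internal_nodes(child)]

def error_gain_rate(node, n_samples):
    """(R(t) - R(T_t)) / (leaves of T_t - 1); defined for non-leaf nodes only."""
    subtree_leaves = leaves(node)
    r_node = node.errors / n_samples
    r_subtree = sum(leaf.errors for leaf in subtree_leaves) / n_samples
    return (r_node - r_subtree) / (len(subtree_leaves) - 1)

def weakest_link(root, n_samples):
    """The non-leaf node with the smallest error gain rate, pruned first."""
    return min(internal_nodes(root), key=lambda n: error_gain_rate(n, n_samples))

# Hypothetical tree built from 20 training samples.
t2 = Node("t2", errors=3, children=[Node("t4", 1), Node("t5", 1)])
t3 = Node("t3", errors=4, children=[Node("t6", 0), Node("t7", 1)])
root = Node("t1", errors=9, children=[t2, t3])

weakest = weakest_link(root, n_samples=20)
print(weakest.name)  # t2
```

Collapsing `t2` costs only one extra error for one leaf saved (rate 0.05), compared with 0.15 for `t3` and 0.1 for the root, so `t2` is pruned first.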

KNN (K Nearest Neighbor)

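KNN classifies a sample by measuring distances between feature vectors. The idea: if most of the k samples most similar to a given sample (that is, its k nearest neighbors in feature space) belong to one class, the sample is assigned to that class as well. K is usually an integer no greater than 20. The neighbors are all samples whose classes are already known, and the decision depends only on the classes of the nearest one or few samples. The result of KNN depends heavily on the choice of K.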

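The algorithm is:
1) Compute the distance between the test point and every training point;
2) Sort the training points by increasing distance;
3) Take the K points with the smallest distance;
4) Count how often each class occurs among these K points;
5) Return the most frequent class among the K points as the predicted class of the test point.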
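The five steps map directly onto a short function. This is a minimal sketch using Euclidean distance via `math.dist` (Python 3.8+); the function name `knn_predict` and the sample points are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbors."""
    # 1) Distance from the query to every training point.
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # 2) Sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # 3) Keep the k closest points.
    nearest = distances[:k]
    # 4) Count how often each class occurs among them.
    votes = Counter(label for _, label in nearest)
    # 5) Return the most frequent class.
    return votes.most_common(1)[0][0]

# Hypothetical training set with two classes.
train_points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
train_labels = ["A", "A", "B", "B"]

print(knn_predict(train_points, train_labels, (0.1, 0.2), k=3))  # B
print(knn_predict(train_points, train_labels, (0.9, 0.9), k=3))  # A
```

With an even K or more than two classes the vote can tie; this sketch then returns whichever tied class was encountered first among the nearest neighbors, which is one arbitrary choice among several.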

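The drawback of KNN is its computational cost, which follows directly from the steps above: every test point needs its Euclidean distance to every training point, so classifying n test points against n training points already takes O(n*n) time. When n is very large the cost becomes too high, so KNN is poorly suited to classifying large data sets.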

Naive Bayes

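Bayesian classifiers are a family of statistical classification algorithms that use probability theory to classify. Naive Bayes is the simplest of them. It is called "naive" because it assumes that the feature attributes are conditionally independent given the class, an assumption that greatly simplifies the calculations that follow.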

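The idea behind naive Bayes is this: for an item to be classified, compute the probability of each class given that this item was observed, and assign the item to the class with the highest probability. Put informally: suppose you see a Black person on the street and are asked to guess where he comes from. You would most likely guess Africa. Why? Because among Black people, the proportion from Africa is the highest. He could of course be from the Americas or Asia, but without any other information we pick the class with the highest conditional probability. This is the idea underlying naive Bayes.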

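The core probability formula used by the algorithm is Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A). The great value of this formula is that it swaps the roles of cause and effect: it lets us obtain P(B|A) from P(A|B).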
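A small worked calculation shows the swap in action, reusing the doctor analogy from the introduction. All numbers are invented for illustration, and P(A) is expanded with the law of total probability over B and not-B.

```python
def posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) = P(A|B) P(B) / P(A), with P(A) summed over B and not-B."""
    evidence = p_a_given_b * prior_b + p_a_given_not_b * (1 - prior_b)
    return p_a_given_b * prior_b / evidence

# Hypothetical numbers: B = "patient has the disease", A = "test is positive".
p_disease = 0.01             # P(B)
p_pos_given_disease = 0.90   # P(A|B)
p_pos_given_healthy = 0.05   # P(A|not B)

p = posterior(p_disease, p_pos_given_disease, p_pos_given_healthy)
print(round(p, 4))  # 0.1538
```

The test's behavior given the disease, P(A|B), is what can be measured; the formula turns it into what the doctor actually wants, the probability of the disease given a positive test.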

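The classification workflow is:
- Preparation stage: determine the feature attributes and obtain training samples.
- Classifier training stage: compute P(yi) for each class, and for each feature attribute compute the conditional probability of each of its values given each class.
- Application stage: for each class compute P(x|yi)P(yi), and assign x to the class for which P(x|yi)P(yi) is largest.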
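The three stages can be sketched for categorical features as follows. The names `train` and `predict` and the symptom data are hypothetical. The sketch also applies Laplace (add-one) smoothing, which the article does not mention; it is included only so that a feature value never seen with a class does not force the whole product to zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Training stage: estimate P(yi) and collect the counts behind P(xj|yi)."""
    class_counts = Counter(labels)
    priors = {y: n / len(labels) for y, n in class_counts.items()}
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, y)][value] += 1
            feature_values[j].add(value)
    return priors, class_counts, value_counts, feature_values

def predict(model, row):
    """Application stage: return the class maximizing P(x|yi) P(yi)."""
    priors, class_counts, value_counts, feature_values = model
    scores = {}
    for y, prior in priors.items():
        score = prior
        for j, value in enumerate(row):
            # Independence assumption: P(x|yi) is the product of P(xj|yi).
            # Add-one smoothing here is an assumption of this sketch.
            count = value_counts[(j, y)][value]
            score *= (count + 1) / (class_counts[y] + len(feature_values[j]))
        scores[y] = score
    return max(scores, key=scores.get)

# Preparation stage: hypothetical features (fever, cough) and labeled samples.
rows = [
    ("fever", "cough"), ("fever", "no_cough"), ("no_fever", "cough"),
    ("no_fever", "no_cough"), ("no_fever", "cough"), ("fever", "cough"),
]
labels = ["flu", "flu", "cold", "healthy", "cold", "flu"]

model = train(rows, labels)
print(predict(model, ("fever", "cough")))     # flu
print(predict(model, ("no_fever", "cough")))  # cold
```

P(x) is the same for every class, which is why the application stage compares P(x|yi)P(yi) directly and never divides by it.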

REFERENCES

[1] https://blog.csdn.net/qq_27717921/article/details/74784400
[2] https://www.cnblogs.com/ybjourney/p/4702562.html
[3] https://blog.csdn.net/androidlushangderen/article/details/42558235
[4] https://blog.csdn.net/androidlushangderen/article/details/42613011
[5] https://blog.csdn.net/androidlushangderen/article/details/42680161
[6] http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.html