  • Classification

    1,000+ views 2018-09-14 15:19:53
    CLASSIFICATION

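    This article introduces five classification algorithms used in data mining: three decision trees (ID3, C4.5 and CART), then KNN, and finally naive Bayes.

    Classification is everywhere in daily life. A doctor diagnosing a patient is a typical example: no doctor can observe the illness directly, and must instead infer it from the symptoms the patient shows and from laboratory test results. The doctor acts as a classifier, and the accuracy of the diagnosis depends closely on how the doctor was trained (the construction method), how pronounced the patient's symptoms are (the characteristics of the data to be classified), and how much experience the doctor has (the number of training samples).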

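    ID3 Decision Tree

    The smaller the information entropy, the higher the purity of the sample set. Given the entropy, when we choose an attribute a to split the sample set D, we can compute the "information gain" obtained by splitting D on a. ID3 splits on the attribute with the largest information gain. (For the formulas, see [1].)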
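
    As a minimal sketch of the two quantities above, the following computes Shannon entropy and information gain for categorical attributes. The function names and the toy weather data are assumptions made for illustration; they do not come from the article or from [1].

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the full set minus the weighted entropy of the subsets
    obtained by splitting on attribute `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# ID3 picks the attribute with the largest information gain.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
```

    On this data the gain of "outlook" equals the full entropy of the label set, while "windy" yields no gain, so ID3 would split on "outlook".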

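    C4.5 Decision Tree

    ID3 has a drawback: when an attribute can take many values, each value may cover only one or very few samples. The information gain of such an attribute is then very high, so ID3 considers it an excellent split. But a tree that splits on a many-valued attribute generalizes poorly and cannot predict new samples effectively.

    The gain ratio was proposed to address this. Because the gain ratio in turn tends to favor attributes with few possible values, C4.5 does not simply take the attribute with the highest gain ratio: it first finds the candidate attributes whose information gain is above average, then chooses the one with the highest gain ratio among them. (For the formulas, see [1].)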
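
    The two-stage selection can be sketched as follows, assuming the usual definition of gain ratio as information gain divided by the entropy of the attribute's own value distribution (the "split information"). The helper names and the toy data, including the deliberately useless "id" attribute, are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_and_ratio(rows, labels, attr):
    """Return (information gain, gain ratio) for splitting on `attr`."""
    column = [row[attr] for row in rows]
    subsets = {}
    for value, label in zip(column, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    split_info = entropy(column)  # large when the attribute has many distinct values
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

def c45_choose(rows, labels, attrs):
    """Keep attributes with at least average gain, then take the best gain ratio."""
    scores = {a: gain_and_ratio(rows, labels, a) for a in attrs}
    avg_gain = sum(g for g, _ in scores.values()) / len(scores)
    candidates = [a for a in attrs if scores[a][0] >= avg_gain - 1e-12]
    return max(candidates, key=lambda a: scores[a][1])

# Hypothetical data: "id" is unique per row, so its information gain is maximal but meaningless.
rows = [
    {"id": "1", "outlook": "sunny"},
    {"id": "2", "outlook": "sunny"},
    {"id": "3", "outlook": "rainy"},
    {"id": "4", "outlook": "rainy"},
]
labels = ["play", "play", "stay", "stay"]
chosen = c45_choose(rows, labels, ["id", "outlook"])
```

    Both attributes have the same information gain here, but the four-valued "id" is penalized by its larger split information, so "outlook" is chosen.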

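    CART Decision Tree (Classification and Regression Tree)

    CART splits on the Gini index: among the candidate attributes, it chooses the one with the smallest Gini index. It can handle both classification and regression problems.
    Unlike the previous two trees, CART produces a binary tree. After choosing the split attribute, CART must therefore also find the best split value for it: once a node has selected its attribute, it still has to decide at which value to make the two-way split.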
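
    A minimal sketch of that search for a single numeric attribute is shown below. It assumes the Gini index of a label set is 1 minus the sum of squared class proportions, and it tries the midpoints between consecutive distinct values as thresholds; the function names and the age/purchase data are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k^2)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customers older than about 35 bought the product.
ages = [22, 25, 30, 41, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_binary_split(ages, bought)
```

    A full CART implementation would repeat this search over every attribute at every node and recurse on the two halves; only the per-attribute step is shown here.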

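    After growing the tree, CART prunes it. Specifically: take a fully grown tree T0 and, for each node t in it, the subtree Tt hanging below t; pruning asks whether Tt is worth keeping. In practice, compute the error gain rate (which can be read as the error cost) of every non-leaf node, then prune at the node with the smallest error cost. (For the formulas, see [3].)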
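
    The article leaves the formula to [3]. The sketch below assumes the standard cost-complexity ("weakest link") form, alpha(t) = (R(t) - R(Tt)) / (|leaves(Tt)| - 1), where R(t) is the error rate if t were collapsed into a leaf and R(Tt) is the error rate of the subtree. The nested-dict tree layout and the "errors" field are hypothetical conventions chosen for this example.

```python
def leaves_and_errors(node):
    """Return (number of leaves, misclassified samples) for the subtree rooted at node."""
    if not node.get("children"):
        return 1, node["errors"]
    parts = [leaves_and_errors(child) for child in node["children"]]
    return sum(n for n, _ in parts), sum(e for _, e in parts)

def alphas(node, total, name="root"):
    """Return [(alpha, node name)] for every internal node of the subtree."""
    if not node.get("children"):
        return []
    n_leaves, subtree_errors = leaves_and_errors(node)
    # node["errors"]: samples misclassified if this node were turned into a leaf.
    alpha = (node["errors"] - subtree_errors) / total / (n_leaves - 1)
    result = [(alpha, name)]
    for i, child in enumerate(node["children"]):
        result += alphas(child, total, f"{name}.{i}")
    return result

# Hypothetical tree built from 20 training samples.
tree = {
    "errors": 10,
    "children": [
        {"errors": 4, "children": [{"errors": 2}, {"errors": 1}]},
        {"errors": 1},
    ],
}
weakest_alpha, weakest_node = min(alphas(tree, total=20))
```

    The node with the smallest alpha buys the least error reduction per extra leaf, so it is the first candidate to be collapsed into a leaf.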

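    KNN (K Nearest Neighbor)

    KNN classifies by measuring the distance between feature values. The idea: if most of the k most similar samples of a given sample (that is, its nearest neighbors in feature space) belong to one class, then the sample belongs to that class too; k is usually an integer no greater than 20. The neighbors used are all objects that have already been correctly classified. The decision depends only on the classes of the nearest one or few samples, and the result depends heavily on the choice of K.

    The algorithm is:
    1) Compute the distance between the test point and every training point;
    2) Sort by increasing distance;
    3) Take the K points with the smallest distances;
    4) Count how often each class occurs among these K points;
    5) Return the most frequent class among the K points as the predicted class of the test point.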
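
    The five steps map almost line for line onto the following sketch. It assumes Euclidean distance (via `math.dist`, available from Python 3.8) and majority voting; the function name and the sample points are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1) distance from the query to every training point
    distances = [(math.dist(p, query), label) for p, label in zip(train_points, train_labels)]
    # 2) sort by increasing distance
    distances.sort(key=lambda pair: pair[0])
    # 3) keep the k closest points
    top_k = [label for _, label in distances[:k]]
    # 4) count class frequencies and 5) return the most frequent class
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical 2-D data: class "a" clusters near (1, 1), class "b" near (6, 6).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6, 5), k=3)
```

    Each prediction scans the whole training set, which is where the cost discussed next comes from.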

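    The drawback of KNN is its computational cost, which is evident from any straightforward implementation: every test point must compute its Euclidean distance to every training point, so with n test points and n training points the time complexity is already O(n*n). When n is very large in real data this cost becomes too high, so KNN is not suited to classifying large-scale data.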

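    Naive Bayes

    Bayesian classification is a family of statistical classification algorithms built on probability and statistics, and naive Bayes is the simplest member of that family. It is called "naive" because it assumes the features are conditionally independent given the class, which greatly simplifies the calculations that follow.

    The idea behind naive Bayes is this: for a given item to be classified, compute the probability of each class given that this item has been observed, and assign the item to the class with the highest probability. Put simply: you see a Black person on the street and I ask you to guess where he is from; nine times out of ten you will guess Africa. Why? Because among Black people the share of Africans is the highest. He might of course be from the Americas or Asia, but with no other information available we choose the class with the largest conditional probability. That is the idea behind naive Bayes.

    The core probability formula used by the algorithm is P(B|A) = P(A|B)P(B) / P(A). The formula shows what makes Bayes so useful: it swaps the roles of cause and effect, turning P(A|B) into P(B|A).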
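
    A small worked example of the formula, under assumed numbers that are not from the article: B is "the patient has the disease" and A is "the test is positive". P(A) is expanded with the law of total probability.

```python
def posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) = P(A|B) P(B) / P(A), with P(A) = P(A|B) P(B) + P(A|not B) P(not B)."""
    evidence = p_a_given_b * prior_b + p_a_given_not_b * (1 - prior_b)
    return p_a_given_b * prior_b / evidence

# Assumed values: 1% of patients have the disease, the test detects 90% of cases,
# and it gives a false positive for 5% of healthy patients.
p_disease_given_positive = posterior(prior_b=0.01, p_a_given_b=0.9, p_a_given_not_b=0.05)
```

    Under these assumptions the posterior is 0.009 / 0.0585, about 0.154: the known quantity P(positive | disease) has been turned into the quantity a diagnosis actually needs, P(disease | positive).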

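    The classification workflow is:
    - Preparation: determine the feature attributes and obtain the training samples.
    - Training: compute P(yi) for each class and, for each feature attribute, the conditional probabilities of all its partitions.
    - Application: for each class compute P(x|yi)P(yi), and assign x to the class for which P(x|yi)P(yi) is largest.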
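
    The three stages can be sketched for categorical features as follows. The add-one (Laplace) smoothing is an addition that the article does not mention; it is included only so that an unseen feature value does not force the product to zero. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Training stage: count classes and, per class, the values of each feature."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(y, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Application stage: return the class maximizing P(x|yi) P(yi)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # P(yi)
        for i, v in enumerate(row):
            # P(x_i | yi) with add-one smoothing (an assumption, not from the article)
            score *= (feature_counts[(y, i)][v] + 1) / (n_y + len(values[i]))
        scores[y] = score
    return max(scores, key=scores.get)

# Preparation stage: features are (outlook, temperature); the data is made up.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["out", "out", "in", "in"]
model = train(rows, labels)
```

    For example, `predict(model, ("sunny", "hot"))` compares the score 0.15 for "out" against 0.025 for "in" and returns "out". P(x) is never computed because it is the same for every class.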

    REFERENCES

    [1] https://blog.csdn.net/qq_27717921/article/details/74784400
    [2] https://www.cnblogs.com/ybjourney/p/4702562.html
    [3] https://blog.csdn.net/androidlushangderen/article/details/42558235
    [4] https://blog.csdn.net/androidlushangderen/article/details/42613011
    [5] https://blog.csdn.net/androidlushangderen/article/details/42680161
    [6] http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.html

  • binaryClassification

    2017-06-25 09:53:57
    binaryClassification
  • Pattern Classification

    2019-02-08 09:37:38
    Pattern Classification (Second Edition). Machine Learning
  • Classification editor

    2020-12-09 15:05:27
    Such instantiations are called classification graphs. The tool recognizes all ontologies O that follow the ontology statement O IS_A classification_root. When a new ...
  • Classification 2.0

    2020-12-03 00:48:17
    Not-yet-approved small feature (https://cgal.geometryfactory.com/CGAL/Members/wiki/Features/Small_Features/Classification_2.0): - support of mesh classification - support of cluster ...
  • Classification Bar

    2020-12-01 20:23:11
    This commit is intended to add a classification bar to each tool and telemetry screen, useful for when you want to mark COSMOS for a certain classification level. Can be extended beyond ...
  • Landcover Classification

    2020-12-01 19:17:16
    We need to make the classification for landcover exposure match with the volcanic ash landcover classification. The landcover classification came from Badan Geologi. We might also need to ...
  • Classification bindings

    2020-12-09 05:28:26
    - bindings for Classification of point sets (mesh could be integrated later but it'd be nice to have Surface Mesh before that), including example in both Python and Java - TBB support for PSP and ...
  • I am running the Object Classification workflow using segmented images from the Pixel Classification workflow. I've encountered a strange behavior that occurs when I am using the Brush ...
  • Hierarchical Classification

    2020-09-10 20:30:01
    Hierarchical Classification
  • Document classification

    2019-10-06 00:17:28
    Document classification - Wikipedia, the free encyclopedia

    Document classification

    Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science. The problems are overlapping, however, and there is therefore also interdisciplinary research on document classification.

    The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied.

    Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach.

    "Content based" versus "request based" classification

    Content based classification is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a rule in much library classification that at least 20% of the content of a book should be about the class to which the book is assigned.[1] In automatic classification it could be the number of times given words appear in a document.

    Request oriented classification (or -indexing) is classification in which the anticipated request from users influences how documents are classified. The classifier asks himself: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230[2]).

    Request oriented classification may be classification that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may classify/index documents differently from a historical library. It is probably better, however, to understand request oriented classification as policy based classification: The classification is done according to some ideals and reflects the purpose of the library or database doing the classification. In this way it is not necessarily a kind of classification or indexing based on user studies. Only if empirical data about use or users are applied should request oriented classification be regarded as a user-based approach.

    Classification versus indexing

    Sometimes a distinction is made between assigning documents to classes ("classification") versus assigning subjects to documents ("subject indexing") but as Frederick Wilfrid Lancaster has argued, this distinction is not fruitful. “These terminological distinctions,” he writes, “are quite meaningless and only serve to cause confusion” (Lancaster, 2003, p. 21[3]). The view that this distinction is purely superficial is also supported by the fact that a classification system may be transformed into a thesaurus and vice versa (cf., Aitchison, 1986,[4] 2004;[5] Broughton, 2008;[6] Riesthuis & Bliedung, 1991[7]). Therefore, the act of labeling a document (say by assigning a term from a controlled vocabulary to a document) is at the same time the act of assigning that document to the class of documents indexed by that term (all documents indexed or classified as X belong to the same class of documents).

    Automatic document classification

    Automatic document classification tasks can be divided into three sorts: supervised document classification where some external mechanism (such as human feedback) provides information on the correct classification for documents, unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information, and semi-supervised document classification, where parts of the documents are labeled by the external mechanism.

    Techniques

    Automatic document classification techniques include:

    Applications

    Classification techniques have been applied to

    References:

    1. ^ Library of Congress (2008). The subject headings manual. Washington, DC.: Library of Congress, Policy and Standards Division. (Sheet H 180: "Assign headings only for topics that comprise at least 20% of the work.")
    2. ^ Soergel, Dagobert (1985). Organizing information: Principles of data base and retrieval systems. Orlando, FL: Academic Press.
    3. ^ Lancaster, F. W. (2003). Indexing and abstracting in theory and practice. Library Association, London.
    4. ^ Aitchison, J. (1986). “A classification as a source for thesaurus: The Bibliographic Classification of H. E. Bliss as a source of thesaurus terms and structure.” Journal of Documentation, Vol. 42 No. 3, pp. 160-181.
    5. ^ Aitchison, J. (2004). “Thesauri from BC2: Problems and possibilities revealed in an experimental thesaurus derived from the Bliss Music schedule.” Bliss Classification Bulletin, Vol. 46, pp. 20-26.
    6. ^ Broughton, V. (2008). “A faceted classification as the basis of a faceted terminology: Conversion of a classified structure to thesaurus format in the Bliss Bibliographic Classification (2nd Ed.).” Axiomathes, Vol. 18 No.2, pp. 193-210.
    7. ^ Riesthuis, G. J. A., & Bliedung, St. (1991). “Thesaurification of the UDC.” Tools for knowledge organization and the human interface, Vol. 2, pp. 109-117. Index Verlag, Frankfurt.
    8. ^ Stephan Busemann, Sven Schmeier and Roman G. Arens (2000). Message classification in the call center. In Sergei Nirenburg, Douglas Appelt, Fabio Ciravegna and Robert Dale, eds., Proc. 6th Applied Natural Language Processing Conf. (ANLP'00), pp. 158-165, ACL.
    9. ^ Santini, Marina; Rosso, Mark (2008), Testing a Genre-Enabled Application: A Preliminary Assessment, BCS IRSG Symposium: Future Directions in Information Access, London, UK, pp. 54–63, http://www.bcs.org/upload/pdf/ewic_fd08_paper7.pdf

    posted on 2013-01-24 15:36 lexus

    Reposted from: https://www.cnblogs.com/lexus/archive/2013/01/24/2875108.html

  • Does bonnetal classification module support multilabel classification? Thanks a lot! Jack. This question comes from the open-source project: PRBonn/bonnetal
  • DNN Sentence Classification

    2019-06-03 21:31:19
    In the sentence classification task, context formed from sentences adjacent to the sentence being classified can provide important information for classification. This context is, however, often ...
  • Image Classification Background

    10,000+ views 2019-12-14 13:43:07
    Background

    I believe image classification is a great starting point before diving into other computer vision fields, especially for beginners who know nothing about deep learning. When I started to learn computer vision, I made a lot of mistakes, and I wish someone had told me back then which paper to start with. Until now there doesn't seem to be a repository that lists image classification papers the way deep_learning_object_detection does. Therefore, I decided to make a repository listing deep learning image classification papers and code to help others. My personal advice for people who know nothing about deep learning: start with vgg, then googlenet and resnet; feel free to continue reading the other listed papers or switch to other fields once you are finished.

    Note: I also have a repository of PyTorch implementations of some of the image classification networks; you can check it out here.

  • Linear classification

    2019-09-29 23:15:13
    Classification means finding out which side of a boundary a point lies on, or finding the boundary that separates the dataset. This article is mainly about linear classification, which uses one hyperplane to separate the dat...
  • Textclassification

    2020-11-11 19:50:35
    Textclassification: Chinese short-text classification, including TextCNN, TextDCNN, TextDPCNN, TextRCNN, TextRNN, TextRNN+Attention, Transformer, FastText and other models
  • This would be a product of source.network_classification and destination.network_classification, e.g. source.network_classification: trusted destination.network_classification: ...
  • Currently the evaluation class only supports single label classification, even though SS3 inherently supports multilabel classification. These are the steps (I see) needed to support multilabel ...
  • Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
  • Text Classification

    2019-07-02 09:12:00
    Text Classification. For the purpose of extrinsic evaluation of word embeddings, especially on downstream tasks. Some concepts are drawn from the Fudan University NLP Group (复旦大学NLP组). Statistical-Based Method: Statistics perspective based ...
  • This PR adds some features to the classification plug-in: - much better reworked widget - possibility to compute classification on clusters (from RANSAC or region growing). Some undocumented code...
  • Mater tmva classification

    2020-11-27 16:33:51
    - Code to support booking multiple ML methods in the envelope class. - A new class Classification to perform two-class classification in the new architecture of TMVA. This question comes from...
  • As a branch of classification, associative classification combines the basic ideas of association rule mining and general classification. Previous studies show that associative classification can ...
  • I tried training a feedforward network with the output layer being classification data to all my training instances, but the output it generated doesn't seem right. I am hoping there is a ...
  • Statistical classification

    2018-02-16 19:25:07
    link address: https://en.wikipedia.org/wiki/Statistical_classification >> For the unsupervised learning approach, see Cluster ... In machine learning and statistics, classification is...
  • TMVA class Classification

    2020-11-30 15:05:32
    * Added class TMVA::Classification to perform two class Classification * Support to Train/Test multiple booked ml methods in parallel with MultiProc, calling the method Evaluate * Documentation with ...
  • classification: object recognition and classification. Project introduction: this project recognizes and classifies objects. Project configuration: the author's development environment is Python 3.7, PyTorch >= 1.5.1. Dataset: the "Stanford Dogs Dataset"; official dataset URL: ...
  • There are some examples of using PyTorch for image classification. Usage: each file of this project is an example of image classification; you can learn from level1 to levelN. For more explanation of ...
