Generative and Discriminative Models
Introduction
Recently I gave a presentation at work, where I explained how I solved a problem using Conditional Random Fields (CRF). Since CRF was not well known to my colleagues, I devoted a couple of slides to the theory behind the algorithm. As I prepared those slides, I felt obliged to compare CRF with a conceptually similar algorithm, the Hidden Markov Model (HMM). CRF is known to be a discriminative model and HMM a generative one. I had to refresh my knowledge of this categorisation of supervised machine learning methods, especially generative models. Now I would like to share my understanding of the difference between generative and discriminative models in simple terms.
Generative models are a broad class of machine learning algorithms that make predictions by modelling the joint distribution P(y, x).
Discriminative models are a class of supervised machine learning models that make predictions by estimating the conditional probability P(y|x).
In order to use a generative model, more unknowns must be solved for: one has to estimate the probability of each class, P(y), and the probability of an observation given a class, P(x|y). These probabilities are combined into the joint probability, which can then stand in for the conditional probability when making predictions.
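To make this concrete, here is a minimal sketch (not from the article; all class names and probabilities are made up for illustration) of how a generative model turns P(y) and P(x|y) into a prediction:

```python
# Hypothetical toy spam example. A generative model stores P(y) and
# P(x|y), multiplies them into the joint P(y, x), and predicts the
# class with the largest joint probability.

p_class = {"spam": 0.3, "ham": 0.7}   # P(y), e.g. from class frequencies
p_obs_given_class = {                 # P(x|y) for one fixed observation x
    "spam": 0.040,
    "ham": 0.002,
}

joint = {y: p_class[y] * p_obs_given_class[y] for y in p_class}
prediction = max(joint, key=joint.get)
print(joint)        # {'spam': 0.012, 'ham': 0.0014}
print(prediction)   # spam
```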

The discriminative model takes a shortcut: it simply estimates the conditional probability directly.
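A minimal sketch of that shortcut (not from the article; the dataset and hyperparameters are made up): logistic regression trained by gradient descent estimates P(y=1|x) directly and never models P(x) or P(x|y).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny hypothetical 1-D dataset: feature values with binary labels.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)   # current estimate of P(y=1|x)
        w -= lr * (p - y) * x    # gradient of the log-loss w.r.t. w
        b -= lr * (p - y)        # gradient of the log-loss w.r.t. b

print(sigmoid(w * 0.5 + b))   # well below 0.5 -> predict class 0
print(sigmoid(w * 4.0 + b))   # well above 0.5 -> predict class 1
```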
Each kind of model has many pros and cons. I will just note that a generative model can be used to generate new samples, but it requires more data. A discriminative model is often superior to a generative model given the same amount of data, but it knows nothing about the dependencies between features, because they are irrelevant for prediction. This is also why a discriminative model cannot generate new samples.
Now let’s take a closer look at the concept of generative models.
Generative model
As I showed earlier, the conditional distribution P(y|x) is enough to make predictions. But since P(y|x) = P(y, x) / P(x), where P(x) is constant for a given x across all possible y, it is valid to use the joint distribution P(y, x) to make predictions instead.
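A quick numerical check of that claim (the probabilities are made up for illustration): dividing each joint P(y, x) by the constant P(x) rescales the numbers but never changes which y wins.

```python
joint = {"spam": 0.012, "ham": 0.0014}   # P(y, x) for one fixed x

# P(x) is obtained by summing the joint over all classes y.
p_x = sum(joint.values())
conditional = {y: p / p_x for y, p in joint.items()}   # P(y|x)

best_by_joint = max(joint, key=joint.get)
best_by_conditional = max(conditional, key=conditional.get)
assert best_by_joint == best_by_conditional   # same prediction either way
print(best_by_joint)   # spam
```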
Modelling the joint distribution P(y, x) means that for each pair (yᵢ, xᵢ) a probability P(yᵢ, xᵢ) is known (modelled). At first it was a bit difficult for me to understand how this is even possible: the range of possible values of x might be enormous, so suggesting a probability for every xᵢ, let alone for every pair (yᵢ, xᵢ), seems unrealistic. How is it supposed to be done?
First: the product rule (the identity behind Bayes’ theorem)! It breaks the computation of the joint probability P(y, x) into two other kinds of probabilities: the probability of the class, P(y), and the probability of the observation given the class, P(x|y).
P(y, x) = P(y) * P(x|y)
What does this buy us? At the very least, the probability P(y) becomes easy to figure out: it can be estimated from the dataset by computing class frequencies. P(x|y) is trickier, because usually x is not a single feature but a set of features, x = x₁, …, xₙ, which might have dependencies between each other.
P(x|y) = ∏ᵢ P(xᵢ | y, x₁, …, xᵢ₋₁)
Often the dependencies between the features are not known, especially when they appear in complex constellations such as (y, x₁, …, xᵢ₋₁).
So what can be done to estimate P(x|y)? For this, there is the following trick:
Second: make wild assumptions! Or at least some assumptions that make the estimation of P(x|y) tractable. The Naive Bayes classifier is a perfect example of a generative model with such an assumption: it assumes independence between the features x₁, …, xₙ given the class, which makes computing P(x|y) much easier.
P(x|y) = ∏ᵢ P(xᵢ|y)
With this relaxation, the estimation of P(x|y) becomes tractable, because every P(xᵢ|y) can be estimated independently of the other features: by counting frequencies if feature xᵢ is discrete, or by fitting, say, a Gaussian distribution if xᵢ is continuous.
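The whole recipe can be sketched in a few dozen lines (a from-scratch illustration, not the article's code; the toy dataset and the choice of Gaussians for continuous features are my own assumptions): P(y) from class frequencies, each P(xᵢ|y) from a Gaussian fitted per feature and per class.

```python
import math
from collections import defaultdict

def fit(X, y):
    """Return class priors P(y) and per-class, per-feature (mean, std)."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    priors, stats = {}, {}
    for label, rows in groups.items():
        priors[label] = len(rows) / len(X)   # class frequency
        stats[label] = []
        for col in zip(*rows):               # one feature at a time
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats[label].append((mean, math.sqrt(var) or 1e-9))
    return priors, stats

def gaussian_pdf(v, mean, std):
    return math.exp(-((v - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def predict(x, priors, stats):
    scores = {}
    for label in priors:
        # log P(y) + sum_i log P(xi|y): the independence assumption in action
        score = math.log(priors[label])
        for v, (mean, std) in zip(x, stats[label]):
            score += math.log(gaussian_pdf(v, mean, std))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up 2-feature dataset with two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [4.0, 5.1], [4.2, 4.9], [3.9, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
priors, stats = fit(X, y)
print(predict([1.1, 2.0], priors, stats))   # a
print(predict([4.1, 5.0], priors, stats))   # b
```

Working in log space keeps the product of many small P(xᵢ|y) terms from underflowing to zero.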
Conclusion
So now you can see that in order to use a generative model, one should be prepared to estimate two kinds of probabilities, P(y) and P(x|y). Discriminative models, on the other hand, estimate the conditional probability P(y|x) directly, which is often more efficient because the dependencies between features are never estimated: those relationships don't necessarily contribute to the prediction of the target variable.
Translated from: https://medium.com/@tanyadembelova/introduction-to-generative-and-discriminative-models-9c9ef152b9af