  • Factorial hidden Markov models (FHMM), proposed by Ghahramani in 1997, are a multi-chain extension of the hidden Markov model. They are well suited to modeling time series produced by dynamic processes, offer strong classification power for temporal patterns, and are particularly useful for analyzing non-stationary sequences with poor reproducibility.

    1. Markov chains

    Stochastic processes study random phenomena that evolve over time; they generalize multivariate random variables to a (possibly infinite) family of random variables. Let T be an index set and \Omega = \{\omega\} the sample space of a random experiment. Let X(t, \omega) be a real-valued function defined on T \times \Omega: for each fixed \omega \in \Omega, X(t, \omega) is a deterministic function of time, and for each fixed t \in T, X(t, \omega) is a random variable. Then

    \left \{ X(t, \omega), t \in T \right \}

    is called a stochastic process, abbreviated \{X(t), t \in T\} or \{X(t)\}. Each such function of t is a sample path of the process, and T is the parameter (index) set.

    A Markov process is a stochastic process with no after-effect: the conditional distribution of the state x_{n} depends only on the previous state x_{n-1}, that is, P(x_{n}|x_{1},x_{2},...,x_{n-1}) = P(x_{n}|x_{n-1}). A Markov process whose time index and state space are both discrete is called a Markov chain. Let \{X(t), t \in T\} be a homogeneous Markov chain; then for any u, v \geq 0,

    p_{ij}(u+v) = \sum_{k} p_{ik}(u) \, p_{kj}(v), \quad i,j = 1,2,...

    This is the Chapman-Kolmogorov (C-K) equation. Intuitively, to go from state a_{i} at time s to state a_{j} at time s+u+v (that is, X(s+u+v) = a_{j}), the chain must pass through some intermediate state a_{k} at time s+u, and we sum over all such intermediate states.
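
    As a quick sanity check (a small sketch added for illustration, not part of the original text), the C-K equation just says that multi-step transition matrices compose by matrix multiplication; the transition matrix of the box-and-ball example used later in this article is borrowed here:

    import numpy as np

    # One-step transition matrix of a 3-state Markov chain
    # (the same A as in the box-and-ball example below).
    A = np.array([[0.5, 0.2, 0.3],
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]])

    # Chapman-Kolmogorov: P(u + v) = P(u) @ P(v)
    P2 = A @ A                                 # two-step transition probabilities
    P3_direct = np.linalg.matrix_power(A, 3)   # three-step, computed directly
    P3_composed = P2 @ A                       # three-step via C-K composition

    assert np.allclose(P3_direct, P3_composed)
    print(P2[0, 2])                            # p_{13}(2): 2-step probability of going from state 1 to state 3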

    2. Hidden Markov models

    A hidden Markov model describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then emits an observation, producing a random sequence of observations. The sequence of states generated by the hidden chain is called the state sequence; the sequence of observations emitted from those states is called the observation sequence.

    A hidden Markov model is determined by an initial state distribution \pi, a state transition probability distribution A, and an observation (emission) probability distribution B. Formally, the model is written as

    \lambda = (A, B, \pi)

    Let Q be the set of all possible states and V the set of all possible observations:

    Q = \{ q_{1}, q_{2}, ..., q_{N} \}, \quad V = \{ v_{1}, v_{2}, ..., v_{M} \}

    where N is the number of possible states and M is the number of possible observation symbols.

    Let I be a state sequence of length T and O the corresponding observation sequence:

    I = (i_{1}, i_{2}, ..., i_{T}), \quad O = (o_{1}, o_{2}, ..., o_{T})

    A is the state transition probability matrix:

    A = \left [ a_{ij} \right ]_{N \times N}

    where a_{ij} = P(i_{t+1} = q_{j} | i_{t} = q_{i}), \ i = 1,2,...,N, \ j = 1,2,...,N is the probability of moving to state q_{j} at time t+1 given that the chain is in state q_{i} at time t.

    B is the observation (emission) probability matrix:

    B = \left [ b_{j}(k) \right ]_{N \times M}

    where b_{j}(k) = P(o_{t} = v_{k} | i_{t} = q_{j}), \ k = 1,2,...,M, \ j = 1,2,...,N is the probability of emitting observation v_{k} at time t given that the chain is in state q_{j} at time t.

    \pi is the initial state probability vector:

    \pi = ( \pi_{i} )

    where \pi_{i} = P(i_{1} = q_{i}), \ i = 1,2,...,N is the probability of being in state q_{i} at time t = 1.

    A hidden Markov model is therefore determined by the initial state probability vector, the state transition probability matrix, and the observation probability matrix. The transition matrix and the initial distribution determine the hidden Markov chain and hence the unobservable state sequence; the observation probability matrix determines how each state emits an observation and, together with the state sequence, determines the observation sequence.

    The hidden Markov model makes two basic assumptions:

    • Homogeneous Markov assumption: the state of the hidden Markov chain at any time t depends only on the state at the previous time, and is independent of all other states and observations, as well as of t itself.

    • Observation independence assumption: the observation at any time depends only on the state of the Markov chain at that time, and is independent of all other observations and states.

    There are three basic problems for hidden Markov models:

    • Probability computation (evaluation). Given a model \lambda = (A, B, \pi) and an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), compute the probability P(O|\lambda) of the observation sequence under the model.

    • Learning. Given an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), estimate the model parameters \lambda = (A, B, \pi) that maximize P(O|\lambda), i.e. estimate the parameters by maximum likelihood.

    • Prediction, also called decoding. Given a model \lambda = (A, B, \pi) and an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), find the state sequence I = (i_{1}, i_{2}, ..., i_{T}) that maximizes the conditional probability P(I|O, \lambda), i.e. the most likely state sequence for the given observations.

    2.1 The probability computation problem

    The observation probability P(O|\lambda) can be computed with two closely related procedures, the forward algorithm and the backward algorithm.

    • Definition (forward probability)

    Given a hidden Markov model \lambda, the forward probability is the probability of seeing the partial observation sequence o_{1}, o_{2}, ..., o_{t} up to time t and being in state q_{i} at time t:

    \alpha_{t}(i) = P(o_{1}, o_{2}, ..., o_{t}, i_{t} = q_{i} | \lambda)

    The forward probabilities \alpha_{t}(i), and from them the observation probability P(O|\lambda), can be computed recursively.

    Algorithm

    Input: hidden Markov model \lambda, observation sequence O

    Output: observation probability P(O|\lambda)

    (a) Initialization

    \alpha_{1}(i) = \pi_{i} b_{i}(o_{1}), \quad i = 1,2,...,N

    (b) Recursion. For t = 1, 2, ..., T-1,

    \alpha_{t+1}(i) = \left [ \sum_{j=1}^{N} \alpha_{t}(j) a_{ji} \right ] b_{i}(o_{t+1}), \quad i = 1,2,...,N

    (c) Termination

    P(O|\lambda) = \sum_{i=1}^{N} \alpha_{T}(i)

    Example

    Setup: consider the box-and-ball model \lambda = (A, B, \pi) with state set Q = \{1, 2, 3\} and observation set V = \{red, white\}, where

    A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}, \quad \pi = (0.2, 0.4, 0.4)^{T}

    Let T = 3 and O = (red, white, red). Use the forward algorithm to compute P(O|\lambda).

    Solution:

    (a) Initialization

    \alpha_{1}(1) = \pi_{1} b_{1}(o_{1}) = 0.2 \times 0.5 = 0.10 \quad (o_{1} = red) \\ \alpha_{1}(2) = \pi_{2} b_{2}(o_{1}) = 0.4 \times 0.4 = 0.16 \\ \alpha_{1}(3) = \pi_{3} b_{3}(o_{1}) = 0.4 \times 0.7 = 0.28

    (b) Recursion

    For t = 1:

    \alpha_{2}(1) = \left [ \sum_{j} \alpha_{1}(j) a_{j1} \right ] b_{1}(o_{2}) = [\alpha_{1}(1)a_{11} + \alpha_{1}(2)a_{21} + \alpha_{1}(3)a_{31}] b_{1}(o_{2}) = [0.10 \times 0.5 + 0.16 \times 0.3 + 0.28 \times 0.2] \times 0.5 = 0.077

    \alpha_{2}(2) = \left [ \sum_{j} \alpha_{1}(j) a_{j2} \right ] b_{2}(o_{2}) = [0.10 \times 0.2 + 0.16 \times 0.5 + 0.28 \times 0.3] \times 0.6 = 0.1104

    \alpha_{2}(3) = \left [ \sum_{j} \alpha_{1}(j) a_{j3} \right ] b_{3}(o_{2}) = [0.10 \times 0.3 + 0.16 \times 0.2 + 0.28 \times 0.5] \times 0.3 = 0.0606

    For t = 2:

    \alpha_{3}(1) = \left [ \sum_{j} \alpha_{2}(j) a_{j1} \right ] b_{1}(o_{3}) = [0.077 \times 0.5 + 0.1104 \times 0.3 + 0.0606 \times 0.2] \times 0.5 = 0.04187

    \alpha_{3}(2) = \left [ \sum_{j} \alpha_{2}(j) a_{j2} \right ] b_{2}(o_{3}) = [0.077 \times 0.2 + 0.1104 \times 0.5 + 0.0606 \times 0.3] \times 0.4 = 0.03551

    \alpha_{3}(3) = \left [ \sum_{j} \alpha_{2}(j) a_{j3} \right ] b_{3}(o_{3}) = [0.077 \times 0.3 + 0.1104 \times 0.2 + 0.0606 \times 0.5] \times 0.7 = 0.05284

    (c) Termination

    P(O|\lambda) = \sum_{i} \alpha_{3}(i) = 0.04187 + 0.03551 + 0.05284 = 0.13022
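
    The same computation takes only a few lines of NumPy (a small sketch added for illustration; variable names are arbitrary):

    import numpy as np

    A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
    pi = np.array([0.2, 0.4, 0.4])
    obs = [0, 1, 0]                      # red, white, red

    alpha = pi * B[:, obs[0]]            # initialization: alpha_1(i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # recursion: alpha_{t+1}(i)
    print(alpha.sum())                   # termination: P(O|lambda), about 0.13022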

    • Definition (backward probability)

    Given a hidden Markov model \lambda, the backward probability is the probability of the partial observation sequence o_{t+1}, o_{t+2}, ..., o_{T} from time t+1 to T, given that the chain is in state q_{i} at time t:

    \beta_{t}(i) = P(o_{t+1}, o_{t+2}, ..., o_{T} | i_{t} = q_{i}, \lambda)

    The backward probabilities \beta_{t}(i), and from them the observation probability P(O|\lambda), can be computed recursively.

    Algorithm

    Input: hidden Markov model \lambda, observation sequence O

    Output: observation probability P(O|\lambda)

    (a) Initialization

    \beta_{T}(i) = 1, \quad i = 1,2,...,N

    (b) Recursion. For t = T-1, T-2, ..., 1,

    \beta_{t}(i) = \sum_{j=1}^{N} a_{ij} b_{j}(o_{t+1}) \beta_{t+1}(j), \quad i = 1,2,...,N

    (c) Termination

    P(O|\lambda) = \sum_{i=1}^{N} \pi_{i} b_{i}(o_{1}) \beta_{1}(i)

    Example

    Setup: the same box-and-ball model \lambda = (A, B, \pi) as above, with state set Q = \{1, 2, 3\}, observation set V = \{red, white\},

    A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}, \quad \pi = (0.2, 0.4, 0.4)^{T}

    Let T = 3 and O = (red, white, red). Use the backward algorithm to compute P(O|\lambda).

    Solution:

    (a) Initialization

    \beta_{3}(1) = 1, \quad \beta_{3}(2) = 1, \quad \beta_{3}(3) = 1

    (b) Recursion

    For t = T-1 = 2 (the next observation is o_{3} = red):

    \beta_{2}(1) = \sum_{j} a_{1j} b_{j}(o_{3}) \beta_{3}(j) = 0.5 \times 0.5 \times 1 + 0.2 \times 0.4 \times 1 + 0.3 \times 0.7 \times 1 = 0.54

    \beta_{2}(2) = \sum_{j} a_{2j} b_{j}(o_{3}) \beta_{3}(j) = 0.3 \times 0.5 \times 1 + 0.5 \times 0.4 \times 1 + 0.2 \times 0.7 \times 1 = 0.49

    \beta_{2}(3) = \sum_{j} a_{3j} b_{j}(o_{3}) \beta_{3}(j) = 0.2 \times 0.5 \times 1 + 0.3 \times 0.4 \times 1 + 0.5 \times 0.7 \times 1 = 0.57

    For t = 1 (the next observation is o_{2} = white, so b_{1}(o_{2}) = 0.5, b_{2}(o_{2}) = 0.6, b_{3}(o_{2}) = 0.3):

    \beta_{1}(1) = \sum_{j} a_{1j} b_{j}(o_{2}) \beta_{2}(j) = 0.5 \times 0.5 \times 0.54 + 0.2 \times 0.6 \times 0.49 + 0.3 \times 0.3 \times 0.57 = 0.2451

    \beta_{1}(2) = \sum_{j} a_{2j} b_{j}(o_{2}) \beta_{2}(j) = 0.3 \times 0.5 \times 0.54 + 0.5 \times 0.6 \times 0.49 + 0.2 \times 0.3 \times 0.57 = 0.2622

    \beta_{1}(3) = \sum_{j} a_{3j} b_{j}(o_{2}) \beta_{2}(j) = 0.2 \times 0.5 \times 0.54 + 0.3 \times 0.6 \times 0.49 + 0.5 \times 0.3 \times 0.57 = 0.2277

    (c) Termination

    P(O|\lambda) = \sum_{i} \pi_{i} b_{i}(o_{1}) \beta_{1}(i) = 0.2 \times 0.5 \times 0.2451 + 0.4 \times 0.4 \times 0.2622 + 0.4 \times 0.7 \times 0.2277 = 0.13022

    which agrees with the value obtained by the forward algorithm, as it must.
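
    And the backward pass, in the same style (an illustrative sketch only):

    import numpy as np

    A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
    pi = np.array([0.2, 0.4, 0.4])
    obs = [0, 1, 0]                              # red, white, red

    beta = np.ones(3)                            # initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):
        beta = A @ (B[:, o] * beta)              # recursion: beta_t(i)
    print(np.sum(pi * B[:, obs[0]] * beta))      # termination: P(O|lambda), about 0.13022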

    2.2 The learning problem

    The learning problem is: given an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), estimate the model parameters \lambda = (A, B, \pi) that maximize P(O|\lambda). When only the observations are available, the parameters are estimated with the Baum-Welch algorithm, which is the EM algorithm specialized to HMMs.
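
    For reference (a standard summary of the update, not derived in the original text), each Baum-Welch iteration uses the forward and backward probabilities to form the posteriors

    \gamma_{t}(i) = \frac{\alpha_{t}(i)\beta_{t}(i)}{\sum_{j=1}^{N}\alpha_{t}(j)\beta_{t}(j)}, \qquad \xi_{t}(i,j) = \frac{\alpha_{t}(i)\,a_{ij}\,b_{j}(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{t}(i)\,a_{ij}\,b_{j}(o_{t+1})\,\beta_{t+1}(j)}

    and then re-estimates the parameters as

    \pi_{i} = \gamma_{1}(i), \qquad a_{ij} = \frac{\sum_{t=1}^{T-1}\xi_{t}(i,j)}{\sum_{t=1}^{T-1}\gamma_{t}(i)}, \qquad b_{j}(k) = \frac{\sum_{t:\,o_{t}=v_{k}}\gamma_{t}(j)}{\sum_{t=1}^{T}\gamma_{t}(j)}

    repeating until the parameters (or the log-likelihood) stop changing.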

    2.3 The prediction problem

    There are two prediction algorithms for hidden Markov models: the approximation algorithm and the Viterbi algorithm.

    The Viterbi algorithm solves the HMM prediction (decoding) problem by dynamic programming: it finds the single path of maximum probability, where a path corresponds to a state sequence. Introduce two variables, \delta and \varphi. Define \delta_{t}(i) as the maximum probability over all single paths ending in state i at time t:

    \delta_{t}(i) = \max_{i_{1},...,i_{t-1}} P(i_{t} = i, i_{t-1}, ..., i_{1}, o_{t}, ..., o_{1} | \lambda), \quad i = 1,2,...,N

    From the definition, \delta satisfies the recurrence

    \delta_{t+1}(i) = \max_{1 \le j \le N} [\delta_{t}(j) a_{ji}] \, b_{i}(o_{t+1}), \quad i = 1,2,...,N; \ t = 1,2,...,T-1

    Define \varphi_{t}(i) as the (t-1)-th node of the maximum-probability path among all single paths (i_{1}, i_{2}, ..., i_{t-1}, i) ending in state i at time t:

    \varphi_{t}(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}], \quad i = 1,2,...,N

    Algorithm

    Input: model \lambda = (A, B, \pi) and observation sequence O = (o_{1}, o_{2}, ..., o_{T})

    Output: the optimal path (i_{1}^{*}, i_{2}^{*}, ..., i_{T}^{*})

    (a) Initialization

    \delta_{1}(i) = \pi_{i} b_{i}(o_{1}), \quad i = 1,2,...,N

    \varphi_{1}(i) = 0, \quad i = 1,2,...,N

    (b) Recursion. For t = 2, 3, ..., T,

    \delta_{t}(i) = \max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}] \, b_{i}(o_{t}), \quad i = 1,2,...,N

    \varphi_{t}(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}], \quad i = 1,2,...,N

    (c) Termination

    P^{*} = \max_{1 \le i \le N} \delta_{T}(i)

    i^{*}_{T} = \arg\max_{1 \le i \le N} [\delta_{T}(i)]

    (d) Path backtracking. For t = T-1, T-2, ..., 1,

    i^{*}_{t} = \varphi_{t+1}(i^{*}_{t+1}), which yields the optimal path (i_{1}^{*}, i_{2}^{*}, ..., i_{T}^{*}).

    Example

    Setup: the same box-and-ball model \lambda = (A, B, \pi) as above, with state set Q = \{1, 2, 3\}, observation set V = \{red, white\},

    A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}, \quad \pi = (0.2, 0.4, 0.4)^{T}

    Let T = 3 and O = (red, white, red). Find the optimal state sequence with the Viterbi algorithm.

    (a) Initialization

    \delta_{1}(1) = \pi_{1} b_{1}(o_{1}) = 0.10, \quad \delta_{1}(2) = \pi_{2} b_{2}(o_{1}) = 0.16, \quad \delta_{1}(3) = \pi_{3} b_{3}(o_{1}) = 0.28

    \varphi_{1}(1) = \varphi_{1}(2) = \varphi_{1}(3) = 0

    (b) Recursion

    For t = 2:

    \delta_{2}(1) = \max_{j} [\delta_{1}(j) a_{j1}] \, b_{1}(o_{2}) = \max[0.1 \times 0.5, 0.16 \times 0.3, 0.28 \times 0.2] \times 0.5 = 0.028

    \delta_{2}(2) = \max_{j} [\delta_{1}(j) a_{j2}] \, b_{2}(o_{2}) = \max[0.1 \times 0.2, 0.16 \times 0.5, 0.28 \times 0.3] \times 0.6 = 0.0504

    \delta_{2}(3) = \max_{j} [\delta_{1}(j) a_{j3}] \, b_{3}(o_{2}) = \max[0.1 \times 0.3, 0.16 \times 0.2, 0.28 \times 0.5] \times 0.3 = 0.042

    \varphi_{2}(1) = \arg\max_{j} [\delta_{1}(j) a_{j1}] = \arg\max[0.05, 0.048, 0.056] = 3

    \varphi_{2}(2) = \arg\max_{j} [\delta_{1}(j) a_{j2}] = \arg\max[0.02, 0.08, 0.084] = 3

    \varphi_{2}(3) = \arg\max_{j} [\delta_{1}(j) a_{j3}] = \arg\max[0.03, 0.032, 0.14] = 3

    For t = 3:

    \delta_{3}(1) = \max_{j} [\delta_{2}(j) a_{j1}] \, b_{1}(o_{3}) = \max[0.028 \times 0.5, 0.0504 \times 0.3, 0.042 \times 0.2] \times 0.5 = 0.00756

    \delta_{3}(2) = \max_{j} [\delta_{2}(j) a_{j2}] \, b_{2}(o_{3}) = \max[0.028 \times 0.2, 0.0504 \times 0.5, 0.042 \times 0.3] \times 0.4 = 0.01008

    \delta_{3}(3) = \max_{j} [\delta_{2}(j) a_{j3}] \, b_{3}(o_{3}) = \max[0.028 \times 0.3, 0.0504 \times 0.2, 0.042 \times 0.5] \times 0.7 = 0.0147

    \varphi_{3}(1) = \arg\max_{j} [\delta_{2}(j) a_{j1}] = \arg\max[0.014, 0.01512, 0.0084] = 2

    \varphi_{3}(2) = \arg\max_{j} [\delta_{2}(j) a_{j2}] = \arg\max[0.0056, 0.0252, 0.0126] = 2

    \varphi_{3}(3) = \arg\max_{j} [\delta_{2}(j) a_{j3}] = \arg\max[0.0084, 0.01008, 0.021] = 3

    (c) Termination: the probability of the optimal path is

    P^{*} = \max_{i} \delta_{3}(i) = \max[0.00756, 0.01008, 0.0147] = 0.0147

    and its endpoint is

    i^{*}_{3} = \arg\max_{i} [\delta_{3}(i)] = 3

    (d) Backtracking from the endpoint i^{*}_{3} gives i^{*}_{2} and i^{*}_{1}:

    For t = 2: i^{*}_{2} = \varphi_{3}(i^{*}_{3}) = \varphi_{3}(3) = 3

    For t = 1: i^{*}_{1} = \varphi_{2}(i^{*}_{2}) = \varphi_{2}(3) = 3

    So the optimal state sequence is I^{*} = (i_{1}^{*}, i_{2}^{*}, i_{3}^{*}) = (3, 3, 3).
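
    A compact NumPy version of the same computation (a sketch added for illustration):

    import numpy as np

    A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
    pi = np.array([0.2, 0.4, 0.4])
    obs = [0, 1, 0]                                   # red, white, red
    T, N = len(obs), len(pi)

    delta = pi * B[:, obs[0]]                         # delta_1(i)
    psi = np.zeros((T, N), dtype=int)                 # psi[t, i]: best predecessor of state i at time t
    for t in range(1, T):
        trans = delta[:, None] * A                    # trans[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) * B[:, obs[t]]      # delta_t(i)

    path = [int(delta.argmax())]                      # backtracking from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    path.reverse()
    print(delta.max(), [s + 1 for s in path])         # 0.0147, [3, 3, 3]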

    3. Using hmmlearn

    3.1 The hmmlearn API

    3.1.1 hmmlearn.base

    (1) ConvergenceMonitor

    Monitors convergence and reports it to sys.stderr (the stream used for standard error output). It can be subclassed to customize the convergence check.

    (2) _BaseHMM

    The base class for hidden Markov models; it supports simple parameter evaluation, sampling, and maximum a posteriori estimation of HMM parameters.

    Its common methods include fit, score, predict, decode and sample.

    3.1.2 hmmlearn.hmm

    There are three concrete HMM classes, distinguished by the emission (observation) distribution; see the hmmlearn documentation for details:

    • GaussianHMM: the observations follow a Gaussian distribution
    • GMMHMM: the observations follow a Gaussian mixture distribution
    • MultinomialHMM: the observations are discrete symbols

    3.2 hmmlearn examples

    (1) Sampling

    Given the initial state probabilities, the transition matrix, the mean of each component, and the covariance of each component, set the model parameters and generate sample data from the HMM. The concrete example below follows the linked hmmlearn documentation.

    import numpy as np
    import matplotlib.pyplot as plt
    
    from hmmlearn import hmm
    startprob = np.array([0.6, 0.3, 0.1, 0.0])
    # The transition matrix, note that there are no transitions possible
    # between component 1 and 3
    transmat = np.array([[0.7, 0.2, 0.0, 0.1],
                         [0.3, 0.5, 0.2, 0.0],
                         [0.0, 0.3, 0.5, 0.2],
                         [0.2, 0.0, 0.2, 0.6]])
    # The means of each component
    means = np.array([[0.0,  0.0],
                      [0.0, 11.0],
                      [9.0, 10.0],
                      [11.0, -1.0]])
    # The covariance of each component
    covars = .5 * np.tile(np.identity(2), (4, 1, 1))
    
    # Build an HMM instance and set parameters
    model = hmm.GaussianHMM(n_components=4, covariance_type="full")
    
    # Instead of fitting it from the data, we directly set the estimated
    # parameters, the means and covariance of the components
    model.startprob_ = startprob
    model.transmat_ = transmat
    model.means_ = means
    model.covars_ = covars
    # Generate samples
    X, Z = model.sample(500)
    
    # Plot the sampled data
    plt.plot(X[:, 0], X[:, 1], ".-", label="observations", ms=6,
             mfc="orange", alpha=0.7)
    
    # Indicate the component numbers
    for i, m in enumerate(means):
        plt.text(m[0], m[1], 'Component %i' % (i + 1),
                 size=17, horizontalalignment='center',
                 bbox=dict(alpha=.7, facecolor='w'))
    plt.legend(loc='best')
    plt.show()

    (2) Training and inferring the hidden states

    import numpy as np
    from hmmlearn import hmm
    
    np.random.seed(42)
    #Sample data
    model = hmm.GaussianHMM(n_components=3,covariance_type='full')
    model.startprob_ = np.array([0.6,0.3,0.1])
    model.transmat_ = np.array([[0.7,0.2,0.1],
                              [0.3,0.5,0.2],
                              [0.3,0.3,0.4]])
    model.means_ = np.array([[0.0,0.0],[3.0,-3.0],[5.0,10.0]])
    model.covars_ = np.tile(np.identity(2),(3,1,1))
    X,Z = model.sample(100)
    
    #fit & predict
    remodel = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100)
    remodel.fit(X)
    Z2 = remodel.predict(X)
    
    # save and reload the model
    import pickle
    with open("filename.pkl", "wb") as file:
        pickle.dump(remodel, file)
    with open("filename.pkl", "rb") as file:
        loaded_model = pickle.load(file)
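
    Once fitted, the model can also be scored and decoded; a short illustrative snippet using the remodel object from the block above (score and decode are standard hmmlearn methods):

    # Log-likelihood of the observations under the fitted model
    print(remodel.score(X))

    # Viterbi decoding: joint log-probability of the best state path, and the path itself
    logprob, state_seq = remodel.decode(X, algorithm="viterbi")

    # Note: state_seq (like Z2 above) may be a relabelled permutation of Z, because the
    # state indices learned by EM are arbitrary.
    print(state_seq[:10], Z[:10])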
    

    References:

    《统计机器学习》

    hmmlearn documentation

    《机器学习导论》(Introduction to Machine Learning)

  • Markov models and hidden Markov models

    An introduction to part-of-speech tagging and the Hidden Markov Model

    by Sachin Malhotra and Divya Godayal

    Let’s go back into the times when we had no language to communicate. The only way we had was sign language. That’s how we usually communicate with our dog at home, right? When we tell him, “We love you, Jimmy,” he responds by wagging his tail. This doesn’t mean he knows what we are actually saying. Instead, his response is simply because he understands the language of emotions and gestures more than words.

    We as humans have developed an understanding of a lot of nuances of the natural language more than any animal on this planet. That is why when we say “I LOVE you, honey” vs when we say “Lets make LOVE, honey” we mean different things. Since we understand the basic difference between the two phrases, our responses are very different. It is these very intricacies in natural language understanding that we want to teach to a machine.

    What this could mean is when your future robot dog hears “I love you, Jimmy”, he would know LOVE is a Verb. He would also realize that it’s an emotion that we are expressing to which he would respond in a certain way. And maybe when you are telling your partner “Lets make LOVE”, the dog would just stay out of your business ?.

    This is just an example of how teaching a robot to communicate in a language known to us can make things easier.

    The primary use case being highlighted in this example is how important it is to understand the difference in the usage of the word LOVE, in different contexts.

    Part-of-Speech Tagging

    From a very small age, we have been made accustomed to identifying part of speech tags. For example, reading a sentence and being able to identify what words act as nouns, pronouns, verbs, adverbs, and so on. All these are referred to as the part of speech tags.

    Let’s look at the Wikipedia definition for them:

    In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context — i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

    Identifying part of speech tags is much more complicated than simply mapping words to their part of speech tags. This is because POS tagging is not something that is generic. It is quite possible for a single word to have a different part of speech tag in different sentences based on different contexts. That is why it is impossible to have a generic mapping for POS tags.

    As you can see, it is not possible to manually find out different part-of-speech tags for a given corpus. New types of contexts and new words keep coming up in dictionaries in various languages, and manual POS tagging is not scalable in itself. That is why we rely on machine-based POS tagging.

    Before proceeding further and looking at how part-of-speech tagging is done, we should look at why POS tagging is necessary and where it can be used.

    Why Part-of-Speech tagging?

    Part-of-Speech tagging in itself may not be the solution to any particular NLP problem. It is however something that is done as a pre-requisite to simplify a lot of different problems. Let us consider a few applications of POS tagging in various NLP tasks.

    Text to Speech Conversion

    Let us look at the following sentence:

    They refuse to permit us to obtain the refuse permit.

    The word refuse is being used twice in this sentence and has two different meanings here. refUSE (/rəˈfyo͞oz/)is a verb meaning “deny,” while REFuse(/ˈrefˌyo͞os/) is a noun meaning “trash” (that is, they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this reason, text-to-speech systems usually perform POS-tagging.)

    Have a look at the part-of-speech tags generated for this very sentence by the NLTK package.

    >>> text = word_tokenize("They refuse to permit us to obtain the refuse permit")
    >>> nltk.pos_tag(text)
    [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'),
     ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]

    As we can see from the results provided by the NLTK package, POS tags for both refUSE and REFuse are different. Using these two different POS tags for our text to speech converter can come up with a different set of sounds.

    Similarly, let us look at yet another classical application of POS tagging: word sense disambiguation.

    Word Sense Disambiguation

    Let’s talk about this kid called Peter. Since his mother is a neurological scientist, she didn’t send him to school. His life was devoid of science and math.

    One day she conducted an experiment, and made him sit for a math class. Even though he didn’t have any prior subject knowledge, Peter thought he aced his first test. His mother then took an example from the test and published it as below. (Kudos to her!)

    Words often occur in different senses as different parts of speech. For example:

    • She saw a bear.

    • Your efforts will bear fruit.

    The word bear in the above sentences has completely different senses, but more importantly one is a noun and other is a verb. Rudimentary word sense disambiguation is possible if you can tag words with their POS tags.

    Word-sense disambiguation (WSD) is identifying which sense of a word (that is, which meaning) is used in a sentence, when the word has multiple meanings.

    Try to think of the multiple meanings for this sentence:

    Time flies like an arrow

    Here are the various interpretations of the given sentence. The meaning and hence the part-of-speech might vary for each word.

    As we can clearly see, there are multiple interpretations possible for the given sentence. Different interpretations yield different kinds of part of speech tags for the words. This information, if available to us, can help us find out the exact version / interpretation of the sentence and then we can proceed from there.

    The above example shows us that a single sentence can have three different POS tag sequences assigned to it that are equally likely. That means that it is very important to know what specific meaning is being conveyed by the given sentence whenever it’s appearing. This is word sense disambiguation, as we are trying to find out THE sequence.

    These are just two of the numerous applications where we would require POS tagging. There are other applications as well which require POS tagging, like Question Answering, Speech Recognition, Machine Translation, and so on.

    Now that we have a basic knowledge of different applications of POS tagging, let us look at how we can go about actually assigning POS tags to all the words in our corpus.

    Types of POS taggers

    POS-tagging algorithms fall into two distinctive groups:

    • Rule-Based POS Taggers

    • Stochastic POS Taggers

    E. Brill’s tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. Let us first look at a very brief overview of what rule-based tagging is all about.

    Rule-Based Tagging

    Automatic part of speech tagging is an area of natural language processing where statistical techniques have been more successful than rule-based methods.

    Typical rule-based approaches use contextual information to assign tags to unknown or ambiguous words. Disambiguation is done by analyzing the linguistic features of the word, its preceding word, its following word, and other aspects.

    For example, if the preceding word is an article, then the word in question must be a noun. This information is coded in the form of rules.

    Example of a rule:

    If an ambiguous/unknown word X is preceded by a determiner and followed by a noun, tag it as an adjective.
    Defining a set of rules manually is an extremely cumbersome process and is not scalable at all. So we need some automatic way of doing this.

    The Brill’s tagger is a rule-based tagger that goes through the training data and finds out the set of tagging rules that best define the data and minimize POS tagging errors. The most important point to note here about Brill’s tagger is that the rules are not hand-crafted, but are instead found out using the corpus provided. The only feature engineering required is a set of rule templates that the model can use to come up with new features.

    Let’s move ahead now and look at Stochastic POS tagging.

    Stochastic Part-of-Speech Tagging

    The term ‘stochastic tagger’ can refer to any number of different approaches to the problem of POS tagging. Any model which somehow incorporates frequency or probability may be properly labelled stochastic.

    The simplest stochastic taggers disambiguate words based solely on the probability that a word occurs with a particular tag. In other words, the tag encountered most frequently in the training set with the word is the one assigned to an ambiguous instance of that word. The problem with this approach is that while it may yield a valid tag for a given word, it can also yield inadmissible sequences of tags.

    An alternative to the word frequency approach is to calculate the probability of a given sequence of tags occurring. This is sometimes referred to as the n-gram approach, referring to the fact that the best tag for a given word is determined by the probability that it occurs with the n previous tags. This approach makes much more sense than the one defined before, because it considers the tags for individual words based on context.

    The next level of complexity that can be introduced into a stochastic tagger combines the previous two approaches, using both tag sequence probabilities and word frequency measurements. This is known as the Hidden Markov Model (HMM).

    Before proceeding with what is a Hidden Markov Model, let us first look at what is a Markov Model. That will better help understand the meaning of the term Hidden in HMMs.

    Markov Model

    Say that there are only three kinds of weather conditions, namely

    • Rainy

    • Sunny

    • Cloudy

    Now, since our young friend we introduced above, Peter, is a small kid, he loves to play outside. He loves it when the weather is sunny, because all his friends come out to play in the sunny conditions.

    He hates the rainy weather for obvious reasons.

    Every day, his mother observe the weather in the morning (that is when he usually goes out to play) and like always, Peter comes up to her right after getting up and asks her to tell him what the weather is going to be like. Since she is a responsible parent, she want to answer that question as accurately as possible. But the only thing she has is a set of observations taken over multiple days as to how weather has been.

    How does she make a prediction of the weather for today based on what the weather has been for the past N days?

    Say you have a sequence. Something like this:

    Sunny, Rainy, Cloudy, Cloudy, Sunny, Sunny, Sunny, Rainy

    So, the weather for any give day can be in any of the three states.

    Let’s say we decide to use a Markov Chain Model to solve this problem. Now using the data that we have, we can construct the following state diagram with the labelled probabilities.

    In order to compute the probability of today’s weather given N previous observations, we will use the Markovian Property.

    Markov Chain is essentially the simplest known Markov model, that is it obeys the Markov property.

    The Markov property suggests that the distribution for a random variable in the future depends solely only on its distribution in the current state, and none of the previous states have any impact on the future states.

    For a much more detailed explanation of the working of Markov chains, refer to this link.

    Also, have a look at the following example just to see how probability of the current state can be computed using the formula above, taking into account the Markovian Property.

    Apply the Markov property in the following example.

    We can clearly see that as per the Markov property, the probability of tomorrow's weather being Sunny depends solely on today's weather and not on yesterday's .

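    As a concrete illustration (a small sketch that is not in the original article; it simply reuses the toy sequence above), the transition probabilities of such a Markov chain can be estimated by counting consecutive pairs of days:

    from collections import Counter

    seq = ["Sunny", "Rainy", "Cloudy", "Cloudy", "Sunny", "Sunny", "Sunny", "Rainy"]
    states = sorted(set(seq))

    # Count transitions (today -> tomorrow) over consecutive days
    pairs = Counter(zip(seq, seq[1:]))

    # Normalize each row to get P(tomorrow | today)
    for s in states:
        total = sum(pairs[(s, t)] for t in states)
        row = {t: (pairs[(s, t)] / total if total else 0.0) for t in states}
        print(s, row)
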
    Let us now proceed and see what is hidden in the Hidden Markov Models.

    Hidden Markov Model

    It’s the small kid Peter again, and this time he’s gonna pester his new caretaker — which is you. (Ooopsy!!)

    As a caretaker, one of the most important tasks for you is to tuck Peter into bed and make sure he is sound asleep. Once you’ve tucked him in, you want to make sure he’s actually asleep and not up to some mischief.

    You cannot, however, enter the room again, as that would surely wake Peter up. So all you have to decide are the noises that might come from the room. Either the room is quiet or there is noise coming from the room. These are your states.

    Peter’s mother, before leaving you to this nightmare, said:

    May the sound be with you :)
    His mother has given you the following state diagram. The diagram has some states, observations, and probabilities.

    Note that there is no direct correlation between sound from the room and Peter being asleep.

    There are two kinds of probabilities that we can see from the state diagram.

    • One is the emission probabilities, which represent the probabilities of making certain observations given a particular state. For example, we have P(noise | awake) = 0.5 . This is an emission probability.

    • The other is the transition probabilities, which represent the probability of transitioning to another state given a particular state. For example, we have P(asleep | awake) = 0.4 . This is a transition probability.

    The Markovian property applies in this model as well. So do not complicate things too much. Markov, your savior said:

    Don’t go too much into the history…
    The Markov property, as would be applicable to the example we have considered here, would be that the probability of Peter being in a state depends ONLY on the previous state.

    But there is a clear flaw in the Markov property. If Peter has been awake for an hour, then the probability of him falling asleep is higher than if has been awake for just 5 minutes. So, history matters. Therefore, the Markov state machine-based model is not completely correct. It’s merely a simplification.

    The Markov property, although wrong, makes this problem very tractable.

    We usually observe longer stretches of the child being awake and being asleep. If Peter is awake now, the probability of him staying awake is higher than of him going to sleep. Hence, the 0.6 and 0.4 in the above diagram: P(awake | awake) = 0.6 and P(asleep | awake) = 0.4

    Before actually trying to solve the problem at hand using HMMs, let’s relate this model to the task of Part of Speech Tagging.

    HMMs for Part of Speech Tagging

    We know that to model any problem using a Hidden Markov Model we need a set of observations and a set of possible states. The states in an HMM are hidden.

    In the part of speech tagging problem, the observations are the words themselves in the given sequence.

    As for the states, which are hidden, these would be the POS tags for the words.

    The transition probabilities would be somewhat like P(VP | NP) that is, what is the probability of the current word having a tag of Verb Phrase given that the previous tag was a Noun Phrase.

    Emission probabilities would be P(john | NP) or P(will | VP) that is, what is the probability that the word is, say, John given that the tag is a Noun Phrase.

    Note that this is just an informal modeling of the problem to provide a very basic understanding of how the Part of Speech tagging problem can be modeled using an HMM.

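    To make that informal modeling a bit more concrete, here is a small sketch (not from the original article; the tiny hand-tagged corpus is invented) that estimates HMM transition and emission probabilities for POS tagging by simple counting:

    from collections import Counter

    # An invented, hand-tagged corpus: (word, tag) pairs per sentence.
    corpus = [
        [("john", "NP"), ("will", "VP"), ("bear", "VP"), ("fruit", "NP")],
        [("john", "NP"), ("saw", "VP"), ("a", "DET"), ("bear", "NP")],
    ]

    emit = Counter()        # counts of (tag, word)
    trans = Counter()       # counts of (tag, next tag)
    tag_count = Counter()   # how often each tag occurs
    prev_count = Counter()  # how often each tag occurs as a predecessor

    for sent in corpus:
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
        tags = [t for _, t in sent]
        for prev, nxt in zip(tags, tags[1:]):
            trans[(prev, nxt)] += 1
            prev_count[prev] += 1

    # Unsmoothed maximum-likelihood estimates, e.g. P(bear | NP) and P(VP | NP)
    print(emit[("NP", "bear")] / tag_count["NP"])    # emission probability
    print(trans[("NP", "VP")] / prev_count["NP"])    # transition probability
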
    How do we solve this?

    Coming back to our problem of taking care of Peter.

    Irritated, are we?

    Our problem here was that we have an initial state: Peter was awake when you tucked him into bed. After that, you recorded a sequence of observations, namely noise or quiet, at different time-steps. Using these set of observations and the initial state, you want to find out whether Peter would be awake or asleep after say N time steps.

    We draw all possible transitions starting from the initial state. There’s an exponential number of branches that come out as we keep moving forward. So the model grows exponentially after a few time steps. Even without considering any observations. Have a look at the model expanding exponentially below.

    If we had a set of states, we could calculate the probability of the sequence. But we don’t have the states. All we have are a sequence of observations. This is why this model is referred to as the Hidden Markov Model — because the actual states over time are hidden.

    So, caretaker, if you’ve come this far it means that you have at least a fairly good understanding of how the problem is to be structured. All that is left now is to use some algorithm / technique to actually solve the problem. For now, Congratulations on Leveling up!

    In the next article of this two-part series, we will see how we can use a well defined algorithm known as the Viterbi Algorithm to decode the given sequence of observations given the model. See you there!

    Source: https://www.freecodecamp.org/news/an-introduction-to-part-of-speech-tagging-and-the-hidden-markov-model-953d45338f24/
  • Hidden Markov models are probably the hardest model I have run into so far. This section works through《统计学习方法》(Statistical Learning Methods) together with notes found online, and implements in Python: generating an observation sequence from an HMM, the forward and backward algorithms, unsupervised Baum-Welch training, and the Viterbi algorithm. That gives a fairly clear picture of the model; in practice, of course, calling a library usually solves the problem in a single line of code.

    The word-segmentation function mentioned here, for instance, uses exactly this HMM machinery to split text.

    The concrete principles are explained below.

    1 Definition of the hidden Markov model

    A hidden Markov model is a probabilistic model of sequences. It describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each state then emits an observation, producing a random observation sequence. The state sequence generated by the hidden chain is called the state sequence; the resulting sequence of observations is called the observation sequence. Each position in the sequence can be regarded as a time step.

    What is a hidden Markov chain? Roughly speaking, think of the timeline as a chain in which each node depends only on the previous N nodes, just as you can often guess what a friend is about to do from only their last few posts.

    Some notation for the definition:

    Let Q be the set of all possible states and V the set of all possible observations,

    Q = \{ q_{1}, q_{2}, ..., q_{N} \}, \quad V = \{ v_{1}, v_{2}, ..., v_{M} \}

    where N is the number of possible states and M is the number of possible observations.

    The state q is not visible; the observation v is visible. In a part-of-speech tagging system, the words are v and the part-of-speech tags are q.

    I is a state sequence of length T and O is the corresponding observation sequence,

    I = (i_{1}, i_{2}, ..., i_{T}), \quad O = (o_{1}, o_{2}, ..., o_{T})

    This is analogous to being given a training set of words (O) plus tags (I), or the dictionary of a Chinese word-segmentation system.

    Define A as the state transition probability matrix:

    A = \left [ a_{ij} \right ]_{N \times N}

    where

    a_{ij} = P(i_{t+1} = q_{j} | i_{t} = q_{i})

    is the probability of moving to state q_{j} at time t+1 given state q_{i} at time t.

    This describes a first-order HMM: each state is assumed to depend only on the previous state.

    B is the observation probability matrix:

    B = \left [ b_{j}(k) \right ]_{N \times M}

    where

    b_{j}(k) = P(o_{t} = v_{k} | i_{t} = q_{j})

    is the probability of emitting observation v_{k} given state q_{j} at time t.

    \pi is the initial state probability vector:

    \pi = ( \pi_{i} )

    where

    \pi_{i} = P(i_{1} = q_{i})

    is the probability of being in state q_{i} at time t = 1.

    A hidden Markov model is determined by the initial state probability vector \pi, the state transition probability matrix A, and the observation probability matrix B. \pi and A determine the state sequence; B determines the observation sequence. A hidden Markov model can therefore be written as the triple

    \lambda = (A, B, \pi)

    The three variables in the parentheses are called the three elements of the hidden Markov model. Adding a concrete state set Q and observation set V gives the five-tuple of the HMM.

    The transition matrix A and the initial distribution \pi determine the hidden Markov chain and hence the unobservable state sequence. The observation matrix B determines how observations are generated from states, and together with the state sequence it determines the observation sequence.

    From the definition, a hidden Markov model makes two basic assumptions:

    (1) Homogeneous Markov assumption: the state of the hidden chain at any time t depends only on the state at the previous time, and on nothing else:

    P(i_{t} | i_{t-1}, o_{t-1}, ..., i_{1}, o_{1}) = P(i_{t} | i_{t-1})

    (2) Observation independence assumption: the observation at any time depends only on the state of the chain at that time:

    P(o_{t} | i_{T}, o_{T}, ..., i_{t+1}, o_{t+1}, i_{t}, i_{t-1}, o_{t-1}, ..., i_{1}, o_{1}) = P(o_{t} | i_{t})

    2 Generating an observation sequence

    From the definition, the generation of an observation sequence O of length T can be described as follows.

    Observation sequence generation algorithm:

    Input: hidden Markov model \lambda = (A, B, \pi), observation sequence length T

    Output: observation sequence O = (o_{1}, o_{2}, ..., o_{T})

    (1) Draw an initial state i_{1} according to the initial state distribution \pi.

    (2) Set t = 1.

    (3) Draw o_{t} according to the observation probability distribution b_{i_{t}}(k) of state i_{t}.

    (4) Draw the next state i_{t+1} according to the transition distribution of state i_{t}, i.e. row i_{t} of A.

    Set t = t + 1; if t < T, return to step (3), otherwise stop.

    2.1 Python implementation of observation sequence generation

    The algorithm first initializes two vectors of length T, draws the first state from the initial distribution pi, looks up that state's observation distribution, and draws an observation from it.

    After that, each step draws the next state from the previous state's transition distribution, and then draws an observation from the new state's observation distribution. Repeating this gives state and observation vectors of length T.

    The code is as follows:

    import numpy as np

    class HMM():

        def __init__(self, A, B, pi):
            self.A = A
            self.B = B
            self.pi = pi

        def simulate(self, T):
            # draw_from takes a probability distribution and returns one sample drawn from it.
            def draw_from(probs):
                return np.where(np.random.multinomial(1, probs) == 1)[0][0]

            observations = np.zeros(T, dtype=int)
            states = np.zeros(T, dtype=int)
            states[0] = draw_from(self.pi)
            observations[0] = draw_from(self.B[states[0], :])
            for t in range(1, T):
                states[t] = draw_from(self.A[states[t-1], :])
                observations[t] = draw_from(self.B[states[t], :])
            return states, observations
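
    A quick usage sketch (the parameters are the book's box-and-ball example, also used in the test section at the end):

    A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
    B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
    pi = np.array([0.2, 0.4, 0.4])

    hmm_model = HMM(A, B, pi)
    states, observations = hmm_model.simulate(10)   # ten hidden states and ten observations
    print(states, observations)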

    3 The three basic problems of hidden Markov models

    (1) Probability computation. Given a model \lambda = (A, B, \pi) and an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), compute the probability P(O|\lambda) of O under the model.

    (2) Learning. Given an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), estimate the parameters of \lambda = (A, B, \pi) so that P(O|\lambda) is maximized, i.e. estimate the parameters by maximum likelihood.

    (3) Prediction, also called decoding. Given the model \lambda = (A, B, \pi) and an observation sequence O, find the state sequence I = (i_{1}, i_{2}, ..., i_{T}) that maximizes the conditional probability P(I|O). That is, given the observations, find the most likely corresponding state sequence.

    The following sections present solutions to each of these problems in turn.

    4 Probability computation algorithms

    This section presents the forward and backward algorithms for computing the probability of an observation sequence. The idea is to compute the forward probability of the first time step (or the backward probability of the last) and then recurse forward (or backward) in time.

    4.1 The forward algorithm

    Forward probability: given a hidden Markov model \lambda, the forward probability is the probability of the partial observation sequence o_{1}, ..., o_{t} up to time t with the state at time t being q_{i}, written

    \alpha_{t}(i) = P(o_{1}, o_{2}, ..., o_{t}, i_{t} = q_{i} | \lambda)

    The forward probabilities \alpha_{t}(i), and from them the observation probability P(O|\lambda), can be computed recursively.

    Forward algorithm for the observation probability:

    Input: hidden Markov model \lambda, observation sequence O = (o_{1}, o_{2}, ..., o_{T});

    Output: observation probability P(O|\lambda)

    (1) Initialization

    \alpha_{1}(i) = \pi_{i} b_{i}(o_{1}), \quad i = 1,2,...,N

    The definition of the forward probability fixes two things: the observations so far and the current state. The initial value therefore has two factors, the initial state probability and the probability of that state emitting the first observation.

    (2) Recursion. For t = 1, 2, ..., T-1,

    \alpha_{t+1}(i) = \left [ \sum_{j=1}^{N} \alpha_{t}(j) a_{ji} \right ] b_{i}(o_{t+1}), \quad i = 1,2,...,N

    Each recursion step again has two parts: the bracketed sum is the probability that the first t observations match and the state at time t+1 is i, and the factor outside the brackets is the probability that state i emits observation o_{t+1}.

    (3) Termination

    P(O|\lambda) = \sum_{i=1}^{N} \alpha_{T}(i)

    Since at time T any of the N states may have emitted the last observation, the final result sums these probabilities.

    4.2 Python implementation of the forward algorithm

    The code is as follows:

    def forward(self, obs_seq):
        '''Forward algorithm'''
        N = self.A.shape[0]
        T = len(obs_seq)
        F = np.zeros((N, T))
        F[:, 0] = self.pi * self.B[:, obs_seq[0]]
        for t in range(1, T):
            for n in range(N):
                F[n, t] = np.dot(F[:, t-1], self.A[:, n]) * self.B[n, obs_seq[t]]
        return F

    4.3 The backward algorithm

    The backward algorithm is the mirror image of the forward algorithm, recursing backwards in time; see page 178 of the book for details.

    4.4 Python implementation of the backward algorithm

    The code is as follows:

    def backward(self, obs_seq):
        '''Backward algorithm'''
        N = self.A.shape[0]
        T = len(obs_seq)
        M = np.zeros((N, T))
        M[:, T-1] = 1
        # equivalently M[:, -1] = 1, i.e. the last column is all ones
        for t in reversed(range(T-1)):
            for n in range(N):
                # beta_t(n) = sum_j a_{nj} * b_j(o_{t+1}) * beta_{t+1}(j)
                M[n, t] = np.sum(self.A[n, :] * self.B[:, obs_seq[t+1]] * M[:, t+1])
        return M

    5 The learning problem

    The Baum-Welch algorithm is the EM algorithm applied to HMMs: given an observation sequence O = (o_{1}, o_{2}, ..., o_{T}), it estimates the parameters of \lambda = (A, B, \pi) so that P(O|\lambda) is maximized, i.e. maximum likelihood estimation with the states treated as hidden variables.

    A detailed walkthrough of the algorithm can be found at: https://blog.csdn.net/u014688145/article/details/53046765?locationNum=7&fps=1

    5.1 Python implementation of the Baum-Welch algorithm

    The code is as follows:

    def EM(self, observation, criterion=0.05):
        '''Parameter learning with the EM (Baum-Welch) algorithm'''
        n_state = self.A.shape[0]
        n_sample = len(observation)
        done = 1
        while done:
            Alpha = self.forward(observation)
            Beta = self.backward(observation)
            xi = np.zeros((n_state, n_state, n_sample-1))
            gamma = np.zeros((n_state, n_sample))
            for t in range(n_sample-1):
                denom = np.sum(np.dot(Alpha[:, t].T, self.A) * self.B[:, observation[t+1]].T * Beta[:, t+1].T)
                sum_gamma1 = np.sum(Alpha[:, t] * Beta[:, t])
                for i in range(n_state):
                    numer = Alpha[i, t] * self.A[i, :] * self.B[:, observation[t+1]].T * Beta[:, t+1].T
                    xi[i, :, t] = numer / denom
                    gamma[i, t] = Alpha[i, t] * Beta[i, t] / sum_gamma1
            last_col = Alpha[:, n_sample-1].T * Beta[:, n_sample-1]
            gamma[:, n_sample-1] = last_col / np.sum(last_col)
            # Update the parameters
            n_pi = gamma[:, 0]
            n_A = np.sum(xi, axis=2) / np.sum(gamma[:, :-1], axis=1).reshape(-1, 1)
            n_B = np.copy(self.B)
            num_level = self.B.shape[1]
            for lev in range(num_level):
                for h in range(n_state):
                    sum_gamma = 0
                    a = 0
                    for t in range(n_sample):
                        sum_gamma = sum_gamma + gamma[h, t]
                        if observation[t] == lev:
                            a = a + gamma[h, t]
                    n_B[h, lev] = a / sum_gamma
            # Stop when all parameter changes fall below the threshold
            if np.max(np.abs(self.pi - n_pi)) < criterion and np.max(np.abs(self.B - n_B)) < criterion \
                    and np.max(np.abs(self.A - n_A)) < criterion:
                done = 0
            self.A, self.B, self.pi = n_A, n_B, n_pi
        return self.A, self.B, self.pi

    6 The prediction algorithm

    6.1 The Viterbi algorithm

    Input: model \lambda = (A, B, \pi) and observation sequence O = (o_{1}, o_{2}, ..., o_{T});

    Output: the optimal path (state sequence) I^{*} = (i_{1}^{*}, i_{2}^{*}, ..., i_{T}^{*})

    (1) Initialization

    \delta_{1}(i) = \pi_{i} b_{i}(o_{1}), \quad \varphi_{1}(i) = 0, \quad i = 1,2,...,N

    (2) Recursion. For t = 2, 3, ..., T,

    \delta_{t}(i) = \max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}] \, b_{i}(o_{t}), \quad \varphi_{t}(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}], \quad i = 1,2,...,N

    (3) Termination

    P^{*} = \max_{1 \le i \le N} \delta_{T}(i), \quad i_{T}^{*} = \arg\max_{1 \le i \le N} [\delta_{T}(i)]

    (4) Path backtracking. For t = T-1, T-2, ..., 1,

    i_{t}^{*} = \varphi_{t+1}(i_{t+1}^{*})

    which yields the optimal path I^{*} = (i_{1}^{*}, i_{2}^{*}, ..., i_{T}^{*}).

    6.2 Python implementation of the Viterbi algorithm

    The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (the Viterbi path) that produces the observed event sequence. It treats decoding as a multi-stage decision problem: at every step, and for every choice at that step, it stores the best total score of reaching that choice from the start, together with the predecessor choice that achieves it. After all steps are processed, the optimal path is recovered by backtracking.

    The code is as follows:

    def viterbi(self, obs_seq):
        '''Viterbi algorithm: predict the state sequence'''
        N = self.A.shape[0]
        T = len(obs_seq)
        P = np.zeros((N, T))
        prev_point = np.zeros((N, T))
        prev_point[:, 0] = 0
        P[:, 0] = self.pi * self.B[:, obs_seq[0]]
        for t in range(1, T):
            for n in range(N):
                P[n, t] = np.max(P[:, t-1] * self.A[:, n]) * self.B[n, obs_seq[t]]
                prev_point[n, t] = np.argmax(P[:, t-1] * self.A[:, n])
        return P, prev_point

    6.3 Python implementation of optimal path recovery

    Recovering the optimal path is really part of the Viterbi algorithm: once the best final state at time T is known, the rest of the path is found by backtracking through the stored predecessors.

    The code is as follows:

    def build_path(self, obs_seq):
        '''Return the optimal state path'''
        P, prev_point = self.viterbi(obs_seq)
        T = len(obs_seq)
        opt_path = np.zeros(T, dtype=int)
        opt_path[T-1] = np.argmax(P[:, T-1])
        for t in reversed(range(T-1)):
            opt_path[t] = prev_point[opt_path[t+1], t+1]
        return opt_path

    6.4 Practice

    Testing with the example data from《统计学习方法》:

    if __name__ == '__main__':
        # Test with the box-and-ball example from the book
        A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
        B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
        pi = np.array([0.2, 0.4, 0.4])
        test1 = HMM(A, B, pi)
        obs_seq = [0, 1, 0]
        print(test1.forward(obs_seq))
        print(test1.backward(obs_seq))
        print(test1.viterbi(obs_seq))
        print(test1.build_path(obs_seq))
        print(test1.EM(obs_seq))

    References: 李航. 统计学习方法[M]. 北京: 清华大学出版社, 2012.

    码农场 (hankcs), 隐马尔可夫模型: http://www.hankcs.com/ml/hidden-markov-model.html

  • The hidden Markov model (HMM) is a statistical model describing a Markov process with hidden, unknown parameters. The difficulty is to determine the hidden parameters of the process from the observable ones. A hidden Markov model mainly involves the following ingredients:

    The visible state chain, i.e. the observation sequence, denoted O.

    The hidden state chain; transitions between hidden states have transition probabilities, usually collected in a state transition matrix A.

    Between hidden states and visible states there is an emission probability, usually collected in an observation (emission) matrix B.

    The initial state distribution, describing the probability of each hidden state at the start, denoted \pi.

    Basic meaning of the HMM parameters and how they relate

    An example to illustrate the relationships between the parameters. Suppose there are three boxes of red and white balls, and balls are drawn as follows: (1) for the first draw, the probabilities of drawing from the three boxes are 0.2, 0.4 and 0.4;

    (2) from the second draw on, if the previous ball came from box 1, the probabilities of drawing next from the three boxes are 0.5, 0.2, 0.3; if from box 2, they are 0.3, 0.5, 0.2; if from box 3, they are 0.2, 0.3, 0.5.

    After three draws, the observed ball sequence is {red, white, red}.

    By the HMM definition this gives the observation set {red, white},

    the state set {box 1, box 2, box 3},

    the initial state distribution \pi = [0.2, 0.4, 0.4],

    the state transition probability matrix A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix},

    and the observation probability matrix B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix} (rows are boxes, columns are red and white; these are the values used in the code below).

    Note that the observation sequence and the observation set are two different things.

    Computing probabilities with an HMM: the forward algorithm

    The forward and backward algorithms compute the probability of an observation sequence; the forward algorithm is shown here.

    (1) At time 1, with observation o_{1}, compute the forward probability of each hidden state i:

    \alpha_{1}(i) = \pi_{i} b_{i}(o_{1})

    For example, the first ball is red, so the probability with hidden state box 1 is \alpha_{1}(1) = 0.2 \times 0.5 = 0.10,

    with hidden state box 2 it is \alpha_{1}(2) = 0.4 \times 0.4 = 0.16,

    and with hidden state box 3 it is \alpha_{1}(3) = 0.4 \times 0.7 = 0.28.

    (2) For times t = 2, 3, ..., T, recursively compute the forward probability of each hidden state i for observation o_{t}:

    \alpha_{t}(i) = \left [ \sum_{j=1}^{N} \alpha_{t-1}(j) a_{ji} \right ] b_{i}(o_{t})

    For example, at time 2 the ball is white, so the probability with hidden state box 1 is \alpha_{2}(1) = (0.10 \times 0.5 + 0.16 \times 0.3 + 0.28 \times 0.2) \times 0.5 = 0.077.

    (3) Finally, the probability of the whole observation sequence is P(O|\lambda) = \sum_{i=1}^{N} \alpha_{T}(i).

    HMM decoding: the Viterbi algorithm

    The Viterbi algorithm works on the trellis of hidden states over time (the figure originally shown here): it finds the best path from the first column (t = 1) to the last column (t = 3).

    In the HMM decoding problem we are given the state transition matrix A, the observation matrix B, the initial state distribution \pi, and an observation sequence, and we look for the hidden state sequence I that is most likely given the observations, i.e. the I that maximizes P(I|O). In this example, that means working out which box each of the three draws came from: the result (the observation sequence) is known and the hidden sequence is inferred backwards from it.

    The Viterbi algorithm is a dynamic programming algorithm. The basic idea is to compute the maximum probabilities forward in time, and then backtrack from the last node to recover the earlier nodes. Two quantities are defined:

    (1) \delta_{t}(i): the maximum probability over all state paths that end in hidden state i at time t,

    \delta_{t}(i) = \max_{j} [\delta_{t-1}(j) a_{ji}] \, b_{i}(o_{t})

    (2) \varphi_{t}(i): the hidden state of the (t-1)-th node (the previous node) on the maximum-probability path ending in state i at time t, used for backtracking:

    \varphi_{t}(i) = \arg\max_{j} [\delta_{t-1}(j) a_{ji}]

    Input: the HMM model \lambda = (A, B, \pi) and the observation sequence O

    Output: the most likely hidden state sequence I

    (1) Initialize the local states. In this example, at time 1 the first observation is a red ball, so

    \delta_{1}(1) = 0.2 \times 0.5 = 0.10, \quad \delta_{1}(2) = 0.4 \times 0.4 = 0.16, \quad \delta_{1}(3) = 0.4 \times 0.7 = 0.28

    (2) Recursively compute the local states for t = 2, 3, ..., T.

    For the three hidden states at time 2: \delta_{2}(1) = 0.028, \ \delta_{2}(2) = 0.0504, \ \delta_{2}(3) = 0.042, with \varphi_{2}(i) = 3 for every i.

    For the three hidden states at time 3: \delta_{3}(1) = 0.00756, \ \delta_{3}(2) = 0.01008, \ \delta_{3}(3) = 0.0147, with \varphi_{3}(1) = \varphi_{3}(2) = 2 and \varphi_{3}(3) = 3.

    (3) Backtrack from the largest local state. At T = 3, \delta_{3}(3) = 0.0147 is the largest, so i_{3} = 3; backtracking gives i_{2} = \varphi_{3}(3) = 3 and then i_{1} = \varphi_{2}(3) = 3.

    The most likely hidden state sequence is therefore (3, 3, 3).

    Code:

    from numpy import array, zeros

    class HMM(object):

        def __init__(self):
            self.A = array([(0.5, 0.2, 0.3), (0.3, 0.5, 0.2), (0.2, 0.3, 0.5)])
            self.B = array([(0.5, 0.5), (0.4, 0.6), (0.7, 0.3)])
            self.pi = array([(0.2), (0.4), (0.4)])
            self.o = [0, 1, 0]
            self.t = len(self.o)       # length of the observation sequence
            self.m = len(self.A)       # size of the state set
            self.n = len(self.B[0])    # size of the observation set

        def forward(self):
            # x[t][i] is the probability of the partial observations o1..ot with state qi at time t,
            # so x has one row per time step and one column per state.
            self.x = array(zeros((self.t, self.m)))
            # First compute the probabilities at time 1 for observation o[0]
            for i in range(self.m):    # loop over the hidden states
                self.x[0][i] = self.pi[i] * self.B[i][self.o[0]]    # initial state prob * emission prob of o1
            for j in range(1, self.t):
                for i in range(self.m):
                    # Summing the previous states' probabilities times the transition probabilities
                    # gives the probability of reaching state i; multiply by the emission prob of o[j].
                    temp = 0
                    for k in range(self.m):
                        temp = temp + self.x[j-1][k] * self.A[k][i]
                    self.x[j][i] = temp * self.B[i][self.o[j]]
            result = 0
            for i in range(self.m):
                result = result + self.x[self.t-1][i]
            print("Forward probability matrix and observation sequence probability:")
            print(self.x)
            print(result)

        def viterbi(self):
            # The Viterbi method maintains two matrices: z records the maximum path probability
            # for each (time, state), and d records the previous state on that best path.
            self.z = array(zeros((self.t, self.m)))    # a 3 x 3 matrix here
            self.d = array(zeros((self.t, self.m)))
            for i in range(self.m):                     # initial states
                self.z[0][i] = self.pi[i] * self.B[i][self.o[0]]
                self.d[0][i] = 0
            for j in range(1, self.t):
                for i in range(self.m):
                    maxnum = self.z[j-1][0] * self.A[0][i]
                    node = 0
                    for k in range(1, self.m):          # find the best predecessor state
                        temp = self.z[j-1][k] * self.A[k][i]
                        if maxnum < temp:
                            maxnum = temp
                            node = k
                    self.z[j][i] = maxnum * self.B[i][self.o[j]]    # best path probability ending in state i
                    self.d[j][i] = node                              # predecessor on that path
            # find the most probable final state
            print("Predecessor matrix:", self.d)
            print("Path probabilities:", self.z)
            max_probability = self.z[self.t-1][0]
            node_path = []
            temp = 0
            for i in range(1, self.m):
                if max_probability < self.z[self.t-1][i]:
                    max_probability = self.z[self.t-1][i]
                    temp = i
            i = self.t - 1
            temp = int(temp)
            node_path.append(temp + 1)
            # d[t][p] is the state at time t-1 when the state at time t is p;
            # backtrack through it to recover the full path
            while i >= 1:
                node_path.append(int(self.d[i][temp] + 1))
                temp = int(self.d[i][temp])
                i = i - 1
            node_path.reverse()
            print("The path nodes are:")
            print("node_path", node_path)
            print("The probability of this path is " + str(max_probability))
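
    A short usage sketch (added for illustration) that runs both methods on the built-in example:

    if __name__ == '__main__':
        model = HMM()
        model.forward()    # prints the forward matrix and P(O|lambda), about 0.13022
        model.viterbi()    # prints the best path [3, 3, 3] with probability 0.0147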
