Model ($x$: feature vector, $y$: a single label)
$$p(y,x)=p(y)\prod_{k=1}^{K} p(x_k|y)$$
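To predict a sequence of class labels $y=(y_{1}, \dots, y_{n})$ from a sequence of observations $x=(x_{1}, \dots, x_{n})$, we can build a simple sequence model by multiplying the individual NB models together. The product is the joint probability of labels and observations:
$p(\vec{y}, \vec{x}) = \prod_{i=1}^{n} p(y_{i}) \cdot p(x_{i} \mid y_{i})$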
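The product of per-position NB models can be sketched in a few lines of Python. The tag names, words, and probabilities below are made up for illustration, and `nb_sequence_joint` and `nb_sequence_decode` are hypothetical helper names. Because no factor links $y_i$ to $y_{i-1}$, the best label at each position can be chosen independently.

```python
def nb_sequence_joint(labels, observations, prior, likelihood):
    """Product over positions of p(y_i) * p(x_i | y_i)."""
    prob = 1.0
    for y, x in zip(labels, observations):
        prob *= prior[y] * likelihood[y][x]
    return prob


def nb_sequence_decode(observations, prior, likelihood):
    """Pick the best label per position; positions do not interact."""
    return [
        max(prior, key=lambda y: prior[y] * likelihood[y][x])
        for x in observations
    ]


# Toy parameters (assumed values, not estimated from any data).
prior = {"N": 0.6, "V": 0.4}
likelihood = {
    "N": {"dog": 0.7, "runs": 0.3},
    "V": {"dog": 0.1, "runs": 0.9},
}

words = ["dog", "runs"]
best_labels = nb_sequence_decode(words, prior, likelihood)
best_score = nb_sequence_joint(best_labels, words, prior, likelihood)
print(best_labels, best_score)
```

This independence between positions is what the HMM removes by adding a transition term between neighbouring labels.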
Model:
$$p(y,x)=\prod_{t=1}^{T} p(y_t|y_{t-1})p(x_t|y_t)$$
HMM Framework definition
Probabilities relating states and observations
Transition Matrix
First-order Hidden Markov Model assumptions
3 major problems of HMM
Disadvantages of HMM
Dynamic programming algorithm
Probability of the most likely path that ends in state $t$ at position $i$: $$\delta_{i}(t) = \underset{t_{0},\ldots,t_{i-1}}{\max} \ \ P(t_{0},\ldots,t_{i-1},t,w_{1},\ldots,w_{i-1})$$
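By the Markov assumption, the maximum over the whole history reduces to a maximum over the previous state only:
$$\delta_{i}(t) = \underset{t_{i-1}}{\max} \ \ P(t \mid t_{i-1}) \cdot P(w_{i-1} \mid t_{i-1}) \cdot \delta_{i-1}(t_{i-1})$$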
Most likely previous state: $$\Psi_{i}(t) = \underset{t_{i-1}}{\arg\max} \ \ P(t \mid t_{i-1}) \cdot P(w_{i-1} \mid t_{i-1}) \cdot \delta_{i-1}(t_{i-1})$$
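The $\delta$ / $\Psi$ recursion can be sketched as a short Viterbi decoder. Note the assumptions: this sketch uses the common indexing in which the emission for position $i$ is attached to the state at position $i$ and an explicit start distribution replaces $t_0$, which differs slightly from the indexing in the formulas above. The toy parameters and the function name `viterbi` are illustrative, not taken from the source.

```python
def viterbi(observations, states, start, trans, emit):
    """Return (best_path, best_probability) for a first-order HMM."""
    # delta[i][s]: probability of the best path that ends in state s at position i
    # psi[i][s]:   best predecessor of state s at position i
    delta = [{s: start[s] * emit[s][observations[0]] for s in states}]
    psi = [{}]
    for i in range(1, len(observations)):
        delta.append({})
        psi.append({})
        for s in states:
            best_prev = max(states, key=lambda r: delta[i - 1][r] * trans[r][s])
            delta[i][s] = (
                delta[i - 1][best_prev] * trans[best_prev][s] * emit[s][observations[i]]
            )
            psi[i][s] = best_prev

    # Backtrack from the best final state through the stored predecessors.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(psi[i][path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Toy parameters (assumed values for illustration only).
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

best_path, best_prob = viterbi(["walk", "shop", "clean"], states, start, trans, emit)
print(best_path, best_prob)
```

Storing only the best predecessor per state keeps the cost at $O(n \cdot |S|^2)$ instead of enumerating all $|S|^n$ paths. For long sequences, working with log-probabilities would avoid numerical underflow.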
An unfilled trellis representation of an HMM
Word emission and state transition probability matrices
Given the model $\lambda=(A,B,\pi)$, the forward probability is
$\alpha_t(i)=P(O_1,O_2,\dots,O_t,i_t=q_i|\lambda)$
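A minimal sketch of how $\alpha_t(i)$ could be computed, assuming the standard forward recursion $\alpha_1(i)=\pi_i b_i(o_1)$ and $\alpha_{t+1}(j)=\big[\sum_i \alpha_t(i)a_{ij}\big]b_j(o_{t+1})$. The matrices below are toy values, observations are encoded as column indices into `B`, and the function name `forward` is a choice made for this example.

```python
def forward(A, B, pi, obs):
    """Return alpha as a list of rows: alpha[t][i] = P(o_1..o_t, i_t = q_i | lambda)."""
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n_states)]]
    # Recursion: sum over predecessors, then multiply by the emission probability.
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
            for j in range(n_states)
        ])
    return alpha


# Toy model lambda = (A, B, pi) with 2 states and 3 observation symbols (assumed values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

alpha = forward(A, B, pi, obs)
likelihood = sum(alpha[-1])  # P(O | lambda) = sum_i alpha_T(i)
print(likelihood)
```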
For the same model $\lambda=(A,B,\pi)$, the backward probability is
$\beta_t(i)=P(O_{t+1},O_{t+2},\dots,O_T|i_t=q_i,\lambda)$
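The backward probability can be sketched the same way, assuming the standard recursion $\beta_T(i)=1$ and $\beta_t(i)=\sum_j a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$. The toy matrices and the function name `backward` are assumptions for illustration; observations are column indices into `B`.

```python
def backward(A, B, obs):
    """Return beta as a list of rows: beta[t][i] = P(o_{t+1}..o_T | i_t = q_i, lambda)."""
    n_states = len(A)
    n_steps = len(obs)
    # Initialisation: beta_T(i) = 1 for every state.
    beta = [[1.0] * n_states for _ in range(n_steps)]
    # Recursion runs from the second-to-last position back to the first.
    for t in range(n_steps - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n_states)
            )
    return beta


# Toy model lambda = (A, B, pi) with 2 states and 3 observation symbols (assumed values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

beta = backward(A, B, obs)
# P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
likelihood = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(likelihood)
```

The likelihood obtained here should match the value obtained by summing the final forward probabilities, which is a convenient consistency check.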
A single state in an HMM
Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is written as:
$\gamma_t(i) = P(i_t = q_i | O,\lambda) = \frac{P(i_t = q_i ,O|\lambda)}{P(O|\lambda)}$
By the definitions of the forward and backward probabilities, $P(i_t = q_i ,O|\lambda) = \alpha_t(i)\beta_t(i)$, so
$\gamma_t(i) = \frac{ \alpha_t(i)\beta_t(i)}{\sum\limits_{j=1}^N \alpha_t(j)\beta_t(j)}$
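A minimal sketch of the $\gamma_t(i)$ computation, combining the standard forward and backward recursions and normalising at each time step. The toy matrices and the function names (`forward`, `backward`, `state_posteriors`) are assumptions for illustration; observations are column indices into `B`.

```python
def forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    n = len(A)
    beta = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return beta


def state_posteriors(A, B, pi, obs):
    """gamma[t][i] = alpha_t(i) * beta_t(i) / sum_j alpha_t(j) * beta_t(j)."""
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    gamma = []
    for alpha_t, beta_t in zip(alpha, beta):
        joint = [a * b for a, b in zip(alpha_t, beta_t)]
        norm = sum(joint)  # equals P(O | lambda) at every time step
        gamma.append([v / norm for v in joint])
    return gamma


# Toy model (assumed values for illustration only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

gamma = state_posteriors(A, B, pi, obs)
for row in gamma:
    print(row)  # each row is a distribution over states
```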
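Two consecutive states in an HMM
Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$ is written as: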
$$\xi_t(i,j) = P(i_t = q_i, i_{t+1}=q_j | O,\lambda) = \frac{ P(i_t = q_i, i_{t+1}=q_j , O|\lambda)}{P(O|\lambda)}$$
where $P(i_t = q_i, i_{t+1}=q_j , O|\lambda) = \alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$,
which gives $\xi_t(i,j) = \frac{\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}{\sum\limits_{r=1}^N\sum\limits_{s=1}^N\alpha_t(r)a_{rs}b_s(o_{t+1})\beta_{t+1}(s)}$
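The pairwise posterior $\xi_t(i,j)$ can be sketched in the same style. As before, the toy matrices and function names (`forward`, `backward`, `pair_posteriors`) are assumptions for illustration, and observations are column indices into `B`.

```python
def forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    return alpha


def backward(A, B, obs):
    n = len(A)
    beta = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return beta


def pair_posteriors(A, B, pi, obs):
    """xi[t][i][j] = P(i_t = q_i, i_{t+1} = q_j | O, lambda) for t = 1..T-1."""
    n = len(pi)
    alpha = forward(A, B, pi, obs)
    beta = backward(A, B, obs)
    xi = []
    for t in range(len(obs) - 1):
        # Numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        joint = [
            [alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)]
            for i in range(n)
        ]
        norm = sum(sum(row) for row in joint)  # double sum in the denominator
        xi.append([[v / norm for v in row] for row in joint])
    return xi


# Toy model (assumed values for illustration only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]
pi = [0.6, 0.4]
obs = [0, 1, 2]

xi = pair_posteriors(A, B, pi, obs)
print(xi[0])
```

Summing $\xi_t(i,j)$ over $j$ recovers $\gamma_t(i)$, which gives a quick sanity check on an implementation.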