Recursive Generalised Linear Model
$\eta = \beta^\top x + \beta_0$
$y = \eta+\epsilon \qquad \epsilon \sim \mathcal{N}(0,\sigma^2)$
Generalised Linear Models extend the linear model to problems where the distribution of the targets is not Gaussian but some other distribution (typically one from the exponential family)
$\eta = \beta^\top x, \qquad \beta=[\hat \beta, \beta_0], x = [\hat{x}, 1]$
$\mathbb{E}[y] = \mu = g^{-1}(\eta)$
$g(\cdot)$ is the link function
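A minimal numpy sketch of a GLM prediction, assuming a sigmoid inverse link (logistic regression for a binary target); the values of `beta` and `x` below are made up purely for illustration, and an identity inverse link would recover the plain linear model.

```python
import numpy as np

def glm_predict(beta, x, inv_link):
    """Mean prediction of a GLM: mu = g^{-1}(beta^T x)."""
    eta = beta @ x                      # linear predictor (bias folded into beta and x)
    return inv_link(eta)                # mean of the target distribution

sigmoid = lambda eta: 1.0 / (1.0 + np.exp(-eta))   # inverse of the logit link

beta = np.array([0.5, -1.2, 0.3])      # [beta_hat, beta_0] with the bias appended
x = np.array([2.0, 1.0, 1.0])          # [x_hat, 1]
mu = glm_predict(beta, x, sigmoid)     # E[y] = P(y = 1 | x) for logistic regression
```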
What does RGLM mean?
$h_l(x) = f_l(\eta_l)$
$\mathbb{E}[y] = \mu_L = h_L \circ \ldots \circ h_1 \circ h_0(x)$
$\mathcal{L} = - \log p(y | \mu_L)$
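A sketch of the recursive construction: each layer applies a linear map followed by an inverse link, and the layers are composed. The sigmoid inverse link, the layer sizes, and the random weights below are illustrative assumptions, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda eta: 1.0 / (1.0 + np.exp(-eta))

def layer(W, b, f):
    """One recursive-GLM layer h_l(x) = f_l(eta_l), with eta_l = W x + b."""
    return lambda x: f(W @ x + b)

# Three stacked layers: mu_L = h_3(h_2(h_1(x)))
layers = [layer(rng.normal(size=(4, 3)), np.zeros(4), sigmoid),
          layer(rng.normal(size=(4, 4)), np.zeros(4), sigmoid),
          layer(rng.normal(size=(1, 4)), np.zeros(1), sigmoid)]

x = rng.normal(size=3)
mu_L = x
for h in layers:
    mu_L = h(mu_L)

# Negative log-likelihood for a Bernoulli target (binary y)
y = 1.0
loss = -(y * np.log(mu_L) + (1 - y) * np.log(1 - mu_L))
```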
Dataset
NN
Training
Single neuron operation
Single Layer
$z_i^{[l]}=w_i^T\cdot a^{[l-1]}+b_i, \ \ \ a_i^{[l]}=g^{[l]}(z_i^{[l]})$
$z_1^{[2]}=w_1^T\cdot a^{[1]}+b_1, \ \ \ a_1^{[2]}=g^{[2]}(z_1^{[2]})$
$z_2^{[2]}=w_2^T\cdot a^{[1]}+b_2, \ \ \ a_2^{[2]}=g^{[2]}(z_2^{[2]})$
$z_3^{[2]}=w_3^T\cdot a^{[1]}+b_3, \ \ \ a_3^{[2]}=g^{[2]}(z_3^{[2]})$
$z_4^{[2]}=w_4^T\cdot a^{[1]}+b_4, \ \ \ a_4^{[2]}=g^{[2]}(z_4^{[2]})$
$z_5^{[2]}=w_5^T\cdot a^{[1]}+b_5, \ \ \ a_5^{[2]}=g^{[2]}(z_5^{[2]})$
$z_6^{[2]}=w_6^T\cdot a^{[1]}+b_6, \ \ \ a_6^{[2]}=g^{[2]}(z_6^{[2]})$
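A per-neuron sketch of the layer-2 computation above, assuming 6 neurons, a sigmoid activation for $g^{[2]}$, and randomly initialised weights purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation g^[2]

a_prev = rng.normal(size=3)              # activations a^[1] from the previous layer
W = rng.normal(size=(6, 3))              # row i holds w_i for neuron i of layer 2
b = rng.normal(size=6)

z2 = np.empty(6)
a2 = np.empty(6)
for i in range(6):                       # z_i^[2] = w_i^T a^[1] + b_i
    z2[i] = W[i] @ a_prev + b[i]
    a2[i] = g(z2[i])                     # a_i^[2] = g^[2](z_i^[2])
```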
Matrix Operation
Vectorizing across multiple examples
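The same layer written as a single matrix operation and vectorised over a mini-batch of $m$ examples; a numpy sketch with shapes chosen arbitrarily, where each column of `A_prev` is one example.

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda z: 1.0 / (1.0 + np.exp(-z))

m = 8                                    # number of examples
A_prev = rng.normal(size=(3, m))         # A^[1]: one column per example
W = rng.normal(size=(6, 3))              # W^[2]: the w_i^T stacked as rows
b = rng.normal(size=(6, 1))

Z = W @ A_prev + b                       # Z^[2] = W^[2] A^[1] + b^[2]  (b broadcasts)
A = g(Z)                                 # A^[2] = g^[2](Z^[2]), shape (6, m)
```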
Activation function
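Common choices of activation function $g^{[l]}$, written out as a short sketch (sigmoid, tanh, and ReLU are standard; the selection here is illustrative, not exhaustive).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes to (0, 1); typical for binary outputs

def tanh(z):
    return np.tanh(z)                    # zero-centred, squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)            # max(0, z); a common default for hidden layers
```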
Loss Function
$J(W,b)=\frac{1}{m}\sum_{i=1}^mL(\hat{y}^{(i)},y^{(i)})$
$L(\hat{y},y)=-\left(y\log(\hat{y})+(1-y)\log(1-\hat{y})\right)$
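A sketch of the cost $J(W,b)$ above for binary targets, assuming the predictions `y_hat` are probabilities in $(0,1)$; the small `eps` clamp only guards the logs numerically and is not part of the formula.

```python
import numpy as np

def binary_cross_entropy_cost(y_hat, y, eps=1e-12):
    """J(W, b) = (1/m) * sum_i L(y_hat_i, y_i) with the cross-entropy loss."""
    y_hat = np.clip(y_hat, eps, 1 - eps)             # avoid log(0)
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.mean()

y_hat = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0, 1.0])
J = binary_cross_entropy_cost(y_hat, y)
```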
Preventing Overfitting
L1 or L2 Regularization
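One way to add the penalties to the cost; a sketch assuming a list of weight matrices `Ws`, a regularisation strength `lam`, and batch size `m` (all hypothetical names). The penalty is added to the data cost $J$ before taking gradients.

```python
import numpy as np

def l2_penalty(Ws, lam, m):
    """(lam / (2m)) * sum_l ||W^[l]||_F^2 -- shrinks weights towards zero."""
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in Ws)

def l1_penalty(Ws, lam, m):
    """(lam / m) * sum_l sum_ij |W^[l]_ij| -- encourages sparse weights."""
    return (lam / m) * sum(np.sum(np.abs(W)) for W in Ws)
```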
How do neural networks learn?
Critical Points
Backpropagation
Parameter updates
$W^{[l]}=W^{[l]}-\alpha dW^{[l]}$
$b^{[l]}=b^{[l]}-\alpha db^{[l]}$
In Matrix Form
$dW^{[l]}=\frac{\partial L}{\partial W^{[l]}}=\frac{1}{m}dZ^{[l]}A^{[l-1]T}$
$db^{[l]}=\frac{\partial L}{\partial b^{[l]}}=\frac{1}{m}\sum_{i=1}^m dZ^{[l](i)}$
$dA^{[l-1]}=\frac{\partial L}{\partial A^{[l-1]}}=W^{[l]T}dZ^{[l]}$
$dZ^{[l]}=dA^{[l]} * g^{[l]\prime}(Z^{[l]})$ (element-wise product)
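Putting the four matrix-form formulas together: a sketch of the backward step for one layer followed by the gradient-descent update, assuming a sigmoid activation so that $g'(Z) = g(Z)(1-g(Z))$; the shapes, the random `dA` coming from the layer above, and the learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

m = 8
A_prev = rng.normal(size=(3, m))          # A^[l-1]
W = rng.normal(size=(6, 3))               # W^[l]
b = np.zeros((6, 1))                      # b^[l]

# Forward pass for layer l
Z = W @ A_prev + b
A = sigmoid(Z)

# Backward pass, given dA = dL/dA^[l] from the layer above (random here for illustration)
dA = rng.normal(size=A.shape)
dZ = dA * sigmoid(Z) * (1 - sigmoid(Z))   # dZ^[l] = dA^[l] * g^[l]'(Z^[l])  (element-wise)
dW = (1 / m) * dZ @ A_prev.T              # dW^[l] = (1/m) dZ^[l] A^[l-1]^T
db = (1 / m) * dZ.sum(axis=1, keepdims=True)
dA_prev = W.T @ dZ                        # gradient passed back to layer l-1

# Gradient-descent update
alpha = 0.01
W -= alpha * dW
b -= alpha * db
```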