# Likelihood

Assume $\epsilon \sim \mathcal{N}(0, \sigma^2)$, that is, $p(\epsilon) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\big( -\frac{\epsilon^2}{2\sigma^2} \big)$. This implies that $p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\big( -\frac{(y - \theta^T x)^2}{2\sigma^2} \big)$, which is the distribution of $y$ given $x$ and parameterized by $\theta$ (not conditioned on $\theta$, since $\theta$ is not a random variable). Equivalently, $y \mid x; \theta \sim \mathcal{N}(\theta^T x, \sigma^2)$.

Suppose we have $m$ data points. Rather than maximizing the product of densities directly, we maximize the log likelihood $\ell(\theta)$:

$$\ell(\theta) = \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \big( y^{(i)} - \theta^T x^{(i)} \big)^2.$$

Maximizing $\ell(\theta)$ is therefore equivalent to minimizing the sum of squared errors, i.e., ordinary least squares.
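A minimal numpy sketch of this equivalence, on hypothetical synthetic data (the data, `theta_true`, and `sigma` are assumptions for illustration): the least-squares solution should achieve at least as high a log likelihood as the true parameters.

```python
import numpy as np

# Hypothetical data: y = theta^T x + Gaussian noise (illustrative values)
rng = np.random.default_rng(0)
m, d = 100, 3
X = rng.normal(size=(m, d))
theta_true = np.array([1.5, -2.0, 0.5])
sigma = 0.3
y = X @ theta_true + rng.normal(scale=sigma, size=m)

def log_likelihood(theta, X, y, sigma):
    """Gaussian log likelihood: sum over i of log N(y_i | theta^T x_i, sigma^2)."""
    resid = y - X @ theta
    m = len(y)
    return -m * np.log(np.sqrt(2 * np.pi) * sigma) - resid @ resid / (2 * sigma**2)

# Maximizing the log likelihood over theta is ordinary least squares,
# since sigma only scales the squared-error term.
theta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Since `sigma` enters only as a constant scale on the squared residuals, the maximizer of `log_likelihood` does not depend on it.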

# Regularization

## 1. Suppose $\theta \sim \mathcal{N}(0, \lambda)$

Suppose $\theta \in \mathbb{R}^p$ with an isotropic Gaussian prior, $p(\theta) = \frac{1}{(2\pi\lambda)^{p/2}} \exp\big( -\frac{\|\theta\|_2^2}{2\lambda} \big)$.

The MAP estimate maximizes $p(\theta) \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)$; taking logs, the prior contributes the term $-\frac{\|\theta\|_2^2}{2\lambda}$. Therefore, a Gaussian prior introduces the L2 penalty (ridge regression) in Bayesian regression.
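A short sketch of the resulting ridge estimator, which has a closed form under the Gaussian prior; the data and the values of `sigma2` and `lam` are assumptions for illustration.

```python
import numpy as np

# Hypothetical data (illustrative values)
rng = np.random.default_rng(1)
m, p = 100, 3
X = rng.normal(size=(m, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=m)

sigma2, lam = 0.3**2, 1.0  # noise variance and prior variance (assumed)

# MAP under a N(0, lambda*I) prior has the ridge closed form:
#   theta = (X^T X + (sigma^2 / lambda) I)^{-1} X^T y
alpha = sigma2 / lam
theta_map = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

The prior shrinks the estimate toward zero: `theta_map` always has L2 norm no larger than the unregularized least-squares solution.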

## 2. Suppose $\theta \sim Lap(0, \lambda)$

Similarly for the Laplace distribution, where $p(\theta) = \frac{1}{(2\lambda)^p} \exp\big( -\frac{\|\theta\|_1}{\lambda} \big)$.

Taking logs, the prior contributes the term $-\frac{\|\theta\|_1}{\lambda}$. Therefore, a Laplace prior introduces the L1 penalty (lasso) in Bayesian regression.
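Unlike ridge, the lasso objective has no closed form, but it can be solved by proximal gradient descent (ISTA) with soft thresholding. A minimal sketch on hypothetical sparse data (the data and `alpha` are assumptions for illustration):

```python
import numpy as np

# Hypothetical data with sparse true coefficients (illustrative values)
rng = np.random.default_rng(2)
m, p = 100, 4
X = rng.normal(size=(m, p))
theta_true = np.array([2.0, 0.0, 0.0, -1.0])
y = X @ theta_true + rng.normal(scale=0.1, size=m)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, alpha, n_iter=2000):
    """ISTA for 0.5 * ||y - X theta||^2 + alpha * ||theta||_1."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y)
        theta = soft_threshold(theta - grad / L, alpha / L)
    return theta

theta_lasso = lasso_ista(X, y, alpha=5.0)
```

The soft-thresholding step is what makes the L1 penalty drive small coefficients to exactly zero, which is why the lasso produces sparse solutions while ridge only shrinks them.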