# PAC-Bayes

The Kullback–Leibler (KL) divergence is a natural measure of the dissimilarity between two probability distributions.
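As an illustration, here is a minimal sketch of the KL divergence for discrete distributions given as probability vectors (the function name `kl_divergence` is ours, not from the source):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i = 0 contribute 0 by the convention 0 * log 0 = 0.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # 0.0: a distribution has zero divergence from itself
print(kl_divergence(p, q))  # positive; note KL is not symmetric in p and q
```

Note that KL divergence is not a metric: it is asymmetric and does not satisfy the triangle inequality, which is why "measure of dissimilarity" rather than "distance" is the safer phrasing.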

### Maximum Likelihood

Likelihood: given the independence assumption, the probability of observing the sample $x_1, \ldots, x_m$ under a distribution $p \in \mathcal{P}$ is $\Pr[x_1, \ldots, x_m] = \prod_{i=1}^{m} p(x_i)$

Principle: select the distribution maximizing the sample probability $p^* = \underset{p\in \mathcal{P}}{\arg\max} \prod_{i=1}^{m} p(x_i) \Longleftrightarrow p^* = \underset{p\in \mathcal{P}}{\arg\max} \sum_{i=1}^{m} \log p(x_i)$
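The equivalence above (maximizing the product equals maximizing the sum of logs) can be checked numerically. A minimal sketch, assuming a Bernoulli family $\mathcal{P} = \{\mathrm{Bernoulli}(\theta)\}$ and a grid search over $\theta$ (the sample and grid are illustrative choices, not from the source):

```python
import math

def log_likelihood(theta, sample):
    # sum_{i=1}^{m} log p(x_i) for a Bernoulli(theta) model
    return sum(math.log(theta if x == 1 else 1 - theta) for x in sample)

sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 7 ones out of 10
grid = [i / 100 for i in range(1, 100)]  # theta in {0.01, ..., 0.99}
theta_star = max(grid, key=lambda t: log_likelihood(t, sample))
print(theta_star)  # 0.7, the sample mean
```

The grid maximizer agrees with the closed-form Bernoulli MLE, which is the empirical fraction of ones: the log-likelihood $7\log\theta + 3\log(1-\theta)$ is maximized at $\theta = 0.7$.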

# Inequalities

## Markov Inequality

Let $Z$ be a nonnegative random variable. By the tail-integral identity for expectations,

$\mathbb{E}[Z] = \int_{x=0}^{\infty} \mathbb{P}[Z \ge x] \, dx$

Since $\mathbb{P}[Z \ge x]$ is monotonically nonincreasing, we have

$\forall a \ge 0, \mathbb{E}[Z] \ge \int_{x=0}^a \mathbb{P}[Z \ge x] dx \ge \int_{x=0}^a \mathbb{P}[Z \ge a] dx = a \mathbb{P}[Z \ge a]$

Rearranging the inequality yields Markov's inequality:

$\forall a \ge 0, \mathbb{P}[Z \ge a] \le \frac{\mathbb{E}[Z]}{a}$
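A quick empirical sanity check of the bound, using an Exponential(1) random variable (so $\mathbb{E}[Z] = 1$); the choice of distribution and the sample size are illustrative assumptions:

```python
import random

random.seed(0)
n = 100_000
# Z ~ Exponential(rate 1), a nonnegative random variable with E[Z] = 1
zs = [random.expovariate(1.0) for _ in range(n)]
mean = sum(zs) / n

for a in [0.5, 1.0, 2.0, 4.0]:
    tail = sum(z >= a for z in zs) / n  # empirical P[Z >= a]
    print(f"a={a}: P[Z >= a] ~ {tail:.4f} <= E[Z]/a ~ {mean / a:.4f}")
```

The gap between the two columns shows that Markov's inequality is loose here: the exponential tail decays like $e^{-a}$, far faster than the $1/a$ bound.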