Margin

Functional Margin

$\hat \gamma_i = y_i (w^T x_i + b)$

Geometric Margin

$\gamma_i = y_i (\frac{w^T}{||w||} x_i + \frac{b}{||w||})$

When $|| w ||=1$, the two margins are the same.

Linearly Separable

Goal: find a decision margin that maximizes the geometric margin.

\begin{aligned} \max & & \gamma \\ s.t. & & y_i (w^T x_i + b) \ge \gamma, & & i=1,...,m\\ & & \|w\| = 1 \end{aligned}

\begin{aligned} \max & & \frac{\hat \gamma}{\|w\|} \\ s.t. & & y_i (w^T x_i + b) \ge \hat \gamma, & & i=1,...,m \end{aligned}

\begin{aligned} \min & & \frac{\|w\|}{2} \\ s.t. & & y_i (w^T x_i + b) \ge 1, & & i=1,...,m \end{aligned}

Lagarange Duality

Primal Problem

\begin{aligned} \min_w & & f(w)\\ s.t. & & g_i(w) \le 0, & & i=1,...,k\\ & & h_i(w) = 0 & & i=1,...,l \end{aligned}

$\begin{eqnarray} \theta_P(w) = \begin{cases} f(w), & \text{if w s.t. primal constraints}\\ \inf, & \text{otherwise} \end{cases} \end{eqnarray}$

$\min_w \theta_P(w) = \min_w \max_{\alpha, \beta} \mathcal{L}(w, \alpha, \beta)$

Dual Problem

$\theta_D(\alpha, \beta) = min_w \mathcal{L}(w, \alpha, \beta)$

$max_{\alpha, \beta} \theta_D(\alpha, \beta) = \max_{\alpha, \beta} \min_w \mathcal{L}(w, \alpha, \beta)$

primal problem和dual problem的联系如下

$d^* = \max_{\alpha, \beta} \min_w \mathcal{L}(w, \alpha, \beta) \le \min_w \max_{\alpha, \beta} \mathcal{L}(w, \alpha, \beta) = p^*$

\begin{aligned} \frac{\partial}{\partial w_i} \mathcal{L} (w^*, \alpha^*, \beta^*) = 0, && i=1,...,n\\ \frac{\partial}{\partial \beta_i} \mathcal{L} (w^*, \alpha^*, \beta^*) = 0, && i=1,...,l\\ \alpha_i^* g_i(w^*) = 0, && i=1,...,k\\ g_i(w^*) = 0, && i=1,...,k\\ \alpha_i^* = 0, && i=1,...,k \end{aligned}

Conclusion: Apply Dual to SVM

Given

\begin{aligned} \min_w && \frac{1}{2} \|w\|^2 \\ s.t.&& 1 - y_i (w^T x_i + b) \le 0 && i=1,...,m \end{aligned}

Define Lagrangian to be $\mathcal{L}(w, b, \alpha) = \frac{1}{2} |w|^2 - \sum_{i=1}^{m} \alpha_i[y_i (w^T x_i + b) - 1]$.

\begin{aligned} \frac{\partial}{\partial w} \mathcal{L} (w,b,\alpha) & = w - \sum_{i=1}^m \alpha_i y_i x_i = 0\\ \frac{\partial}{\partial b} \mathcal{L} (w,b,\alpha) & = \sum_{i=1}^m \alpha_i y_i = 0 \end{aligned}

$\mathcal{L}(w,b,\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$

Kernel

feature map定义为 $\varphi$。 给定某个$\varphi$的时候，我们可以定义Kernel

$K(x,z) = \varphi(x)^T \varphi(z)$

$K_{ij} = K(x_i, x_j)$

Theorem (Mercer). Let $K: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be given. Then for $K$ to be a valid (Mercer) kernel, it is necessary and sufficient that for any ${ x_1, …, x_m }, m < \inf$, the corresponding kernel matrix is symmetric positive semi-definite.

Feature Space

A feature map is a map $\varphi : X \to F$, where F is a Hilbert space which we will call the feature space. Every feature map can naturally define a RKHS by means of the definition of a positive definite function.

Reproducing Kernel Hilbert Space

Metric Space

Def: A metric space is a pair $(M,d)$ where M is a set, and $d:M^2 \to \mathbb{R}$ which s.t.

• $d(x,y) > 0$
• $d(x,y) = d(y,x)$
• $d(a,b) \le d(a,c) + d(c,b)$

Some examples

• Bilinear Kernels: $K(x, y) = \langle x, y \rangle$,就是原始空间中的内积
• Polynomial Kernels: $K(x, y) = (\alpha \langle x, y \rangle + 1)^d$
• Radial basis function Kernels (RBF):
• Gaussian Kernel: $K(x,y) = \exp(-\frac{|| x-y ||}{2 \sigma^2})$
• Laplacian Kernel: $K(x,y) = \exp(-\frac{|| x-y ||}{\sigma^2})$