assignment1

Question 1: Geometric Understanding of SVM

Question:
Consider a binary classification dataset $\{(x_{i}, y_{i})\}_{i = 1}^{n}$ with $y_{i} \in \{+1, -1\}$, and a linear classifier:

$$f(x) = \operatorname{sign}(w^{T} x + b)$$

Define the geometric margin and explain its meaning.
Show that after proper scaling of the parameters, maximizing the geometric margin is equivalent to minimizing $\|w\|^{2}$.
Explain why only the support vectors affect the final classifier.

Answer:
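The geometric margin of a sample is its signed distance to the separating hyperplane $w^{T} x + b = 0$. For a sample $(x_{i}, y_{i})$ it is

$$\gamma_{i} = \frac{y_{i} (w^{T} x_{i} + b)}{\|w\|}.$$

It is positive exactly when the sample is classified correctly, and it measures how "safely" the sample is classified: the larger the margin, the farther the sample lies from the decision boundary and the more stable the classification. The geometric margin of the whole dataset is the smallest of these values, $\gamma = \min_{i} \gamma_{i}$.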

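Multiplying $(w, b)$ by the same positive constant changes neither the decision boundary nor any geometric margin, so for linearly separable data the parameters can be rescaled to satisfy

$$y_{i} (w^{T} x_{i} + b) \geq 1 \quad \text{for all } i,$$

with equality for the samples closest to the hyperplane. Under this normalization the smallest geometric margin is

$$\gamma = \frac{1}{\|w\|}.$$

Maximizing the geometric margin is therefore equivalent to minimizing $\|w\|$ subject to these constraints. Because $\|w\| \geq 0$ and squaring is monotonic on nonnegative values, this is the same as minimizing $\|w\|^{2}$, usually written as minimizing $\frac{1}{2} \|w\|^{2}$.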

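The final classifier is determined only by the support vectors, the samples closest to the hyperplane, for which the constraint holds with equality: $y_{i} (w^{T} x_{i} + b) = 1$. In the dual form the optimal weight vector is $w = \sum_{i} \alpha_{i} y_{i} x_{i}$, and complementary slackness forces $\alpha_{i} = 0$ for every sample with $y_{i} (w^{T} x_{i} + b) > 1$. Non-support vectors lie farther from the boundary, so their constraints are inactive: removing them, or moving them without entering the margin, leaves the optimal hyperplane unchanged.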
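
The scaling argument can be checked numerically. The following is a minimal sketch, assuming a hand-picked toy dataset and a hand-picked separating hyperplane (both hypothetical, not part of the assignment); the helper name `geometric_margins` is likewise an illustration.

```python
import numpy as np

def geometric_margins(w, b, X, y):
    """Signed distance of every sample to the hyperplane w^T x + b = 0."""
    return y * (X @ w + b) / np.linalg.norm(w)

# Hypothetical toy data: two points per class in 2-D.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A separating hyperplane chosen by hand: x1 + x2 - 2 = 0.
w = np.array([1.0, 1.0])
b = -2.0

# Dataset margin = smallest per-sample geometric margin.
gamma = geometric_margins(w, b, X, y).min()

# Scaling (w, b) by a positive constant leaves the geometric margin unchanged.
gamma_scaled = geometric_margins(5.0 * w, 5.0 * b, X, y).min()

# Rescale so that the smallest functional margin y_i (w^T x_i + b) equals 1.
s = (y * (X @ w + b)).min()
w_hat, b_hat = w / s, b / s

print(gamma, gamma_scaled, 1.0 / np.linalg.norm(w_hat))
```

All three printed values agree: the margin is invariant to positive rescaling, and after normalization it equals $1 / \|w\|$.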

Question 2: Generative vs Discriminative Models

Question:
Consider a binary classification problem with features $x \in \mathbb{R}^{d}$.

Write the Naive Bayes assumption and its classification rule.
Show that Gaussian NB with equal class variances leads to a linear decision boundary.
Write the Logistic Regression model form and its decision boundary.
Compare NB and Logistic Regression: What are the differences in learning objectives? Which is preferred when the training data is small? Large? Why?

Answer:
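Naive Bayes assumes that the features are conditionally independent given the class $y$:

$$P(x \mid y) = \prod_{j = 1}^{d} P(x_{j} \mid y).$$

The classification rule picks the class with the largest posterior probability:

$$\hat{y} = \arg\max_{y} P(y) \prod_{j = 1}^{d} P(x_{j} \mid y).$$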

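With Gaussian NB, each class-conditional $P(x_{j} \mid y = c)$ is a Gaussian with mean $\mu_{jc}$ and variance $\sigma_{jc}^{2}$. If every feature has the same variance in both classes ($\sigma_{jc}^{2} = \sigma_{j}^{2}$), the quadratic terms $x_{j}^{2} / (2 \sigma_{j}^{2})$ in the log posterior ratio cancel, leaving only terms linear in $x$ plus a constant. The decision boundary, where the log posterior ratio equals zero, can therefore be written as

$$w^{T} x + b = 0,$$

which is a linear boundary.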
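
The cancellation can be verified numerically. The sketch below assumes classes labelled 1 and 0 and randomly drawn, hypothetical parameters; carrying out the cancellation by hand gives $w_{j} = (\mu_{j1} - \mu_{j0}) / \sigma_{j}^{2}$ and $b = \log \frac{P(y = 1)}{P(y = 0)} + \sum_{j} \frac{\mu_{j0}^{2} - \mu_{j1}^{2}}{2 \sigma_{j}^{2}}$, which the code compares against the log posterior ratio computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Hypothetical parameters: per-class means and one variance per feature,
# shared by both classes.
mu1 = rng.normal(size=d)
mu0 = rng.normal(size=d)
var = rng.uniform(0.5, 2.0, size=d)
prior1, prior0 = 0.3, 0.7

def log_joint(x, mu, prior):
    """log P(y) + sum_j log N(x_j; mu_j, var_j) under the NB assumption."""
    log_lik = -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)
    return np.log(prior) + log_lik.sum()

# Closed-form linear coefficients left after the quadratic terms cancel.
w = (mu1 - mu0) / var
b = np.log(prior1 / prior0) + np.sum((mu0 ** 2 - mu1 ** 2) / (2 * var))

# Compare the direct log posterior ratio with w^T x + b at a random point.
x = rng.normal(size=d)
log_ratio = log_joint(x, mu1, prior1) - log_joint(x, mu0, prior0)
print(np.isclose(log_ratio, w @ x + b))
```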

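Logistic Regression models the conditional probability directly:

$$P(y = 1 \mid x) = \sigma(w^{T} x + b),$$

where $\sigma(z) = \frac{1}{1 + e^{-z}}$. With the usual threshold $P(y = 1 \mid x) = 0.5$, the decision boundary is

$$w^{T} x + b = 0.$$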
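
As a small illustration, thresholding the predicted probability at 0.5 gives the same labels as checking the sign of the linear score. The weights and inputs below are hypothetical values chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and inputs.
w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, 0.5]])

z = X @ w + b          # linear scores w^T x + b
p = sigmoid(z)         # P(y = 1 | x)

pred_from_prob = p >= 0.5
pred_from_score = z >= 0.0
print(np.array_equal(pred_from_prob, pred_from_score))
```

The third input has score exactly 0, so it lies on the decision boundary and its predicted probability is 0.5.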

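NB is a generative model: it learns $P(x \mid y)$ and $P(y)$, that is, it fits the joint distribution and classifies through Bayes' rule. Logistic Regression is a discriminative model: it learns $P(y \mid x)$ directly by maximizing the conditional likelihood. With little training data NB is often the better choice, because the independence assumption reduces learning to estimating a few parameters per feature and class, which makes estimation easier. With plenty of data Logistic Regression is usually better, because it does not rely on the strong independence assumption and can fit the decision boundary more flexibly.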

Question 3: Ensemble Methods

Question:

Explain the core idea of Bagging. Does it primarily reduce bias or variance? Why?
Describe the basic workflow of Boosting, such as AdaBoost.
Explain why Boosting reduces bias, and why Bagging is suitable for high-variance models like decision trees.

Answer:
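The core idea of Bagging (bootstrap aggregating) is to draw multiple samples from the original training set with replacement, train one base model on each, and combine their predictions by voting or averaging. It mainly reduces variance: the prediction errors of the individual models partially cancel when combined, which makes the overall result more stable.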
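
The variance reduction can be observed in a small simulation. This is a minimal sketch under stated assumptions: a 1-nearest-neighbour regressor stands in for a high-variance base model, the data come from a hypothetical noisy sine curve, and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_predict(X_train, y_train, x0):
    """Predict at x0 with the target of the closest training point."""
    return y_train[np.argmin(np.abs(X_train - x0))]

def bagged_predict(X_train, y_train, x0, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        preds.append(one_nn_predict(X_train[idx], y_train[idx], x0))
    return np.mean(preds)

x0 = 1.0  # fixed query point
single, bagged = [], []
for _ in range(200):  # 200 independent training sets
    X_train = rng.uniform(0.0, 2.0 * np.pi, size=50)
    y_train = np.sin(X_train) + rng.normal(scale=1.0, size=50)
    single.append(one_nn_predict(X_train, y_train, x0))
    bagged.append(bagged_predict(X_train, y_train, x0, n_models=50, rng=rng))

# Variance of the prediction at x0 across training sets.
var_single = np.var(single)
var_bagged = np.var(bagged)
print(f"single model: {var_single:.3f}, bagged: {var_bagged:.3f}")
```

Across training sets, the bagged prediction at the query point fluctuates noticeably less than the prediction of a single model.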

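The basic workflow of Boosting is to train several weak classifiers sequentially. Each round increases the weights of the samples that the previous round misclassified, so that later models focus on the hard samples. Finally, the weak classifiers are combined in a weighted vote to form a strong classifier.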
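
The workflow can be sketched with decision stumps as weak classifiers. This is a minimal AdaBoost illustration, not a reference implementation: the function names, the exhaustive stump search, and the ten-point toy dataset are all assumptions made for the example.

```python
import numpy as np

def stump_predict(X, j, thr, polarity):
    """Predict +1 where polarity * (x_j - thr) >= 0, else -1."""
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, polarity) != y])
                if err < best[3]:
                    best = (j, thr, polarity, err)
    return best

def adaboost_fit(X, y, n_rounds):
    w = np.full(len(y), 1.0 / len(y))          # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        j, thr, polarity, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        pred = stump_predict(X, j, thr, polarity)
        w = w * np.exp(-alpha * y * pred)      # raise weights of mistakes
        w = w / w.sum()
        ensemble.append((alpha, j, thr, polarity))
    return ensemble

def adaboost_predict(X, ensemble):
    score = sum(a * stump_predict(X, j, thr, p) for a, j, thr, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy 1-D dataset that no single stump can separate.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

acc_one = np.mean(adaboost_predict(X, adaboost_fit(X, y, 1)) == y)
acc_three = np.mean(adaboost_predict(X, adaboost_fit(X, y, 3)) == y)
print(acc_one, acc_three)
```

On this toy set a single stump reaches 70% training accuracy, while the weighted combination of three stumps classifies all ten points correctly.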

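Boosting reduces bias because each round corrects the errors of the previous rounds, so the expressive power of the combined model keeps growing. Bagging suits high-variance models such as decision trees because a single tree is very sensitive to changes in the training data, and averaging many trees substantially reduces this instability.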

Coding Problem: Gaussian Naive Bayes

Question:
Implement a Gaussian Naive Bayes classifier from scratch using NumPy.

In fit, estimate the class prior log probability $\log P(y = c)$, the per-feature mean $\mu_{jc}$, and the unbiased variance $\sigma_{jc}^{2}$ for each class.
In predict_log_proba, compute the log unnormalized posterior:

$$\text{score}(c) = \log P(y = c) + \sum_{j = 1}^{d} \left[ -\frac{1}{2} \log(2 \pi \sigma_{jc}^{2}) - \frac{(x_{j} - \mu_{jc})^{2}}{2 \sigma_{jc}^{2}} \right].$$

Return the class with the highest score and compare the result with sklearn.naive_bayes.GaussianNB.

Answer:
Code implementation: accuracy: 88.75%. Comparison with scikit-learn:
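
A minimal sketch of the classifier described in the problem statement is given here. It is not the implementation behind the accuracy reported above; the class name `GaussianNaiveBayes`, the attribute names, and the six-point toy dataset are assumptions made for illustration.

```python
import numpy as np

class GaussianNaiveBayes:
    """Gaussian NB with per-class, per-feature mean and unbiased variance."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # log P(y = c), estimated from class frequencies
        self.log_prior_ = np.array([np.log(np.mean(y == c)) for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # ddof=1 gives the unbiased variance requested in the problem
        self.var_ = np.array([X[y == c].var(axis=0, ddof=1) for c in self.classes_])
        return self

    def predict_log_proba(self, X):
        """Unnormalized log posterior score(c), shape (n_samples, n_classes)."""
        X = np.asarray(X, dtype=float)[:, None, :]  # (n, 1, d) for broadcasting
        log_lik = (-0.5 * np.log(2 * np.pi * self.var_)
                   - (X - self.mean_) ** 2 / (2 * self.var_))
        return self.log_prior_ + log_lik.sum(axis=2)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_log_proba(X), axis=1)]

# Hypothetical toy data: two well-separated classes in 2-D.
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
                    [6.0, 7.0], [7.0, 6.0], [8.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNaiveBayes().fit(X_train, y_train)
X_test = np.array([[2.0, 2.0], [7.0, 7.0]])
scores = model.predict_log_proba(X_test)
pred = model.predict(X_test)
print(pred)
```

When comparing against `sklearn.naive_bayes.GaussianNB`, small differences in the scores are to be expected: as far as I am aware, scikit-learn estimates the variance with the maximum-likelihood (biased) formula and adds a small smoothing term controlled by `var_smoothing`, whereas this sketch uses the unbiased variance.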
