Hughes Phenomenon
Imagine we need to build an image classifier that distinguishes cats from dogs
Increase the features to 3 dimensions
Analysis
The training data gets sparser and sparser as the dimensionality grows
Overfitting vs. Best Fit
A simple classifier generalizes much better to unseen data because it did not learn specific exceptions that were only present in our training data by coincidence
By using fewer features, the curse of dimensionality was avoided, so the classifier did not overfit the training data
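As a rough illustration of this point (a stdlib sketch of my own, not from the source): a 1-nearest-neighbour classifier on purely random labels fits the training data perfectly, yet does no better than chance on unseen data, because every "pattern" it memorized was a coincidence.

```python
import random

def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction by squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

rng = random.Random(0)
d, n = 50, 30                      # many features, few samples
X = [[rng.random() for _ in range(d)] for _ in range(2 * n)]
y = [rng.randint(0, 1) for _ in range(2 * n)]  # labels are pure noise
train_X, test_X = X[:n], X[n:]
train_y, test_y = y[:n], y[n:]

# Perfect fit on training data (each point is its own nearest neighbour)...
train_acc = sum(nn_predict(train_X, train_y, x) == t
                for x, t in zip(train_X, train_y)) / n
# ...but roughly chance-level accuracy on unseen data.
test_acc = sum(nn_predict(train_X, train_y, x) == t
               for x, t in zip(test_X, test_y)) / n
print(train_acc, test_acc)
```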
More explanation
If we keep adding dimensions, the amount of training data needs to grow exponentially fast to maintain the same coverage and to avoid overfitting
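This exponential growth can be sketched with a simple count (the 10-intervals-per-axis discretization is my own illustrative assumption, not from the source): to keep at least one sample per cell of the discretized feature space, the number of required samples is intervals^d.

```python
# To keep the same coverage, suppose each feature axis is split into
# 10 intervals and we want at least one sample per cell.
intervals_per_axis = 10

for d in (1, 2, 3, 8):
    cells = intervals_per_axis ** d  # number of cells grows exponentially in d
    print(f"{d} dimension(s): {cells} cells -> need >= {cells} samples")
```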
Digging deeper
Training samples that fall outside the unit circle are in the corners of the feature space and are more difficult to classify than samples near the center of the feature space.
Consider how the volume of the circle (hypersphere) changes relative to the volume of the square (hypercube) as we increase the dimensionality of the feature space.
$$V(d)=\frac{\pi^{d/2}}{\Gamma(\frac{d}{2}+1)}0.5^d$$
With radius $r = 0.5$, i.e. the largest hypersphere inscribed in the unit hypercube
The volume of the hypersphere tends to zero as the dimensionality tends to infinity
For an 8-dimensional hypercube, about 98% of the volume (and hence of uniformly distributed data) is concentrated in its 256 corners, outside the inscribed hypersphere.
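The volume formula above can be evaluated directly; a short sketch using Python's `math.gamma` (the function name `hypersphere_volume` is mine):

```python
import math

def hypersphere_volume(d, r=0.5):
    """Volume of a d-dimensional hypersphere of radius r:
    V(d) = pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

# The unit hypercube has volume 1, so V(d) with r = 0.5 is also the
# fraction of the cube occupied by the inscribed hypersphere.
for d in (2, 4, 8, 16):
    print(d, hypersphere_volume(d))

# Fraction of the 8-D cube lying outside the inscribed sphere,
# i.e. in the corners: roughly 0.984, matching the ~98% figure.
corner_fraction_8d = 1 - hypersphere_volume(8)
print(corner_fraction_8d)
```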
The ratio of the difference between the maximum and minimum Euclidean distance from a sample point to the centroid, to the minimum distance itself, tends to zero as the dimensionality grows: $$\lim_{d\to\infty}\frac{dist_{max}-dist_{min}}{dist_{min}}\to 0$$
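This distance concentration can be checked with a small Monte Carlo sketch (my own stdlib illustration; sample counts and seed are arbitrary): draw uniform points in $[0,1]^d$, measure distances to the centroid $(0.5,\dots,0.5)$, and compare the spread ratio in low vs. high dimensions.

```python
import random

def distance_spread(d, n=2000, seed=0):
    """(dist_max - dist_min) / dist_min for n uniform samples in [0,1]^d,
    with distances measured to the centroid (0.5, ..., 0.5)."""
    rng = random.Random(seed)
    dists = [sum((rng.random() - 0.5) ** 2 for _ in range(d)) ** 0.5
             for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)

# In 2-D the nearest and farthest samples differ enormously; in 1000-D
# all distances concentrate around the same value and the ratio collapses.
low, high = distance_spread(2), distance_spread(1000)
print(low, high)
```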