Created by 马明
K-Means
Two failure cases of K-Means
Gaussian Mixture Models (GMMs) give us more flexibility than K-Means
With GMMs we assume that the data points are Gaussian distributed
Two parameters to describe the shape of the clusters: the mean and the standard deviation
Expectation–Maximization (EM) is used to find the parameters of each cluster
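As a minimal sketch (assuming scikit-learn and synthetic two-blob data, not part of the original slides), fitting a GMM by EM looks like:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data: two Gaussian blobs, for illustration only
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=1.0, size=(200, 2)),
    rng.normal(loc=(5, 5), scale=2.0, size=(200, 2)),
])

# EM fits one mean vector and one covariance matrix per component
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)   # hard assignments
probs = gmm.predict_proba(X)  # soft per-cluster responsibilities
print(gmm.means_, gmm.covariances_)
```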
How to evaluate a clustering algorithm?
Core idea: intra-cluster distances should be as small as possible; inter-cluster distances should be as large as possible
Most evaluation methods mainly measure three aspects:
$Index = \frac{\alpha \times \text{Separation}}{\beta \times \text{Compactness}}$
Silhouette coefficient
Silhouette analysis measures how well an observation is clustered and estimates the average distance between clusters
For each observation $i$ , the silhouette width $s_i$ is calculated as follows:
For each observation $i$ , calculate the average dissimilarity $a_i$ between $i$ and all other points of the cluster to which $i$ belongs
For all other clusters $C$, to which $i$ does not belong, calculate the average dissimilarity $d(i,C)$ of $i$ to all observations of $C$. The smallest of these $d(i,C)$ is defined as $b_i= \min_C d(i,C)$. The value of $b_i$ can be seen as the dissimilarity between $i$ and its “neighbor” cluster
Finally, the silhouette width of observation $i$ is defined by the formula: $s_i = (b_i - a_i)/\max(a_i, b_i)$
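A hedged sketch of silhouette analysis with scikit-learn (the blob data set and k = 4 are assumptions for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, silhouette_samples

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette width over all observations; range [-1, 1], higher is better
print("mean s_i:", silhouette_score(X, labels))

# Per-observation widths s_i = (b_i - a_i) / max(a_i, b_i)
s = silhouette_samples(X, labels)
print("worst-clustered point:", s.argmin(), s.min())
```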
Dunn index
For each cluster, compute the distance between each of the objects in the cluster and the objects in the other clusters
Use the minimum of this pairwise distance as the inter-cluster separation (min.separation)
For each cluster, compute the distance between the objects in the same cluster.
Use the maximal intra-cluster distance (i.e., the maximum diameter) as the intra-cluster compactness
Calculate the Dunn index (D) as follows: $D = \frac{min.separation}{max.diameter}$
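scikit-learn has no built-in Dunn index, so here is a small NumPy/SciPy sketch that follows the definition above directly:

```python
import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    """Dunn index D = min inter-cluster separation / max intra-cluster diameter."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # max diameter: largest pairwise distance within any single cluster
    max_diameter = max(cdist(c, c).max() for c in clusters)
    # min separation: smallest distance between points of two different clusters
    min_separation = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter
```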
Elbow method (see the sketch after this list)
Average silhouette method
Gap statistic method
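Of these three ways to choose the number of clusters, the elbow method is the easiest to sketch (assuming K-Means inertia as the compactness measure):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# inertia_ is the within-cluster sum of squared distances; it always
# decreases with k, so look for the "elbow" where the drop levels off
ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.show()
```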
Top-down and Bottom-up
How do we define "distance"?
Linkage
Complete
Group Average
Ward's Method
$$d(A,B)=SSE_{A\cup B}-(SSE_A+SSE_B)$$
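A short SciPy sketch comparing the linkage rules above (the toy data set is an assumption):

```python
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
d = pdist(X)  # condensed pairwise (Euclidean) distance matrix

# Each linkage rule defines the inter-cluster "distance" differently
for method in ["complete", "average", "ward"]:
    Z = linkage(d, method=method)
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(method, labels[:10])
```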
Advanced Hierarchical clustering
Balanced Iterative Reducing and Clustering using Hierarchies
Clustering Feature: $CF=(N,\overrightarrow{LS},SS)$
$N$ : Number of data points
$\overrightarrow{LS}$ : Linear sum of the points, $\sum_{i=1}^N \overrightarrow{X_i}$
$SS$ : Sum of the squared points, $\sum_{i=1}^N \overrightarrow{X_i}^2$
The nonleaf nodes store sums of the CFs of their children
Branching factor: maximum number of children
Threshold: max diameter of sub-clusters stored at the leaf nodes
CF Tree
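scikit-learn's Birch exposes the same two knobs, named threshold and branching_factor (note: sklearn's threshold is a sub-cluster radius rather than a diameter); a minimal sketch:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)

# threshold: max radius of a leaf sub-cluster; branching_factor: max children per node
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=5)
labels = birch.fit_predict(X)
print(len(birch.subcluster_centers_), "leaf sub-clusters")
```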
CHAMELEON algorithm
$$RI(C_i,C_j)=\frac{2\times |EC(C_i,C_j)|}{|EC(C_i)|+|EC(C_j)|}$$
$$RC(C_i,C_j)=\frac{(|C_i|+|C_j|)EC(C_i,C_j)}{|C_j|EC(C_i)+|C_i|EC(C_j)}$$
$|C_i|$ denotes the number of data points in cluster $i$, $EC(C_i)$ denotes the sum of the edge weights inside $C_i$, and $EC(C_i,C_j)$ denotes the sum of the weights of the edges connecting the two clusters.
Merge the pair of clusters that maximizes $RI(C_i,C_j) \times RC(C_i,C_j)^\alpha$
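A tiny pure-Python sketch of this merge score, following the two formulas above; all input values are hypothetical:

```python
def chameleon_score(nA, nB, ecA, ecB, ecAB, alpha=2.0):
    """Merge score RI * RC^alpha for clusters A and B.
    nA, nB: cluster sizes; ecA, ecB: internal edge-cut weights;
    ecAB: weight of the edges connecting A and B (all hypothetical inputs)."""
    ri = 2.0 * ecAB / (ecA + ecB)                  # relative interconnectivity
    rc = (nA + nB) * ecAB / (nB * ecA + nA * ecB)  # relative closeness
    return ri * rc ** alpha

# Merge the candidate pair with the highest score
print(chameleon_score(nA=40, nB=60, ecA=12.0, ecB=20.0, ecAB=8.0))
```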
Mean-Shift Clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
In both cases (the point joins a cluster or is labeled as noise) that point is marked as “visited”
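A minimal DBSCAN sketch with scikit-learn (the eps and min_samples values are assumptions for the toy two-moons data):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: density threshold for a core point
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
print("clusters:", len(set(db.labels_)) - (1 if -1 in db.labels_ else 0))
print("noise points (label -1):", (db.labels_ == -1).sum())
```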
Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster
Fuzzy c-means (FCM) clustering
Compute the centroid for each cluster
For each data point, compute its coefficients of being in the clusters
Centroid
Any point $x$ has a set of coefficients $w_k(x)$ giving its degree of membership in the $k$th cluster; the centroid is defined as: $c_k=\frac{\sum_xw_k(x)^mx}{\sum_xw_k(x)^m}$
$m$ is the hyper-parameter that controls how fuzzy the clusters will be
Given a finite set of data, the algorithm returns a list of $c$ cluster centres $C=\{c_1,\cdots,c_c\}$ and a partition matrix $W=(w_{ij})$, $w_{ij} \in [0,1]$, $i=1,\cdots,n$, $j=1,\cdots,c$
$w_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$
The FCM aims to minimize the objective function: $\arg\min_C \sum_{i=1}^n \sum_{j=1}^c w_{ij}^m\,\|x_i-c_j\|^2$
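A compact NumPy sketch of FCM that simply alternates the two updates above (random initialization and the stopping tolerance are assumptions):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal FCM sketch; X is (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # random initial partition matrix W, each row sums to 1
    W = rng.random((n, c))
    W /= W.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Wm = W ** m
        # centroid update: c_k = sum_i w_ik^m x_i / sum_i w_ik^m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # membership update: w_ij proportional to ||x_i - c_j||^(-2/(m-1))
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a centroid
        inv = d ** (-2.0 / (m - 1))
        W_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(W_new - W).max() < tol:
            W = W_new
            break
        W = W_new
    return centers, W
```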
Partition Clustering Algorithm
K-medoids
CLARA
CLARA (Clustering Large Applications; Kaufman and Rousseeuw, 1990) extends k-medoids (PAM) to data sets with a large number of objects (more than several thousand observations), reducing computing time and RAM usage
This is achieved using a sampling approach: PAM is run on random samples of the data, and the best resulting set of medoids is kept
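A hedged NumPy sketch of the CLARA idea: k-medoids on random samples, keeping the medoid set that is cheapest on the full data (the inner k-medoids is a simple Voronoi-iteration variant, not full PAM; sample sizes are assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X, k, n_iter=50, seed=0):
    """Simple k-medoids (Voronoi-iteration variant, not full PAM)."""
    rng = np.random.default_rng(seed)
    medoids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = cdist(X, medoids).argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                # new medoid: the point minimizing total distance in its cluster
                medoids[j] = pts[cdist(pts, pts).sum(axis=1).argmin()]
    return medoids

def clara(X, k, n_samples=5, sample_size=200, seed=0):
    """CLARA idea: k-medoids on random samples; keep the medoid set
    with the lowest total dissimilarity on the FULL data set."""
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for s in range(n_samples):
        idx = rng.choice(len(X), min(sample_size, len(X)), replace=False)
        medoids = k_medoids(X[idx], k, seed=seed + s)
        cost = cdist(X, medoids).min(axis=1).sum()
        if cost < best_cost:
            best, best_cost = medoids, cost
    return best, best_cost
```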
CLARANS (Clustering Large Applications based upon RANdomized Search)
CLIQUE (CLustering In QUEst)
CLIQUE can be considered as both density-based and grid-based
Graph-based Clustering
3 Major steps for Spectral Clustering
We generally describe a graph by a set of vertices $V$ and a set of edges $E$, written $G(V,E)$
Here $V$ consists of all the data points $(v_1, v_2,\ldots,v_n)$. Define the weight $w_{ij}$ as the weight between points $v_i$ and $v_j$; the $w_{ij}$ form the adjacency matrix $W_{n\times n}$ of the graph
The degree of each vertex is defined as $d_i = \sum\limits_{j=1}^{n}w_{ij}$
The degree matrix is the diagonal matrix $\mathbf{D} = \left( \begin{array}{cccc} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{array} \right)$
Step 1 — Compute a similarity graph
$$w_{ij}=s_{ij}=\exp\left(-\frac{||x_i-x_j||_2^2}{2\sigma^2}\right)$$
Step 2 — Project the data onto a low-dimensional space
Use the eigenvectors corresponding to the $k$ smallest eigenvalues of the graph Laplacian: $0=\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$
Laplacian Matrix: $L = D - W$
Step 3 — Create clusters
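Putting the three steps together, a minimal NumPy/scikit-learn sketch of unnormalized spectral clustering ($\sigma$, $k$, and the two-moons data are assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k, sigma = 2, 0.2  # assumed hyper-parameters

# Step 1: fully connected similarity graph with the Gaussian kernel above
W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))

# Step 2: embed the points using the eigenvectors of L = D - W
L = D - W
eigvals, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :k]  # eigenvectors for the k smallest eigenvalues

# Step 3: cluster the rows of the embedding with K-Means
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
print(np.bincount(labels))
```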
Advantages
Disadvantages
Graph Partitioning
Graph cut
An intuitive goal is to find the partition that minimizes the cut
$cut(A,B)=\sum_{i\in A,j\in B}W_{ij}$ ($=0.3$ for the partition in the example graph)
Normalized Cut
Consider the connectivity between groups relative to the volume of each group
For a fixed cut value, it is minimized when $Vol(A)$ and $Vol(B)$ are equal, thus encouraging balanced cuts
$\min_x Ncut(x)=\min_y y^T(D-W)y$ subject to $y^TDy=1$
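The relaxed problem above is a generalized eigenproblem $(D-W)y = \lambda Dy$; a small SciPy sketch on a hypothetical two-community graph:

```python
import numpy as np
from scipy.linalg import eigh

# Tiny weighted graph (hypothetical adjacency): two loosely linked triangles
W = np.array([
    [0,   1, 1, 0.1, 0, 0],
    [1,   0, 1, 0,   0, 0],
    [1,   1, 0, 0,   0, 0],
    [0.1, 0, 0, 0,   1, 1],
    [0,   0, 0, 1,   0, 1],
    [0,   0, 0, 1,   1, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))

# Solve (D - W) y = lambda * D y; the second-smallest eigenvector
# (the Fiedler vector) gives the relaxed normalized-cut bipartition
eigvals, eigvecs = eigh(D - W, D)
partition = eigvecs[:, 1] > 0  # threshold at 0 to recover a two-way cut
print(partition)  # expect the two triangles to separate
```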