Created by 马明
Basic assumption
What can LSA do?
The key is to find the latent variables!
Recall TF-IDF
We can build a word-by-document matrix A in which every cell is a tf-idf score
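A minimal sketch of how such a matrix could be built, using only the standard library. The function name `tfidf_matrix`, the whitespace tokenizer, and the unsmoothed `tf * log(N / df)` weighting are assumptions for illustration; real toolkits usually add smoothing and normalization.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, A) where A[i][j] is the tf-idf score of word i in document j.

    Assumes every document is non-empty. Uses tf = count / doc length and
    idf = log(N / df), with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    counts = [Counter(toks) for toks in tokenized]
    A = []
    for w in vocab:
        idf = math.log(n_docs / df[w])
        A.append([(counts[j][w] / len(tokenized[j])) * idf for j in range(n_docs)])
    return vocab, A

docs = ["the cat sat", "the dog sat", "the dog barked loudly"]
vocab, A = tfidf_matrix(docs)
```

A word such as "the" that occurs in every document gets an idf of zero, so its whole row is zero, while a word that occurs in a single document gets the largest weight.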
Matrix A is sparse and noisy, so we apply dimensionality reduction to it
$$A=U\Sigma V^T$$
Truncated SVD
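A small sketch of truncated SVD with NumPy, assuming a word-by-document matrix with terms as rows and documents as columns. The toy values, the helper names `truncated_svd` and `cosine`, and the choice to scale the vectors by the singular values are illustrative assumptions, not the only convention.

```python
import numpy as np

def truncated_svd(A, k):
    """Keep only the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy matrix (made-up scores): rows are terms, columns are documents.
A = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
])

U_k, s_k, Vt_k = truncated_svd(A, k=2)
term_vectors = U_k * s_k      # one k-dimensional row per term
doc_vectors = Vt_k.T * s_k    # one k-dimensional row per document

same_topic = cosine(doc_vectors[0], doc_vectors[1])
other_topic = cosine(doc_vectors[0], doc_vectors[2])
```

In this toy matrix the first two documents share their terms, so `same_topic` comes out higher than `other_topic`.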
With the document vectors and term vectors, we can apply measures such as cosine similarity to evaluate:
Determining the optimal number of topics
Topic coherence: a widely used metric for evaluating topic models
Compute the average or median of the pairwise word-similarity scores of the words in a topic
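The averaging step can be sketched as follows. The function name `topic_coherence`, the `similarity` callback, and the hard-coded scores are hypothetical; in practice the similarity would come from co-occurrence statistics or word embeddings.

```python
from itertools import combinations
from statistics import mean, median

def topic_coherence(words, similarity, aggregate=mean):
    """Aggregate the similarity of every unordered pair of words in a topic."""
    scores = [similarity(a, b) for a, b in combinations(words, 2)]
    return aggregate(scores)

# Hypothetical pairwise similarity scores, for illustration only.
toy_scores = {
    frozenset(("cat", "dog")): 0.8,
    frozenset(("cat", "pet")): 0.7,
    frozenset(("dog", "pet")): 0.9,
}

def toy_similarity(a, b):
    return toy_scores[frozenset((a, b))]

topic = ["cat", "dog", "pet"]
mean_coherence = topic_coherence(topic, toy_similarity)
median_coherence = topic_coherence(topic, toy_similarity, aggregate=median)
```

Running this for several candidate numbers of topics and comparing the resulting scores is one way to choose among them.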
LSA is quick and efficient to use, but it also has drawbacks:
pLSA, or Probabilistic Latent Semantic Analysis, uses a probabilistic method instead of SVD
The core idea is to find a probabilistic model with latent topics that can generate the data we observe in our document-term matrix
For any document $d$ and word $w$, $P(D,W)$ corresponds to the entry for $(d,w)$ in the document-term matrix
The joint probability of seeing a given document and word together:
$$P(D,W)=P(D)\sum_{Z}P(Z|D)P(W|Z)$$
$P(D)$, $P(Z|D)$, and $P(W|Z)$ are the parameters of the model
$P(D)$ can be determined directly from our corpus
$P(Z|D)$ and $P(W|Z)$ are modeled as multinomial distributions and can be trained with the expectation-maximization (EM) algorithm
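A compact sketch of the EM updates for this parameterization, assuming a dense count matrix `counts` with documents as rows and words as columns. The function name `plsa`, the random initialization, the fixed iteration count, and the small `eps` guard are all assumptions for illustration; there is no convergence check or tempering.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0, eps=1e-12):
    """Fit P(D), P(Z|D) and P(W|Z) to a (n_docs, n_words) count matrix with EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random starting points, normalized so each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]           # (docs, words, topics)
        resp = joint / (joint.sum(axis=2, keepdims=True) + eps)   # eps avoids 0/0
        # M-step: re-estimate both tables from the expected counts.
        weighted = counts[:, :, None] * resp
        p_w_z = weighted.sum(axis=0).T                            # (topics, words)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=1)                              # (docs, topics)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    # P(D) is read directly off the corpus: each document's share of all tokens.
    p_d = counts.sum(axis=1) / counts.sum()
    return p_d, p_z_d, p_w_z

# Toy counts (made up): two documents per underlying theme.
counts = np.array([
    [4.0, 3.0, 0.0, 0.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
p_d, p_z_d, p_w_z = plsa(counts, n_topics=2)
# Model joint: P(d, w) = P(d) * sum_z P(z | d) * P(w | z)
p_dw = p_d[:, None] * (p_z_d @ p_w_z)
```

EM only finds a local optimum, so different seeds can give different topics.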
$$P(D,W)=\sum_{Z}P(Z)P(D|Z)P(W|Z)$$
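The two parameterizations describe the same model. A short derivation, using only Bayes' rule on the pair $(D,Z)$:

$$P(D)P(Z|D)=P(D,Z)=P(Z)P(D|Z)$$

$$P(D,W)=\sum_{Z}P(D)P(Z|D)P(W|Z)=\sum_{Z}P(Z)P(D|Z)P(W|Z)$$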
Relation between pLSA and LSA
Because we have no parameters to model $P(D)$, we don’t know how to assign probabilities to new documents.
The number of parameters for pLSA grows linearly with the number of documents we have, so it is prone to overfitting
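A quick arithmetic sketch of that growth, assuming the model stores one $P(W|Z)$ table (topics by vocabulary) and one $P(Z|D)$ row per document. The function name and the corpus sizes are hypothetical, and the count is of table entries, ignoring the sum-to-one constraints.

```python
def plsa_table_entries(n_docs, vocab_size, n_topics):
    """Entries in P(W|Z) plus entries in P(Z|D)."""
    return n_topics * vocab_size + n_docs * n_topics

small = plsa_table_entries(n_docs=1000, vocab_size=5000, n_topics=10)
large = plsa_table_entries(n_docs=2000, vocab_size=5000, n_topics=10)
```

Doubling the number of documents adds another `n_docs * n_topics` entries, while the $P(W|Z)$ part stays fixed.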
LDA stands for Latent Dirichlet Allocation
LDA is a Bayesian version of pLSA
It places Dirichlet priors on the document-topic and word-topic distributions, which lends itself to better generalization
Each document can be described by a distribution of topics
Each topic can be described by a distribution of words
How can you understand what category each document belongs to?
500*1000 = 500,000 threads
What is the Dirichlet distribution?
How does LDA imagine documents are generated?
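A minimal sketch of the standard LDA generative story, using only the standard library: draw each topic's word distribution from a Dirichlet, draw a document's topic mixture from a Dirichlet, then pick a topic and a word for every position. The helper names, the toy vocabulary, and the hyperparameter values are assumptions for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(n_words, alpha, topics, vocab, rng):
    """Generate one document given fixed per-topic word distributions."""
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this position
        w = rng.choices(vocab, weights=topics[z])[0]            # word drawn from that topic
        words.append(w)
    return theta, words

rng = random.Random(0)
vocab = ["goal", "match", "team", "vote", "party", "law"]  # toy vocabulary
n_topics = 2
# Each topic is a distribution over the vocabulary, drawn from a symmetric Dirichlet.
topics = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(n_topics)]
theta, words = generate_document(8, [0.5] * n_topics, topics, vocab, rng)
```

Small concentration values such as 0.5 tend to give sparse mixtures, so a document leans on few topics and a topic leans on few words.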
$$P(\theta_{1:M},z_{1:M},\beta_{1:k}|D;\alpha_{1:M},\phi_{1:k})$$
Variational inference
Approximate the messy intractable posterior with some known probability distribution that closely matches the true posterior
minimise the KL divergence
$\gamma^*,\phi^*,\lambda^*=\arg\min_{\gamma,\phi,\lambda} D_{KL}\big(q(\theta,z,\beta|\gamma,\phi,\lambda)\,\|\,p(\theta,z,\beta|D,\alpha,\eta)\big)$
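To make the objective concrete, here is a toy sketch of minimizing a KL divergence over a one-parameter family of discrete distributions. The target `p`, the family `q_t = [t, 1 - t]`, and the grid search are stand-ins chosen for illustration; in LDA the true posterior is not available in closed form, and the variational parameters are optimized through coordinate updates instead of a grid.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same outcomes."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Stand-in target distribution (in LDA this would be the intractable posterior).
p = [0.9, 0.1]

# Variational family: q_t = [t, 1 - t], searched over a coarse grid of t values.
grid = [i / 100 for i in range(1, 100)]
best_t = min(grid, key=lambda t: kl_divergence([t, 1 - t], p))

forward = kl_divergence([0.5, 0.5], p)
reverse = kl_divergence(p, [0.5, 0.5])
```

The divergence is not symmetric, so `forward` and `reverse` differ; variational inference minimizes the direction with the approximation `q` as the first argument.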
lda2vec is an extension of word2vec and LDA that jointly learns word, document, and topic vectors.
word vector = skip-gram word2vec model
document weight vector: representing the “weights” of each topic in the document
topic matrix: representing each topic and its corresponding vector embedding
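A sketch of how these three pieces could combine, assuming the formulation in which the document weights are passed through a softmax to give topic proportions, the document vector is the proportion-weighted sum of topic vectors, and the context vector is the word vector plus the document vector. All names and numbers here are illustrative.

```python
import math

def softmax(xs):
    """Turn unnormalized weights into proportions that sum to one."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def document_vector(doc_weights, topic_matrix):
    """Proportion-weighted sum of the topic vectors (one row of topic_matrix per topic)."""
    proportions = softmax(doc_weights)
    dim = len(topic_matrix[0])
    return [sum(p * topic[i] for p, topic in zip(proportions, topic_matrix))
            for i in range(dim)]

def context_vector(word_vector, doc_vector):
    """Context vector assumed to be the word vector plus the document vector."""
    return [w + d for w, d in zip(word_vector, doc_vector)]

topic_matrix = [[1.0, 0.0], [0.0, 1.0]]  # two toy topic embeddings
doc_weights = [0.0, 0.0]                 # equal weight on both topics
doc_vec = document_vector(doc_weights, topic_matrix)
ctx_vec = context_vector([1.0, 1.0], doc_vec)
```

With equal weights the document vector sits halfway between the two topic vectors.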