Discussion Notes
Feb 09, 2001
(courtesy Saverio Perugini)
Unification of IR Models
- VSM:
Models only direct similarities, does not bring
disconnected portions together, since no modeling of term-term
correlations is made.
- GVSM: From the SVD, we know that the U matrix models term-term
correspondences and the V matrix models document-document
correspondences. GVSM is just a way of using term-term correspondences
explicitly, either by using the original matrix A, or by another
user-supplied lexicon.
- LSI: Models indirect similarities, only brings disconnected
portions together in one special, negligible case (a guaranteed "A" in
the course for the first person who brings this pathological
case to my attention *and* proves this property of LSI formally).
- VSM and GVSM not typically cast as matrix decompositions;
Fang and Littman appear to have made this connection. Typically,
we vary the left and right matrices of a decomposition and cast the
diagonal matrix (middle) as dependent on the constraints (see, for example the SDD
paper). In this paper, the authors
freeze the left and right matrices and show how (the middle
part) is to be
varied (i.e., the weightage assigned to each dimension).
- The rank is not reduced by their manuevering of ADE: their metric appears
to be precision and not really reducing dimensionality.
- Connections to cross-language IR:
Normally the document space is not "doubled", since each document has a one-one
correspondence to a counterpart in the other language. The terms are now
doubled in number, to account for matchings in two different languages.
To see why ADE (and hence, props of LSI) are really necessary:
consider an English, German, and French collection, given by:
Traditionally, since GVSM only makes one-step connections, the French
documents will not be retrieved for English, and vice versa. ADE is expected
to bring the French and English documents together. Procrustes is the name
given to a greek agent who manuevers things into "fitting." In this
application, the mappings to German (from English) and from German (to
French) are being procrustesed (at the German end of the stick).