(c) Naren Ramakrishnan and the students of CS6604, Spring 2001. Permission to use ideas about
the organization of topics, slides, and discussion notes is granted, provided suitable
acknowledgements and citations are made.
CS 6604 Lectures
An inclusion of a paper in the reading list does not constitute endorsement by the instructor. Outlines and topics are tentative and
subject to being pushed around.
Jan 15: [Introduction Slides], Strand Diagram, and
Basic Dichotomies of Recommender Systems. Reading assignment:
Take a peek at the Communications
of the ACM March 1997 and Aug 2000 issues and classify the
systems there according to the various dimensions induced by the dichotomies.
The Modeling Dichotomy
Jan 17:
Review of IR perspectives. Basic problems of recommendation.
Details of content-based and collaborative approaches. Examples from
search engines. Endemic problems
with ratings and evaluations. [Slides]
Jan 19: Reading Assignments: [Discussion Notes]
P. Resnick and H. Varian, Recommender Systems,
Communications of the ACM, Vol. 40, No. 3, pages 56-58,
March 1997. [Read all Rec papers from this issue]
Jan 22: Evaluation and comparison of collaborative filtering
algorithms. [Discussion Notes]
J. Breese, D. Heckerman, and C. Kadie,
Empirical Analysis of Predictive Algorithms for Collaborative
Filtering,
Proceedings of the Fourteenth Annual Conference on Uncertainty in
Artificial Intelligence,
pages 43-52, Morgan Kaufmann, July 1998.
K. Goldberg, T. Roeder, D. Gupta, and C. Perkins,
Eigentaste: A Constant Time Collaborative Filtering Algorithm,
Technical Report M00/41, Electronics Research Laboratory, University of California, Berkeley,
August 2000.
C.C. Aggarwal, J. Wolf, K. Wu and P. Yu,
Horting Hatches
an Egg: A Graph-Theoretic Approach to Collaborative
Filtering,
Proceedings of the Fifth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining,
pages 201-212, ACM Press, August 1999.
Jan 24: The use of data mining and machine learning techniques to learn
mappings and internal representations. Implications for maintaining and updating
mappings, with dynamic data. How choice of the technique affects (unfortunately)
evaluation criteria. Explainability and believability of recommendations.
Motivations from PYTHIA.
We will survey all the articles
we have seen so far from these perspectives.
Targeting a Recommender System
Jan 26: Options and opportunities. Reading assignments from the Aug 2000 CACM (place the various other systems we have seen so far in this context). Exercise: Find 5-10 web sites that target customers at various levels of
the targeting dichotomy (see first day's slides for more info). [Discussion Notes]
U. Manber, A. Patel, and J. Robison,
The Business of Personalization: Experience with Personalization of Yahoo!,
Communications of the ACM, Vol. 43, No. 8,
pages 35-39, August 2000.
M. Pazzani, K. Muramatsu, and D. Billsus,
Syskill and Webert:
Identifying Interesting Web Sites,
In Proceedings of the Thirteenth National Conference on Artificial Intelligence,
pages 54-61, Portland, OR, August 1996.
M. Perkowitz and O. Etzioni,
Adaptive Web Sites,
Communications of the ACM,
Vol. 43, No. 8, pages 152-158, August 2000.
Jan 29: How data mining algorithms and techniques influence (and are
influenced by) targeting dichotomies. The role of clustering in recommender
systems. Can recommender systems be designed independently of the
decided level of targeting? [Discussion Notes]
Jan 31: Improving targeting by observing browsing behavior. One example
is the following reading assignment. Exercise: Identify 2-3 sites that
use indicators such as these to improve their targeting. [Discussion Notes]
The Matview
Feb 2: On why everything is a matrix. Things that can be done with
matrices. Connections with age-old IR research. Review of linear algebra and
algorithmics. [Discussion Notes]
Feb 5: Introduction to Latent Semantic Indexing [Discussion Notes]:
M.W. Berry, S.T. Dumais, and G.W. O'Brien,
Using Linear Algebra for Intelligent Information Retrieval,
SIAM Review, Vol. 37, No. 4, pages 573-595, 1995. Relevant Sites:
LSI at Telcordia, Berry's LSI Page
For a historical perspective on matrix decompositions, see:
G.W. Stewart,
The Decompositional Approach to Matrix Computation,
IEEE/AIP Computing in Science and Engineering,
Vol. 2, No. 1, pages 50-59, January/February 2000.
Feb 7: Minor tweaks to this idea [Discussion Notes]:
A. Booker et al., Visualizing Text Datasets,
IEEE Computing in Science and Engineering,
Vol. 1, No. 4, pages 26-34, July/August 1999.
T.G. Kolda and D.P. O'Leary,
A Semidiscrete Matrix Decomposition for Latent Semantic Indexing
in Information Retrieval,
ACM Transactions on Information Systems,
Vol. 16, No. 4, pages 322-346, 1998.
Feb 9: Generalizations of the idea
[Discussion Notes]:
Feb 12: Interesting Variations [Discussion Notes]:
D.D. Lee and H.S. Seung, Learning the Parts of Objects by Non-Negative Matrix
Factorization, Nature, Vol. 401, pages 788-791, 1999. Not available electronically (I think); I will
hand out copies in class. A web tutorial on learning dynamic systems is available that
covers a variety of pertinent algorithms, such as SVD, EM, and neural networks.
A discussion site for this paper is also online.
Feb 14: Eigenvectors in the real world [Discussion Notes]:
S. Chakrabarti, B.E. Dom, S. Ravi Kumar,
P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson,
and J. Kleinberg,
Mining the Web's Link Structure,
IEEE Computer, Vol. 32, No. 8, pages 60-67, August 1999. This paper
describes the algorithm behind the much-acclaimed CLEVER project
at IBM Almaden.
The Graphview
Feb 16: On why everything is a graph. Things that can be done with graphs.
Graph perspectives in recommender systems.
Details of this strand of research [Discussion Notes].
Feb 19: Mining for graph-based communities (this is really an expansion
of the sidebar from Feb 14's reading) [Discussion Notes]:
Feb 21: Modeling small world networks [Discussion Notes]:
D. Watts and S. Strogatz,
Collective Dynamics of "Small-World" Networks,
Nature,
Vol. 393, No. 6684, pages 440-442, June 1998.
L. Adamic, The Small World Wide Web, URL: http://www.parc.xerox.com/istl/groups/iea/www/smallworldpaper.html.
Optional Reading:
J. Kleinberg, The Small-World Phenomenon: An Algorithmic Perspective,
Nature, 2000.
Feb 23: More about small world networks [Discussion Notes]:
R. Albert, H. Jeong, and A.-L. Barabási,
Diameter of the World-Wide Web,
Nature, Vol. 401, pages 130-131, 1999.
L.A.N. Amaral, A. Scala, M. Barthelemy, and H.E. Stanley,
Classes of Behavior of Small-World Networks, cond-mat/0001458, January 2000.
Feb 26: Mapping the Web: Read the following two papers and determine how/if search
engines could exploit the information mined from the first study. [Discussion Notes]
A. Broder et al.,
Graph Structure in the
Web, In Proceedings of the International World Wide Web Conference, 1999.
An analysis of the coverage of search engines:
S. Lawrence and C. Lee Giles,
Searching the World Wide Web,
Science,
Vol. 280, No. 5360, pages 98-100, 1998.
Optional Reading:
An analysis of link analysis used in search engines:
M. Henzinger, Hyperlink Analysis for the Web,
IEEE Internet Computing, pages 45-50, Jan-Feb 2001 (we have covered
most of this already).
Feb 28: Applications of Graph Theory in Recommender Systems - An
example of mining, modeling, and exploiting. I will describe Batul Mirza's
thesis research.
Midterm Class Presentations
Mar 2: Class Presentations.
SPRING BREAK! :) :) :)
Mar 12: Class Presentations (contd.).
Content Modeling, Information Integration, and Interaction
Mar 14: Introduction to this strand. Overview of content modeling, web data extraction,
information integration. A good tutorial on content modeling (some topics only), pertaining to web-DB integration,
is available in:
Mar 16: Modeling web resources [Discussion Notes]:
C.A. Knoblock, S. Minton, J.L. Ambite, N. Ashish, P.J. Modi, I. Muslea, A.G. Philpot,
and S. Tejada, Modeling Web Sources for
Information Integration,
Proceedings of AAAI'98, 1998.
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam,
and S. Slattery,
Learning to Extract Symbolic Knowledge from the World Wide Web,
Proceedings of AAAI'98, 1998.
Mar 19: Mining Semistructure [Discussion Notes]:
S. Nestorov, S. Abiteboul, and R. Motwani,
Extracting Schema from Semistructured Data,
In Proc. ACM SIGMOD, 1998.
M. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim,
XTRACT: A System for Extracting Document Type Descriptors from XML Documents,
Proc. ACM SIGMOD, 2000.
Mar 21: More on Content Modeling:
Learning to correct for single-spelling errors (e.g. in search engine queries).
A presentation by Rob Capra.
Mar 23: Contextual Abstractions [Discussion Notes]:
S. Lawrence,
Context in Web Search,
IEEE Data Engineering Bulletin,
Volume 23, Number 3, pages 25-32, September 2000.
P. Pirolli, J. Pitkow, and R. Rao,
Silk from a Sow's Ear: Extracting Usable Structures from the Web,
In Proc. CHI'96, 1996.
Mar 26: Modeling Interaction [Discussion Notes]:
Mar 28: Laws of Surfing and their Uses:
B.A. Huberman, P. Pirolli, J. Pitkow, and R.J. Lukose,
Strong Regularities in World Wide Web Surfing,
Science, Vol. 280, pages 95-97, 1998.
Optional Reading:
Z. Zhu, J. Yu, and J. Doyle, Heavy Tails, Generalized Coding, and
Optimal Web Layout, Proceedings of IEEE INFOCOM, 2001.
Mar 30: Task-Based System Designs:
W.W. Cohen, A. McCallum, and D. Quass,
Learning to Understand the Web,
IEEE Data Engineering Bulletin,
Volume 23, Number 3, pages 17-24, September 2000.
M. Hearst,
Next Generation Web Search: Setting Our Sites,
IEEE Data Engineering Bulletin,
Volume 23, Number 3, pages 38-48, September 2000.
Apr 2: Integrated Approaches to Building Hot-Rods:
Transcoding, Intermediaries, and Functional Indirection
Apr 4: Introduction to the role and nature of intermediaries on the web.
A good starting point is this (rather industry-ish) IBM article. Think
specifically about the role of recommender systems as intermediaries in a
larger personalization context.
Apr 6: Indirection as a design principle. A good example is the recently
proposed solution for broken hyperlinks (again, relate this back to
recommender systems):
T.A. Phelps and R. Wilensky, Robust Hyperlinks Cost Just Five Words Each,
UC Berkeley CS Technical Report UCB-CSD-00-1091, 2000.
T.A. Phelps and R. Wilensky, Robust Hyperlinks: Cheap, Everywhere, Now, In
Proceedings of Digital Documents and Electronic Publishing (DDEP00), Munich,
Germany, 13-15 September 2000.
Browse also through their project page.
Apr 9: Personalization as a necessary ingredient in mobile systems:
Apr 11: Location-sensitive Personalization (e.g., recommending a
McDonald's near your current location). I have been unable to find good
technical papers that describe this topic. Here's a description of a project
that can be discussed in class:
SAGRES, at U. Washington, Seattle.
For some background on "geographical computing", see:
T. Imielinski and J.C. Navas, GPS-based Geographic Addressing, Routing,
and Resource Discovery, Communications of the ACM, Vol. 42, No. 4,
pages 86-92, April 1999.
Apr 13: Standards and Conventions for Large-Scope Solutions. We
will look at words that end with F, capability description standards,
and the role that "personalization
protocols" can play. Will also throw in Ethics, Privacy, and Business Stuff
(ideally, they deserve their own classes, but we are running out of slots).
Some references for standards:
I. Cingil, A. Dogac, and A. Azgin,
A Broader Approach to Personalization,
Communications of the ACM, Vol. 43, No. 8, pages 136-141, 2000.
CC/PP: A User Side Framework for Enhanced Content Negotiation,
W3C Working Draft, Jan 2001.
Misc.
Apr 16: Detailed Project Presentations Start! :)