Discussion Notes

Feb 19, 2001

(courtesy Saverio Perugini)

Trawling the web for emerging cyber-communities

  • an example of mining for structure: the target is a bipartite core, i.e., a complete bipartite subgraph in which every fan links to every center
  • in the paper's terms, fans are specialized hubs and centers are authorities
  • a lot of preprocessing and data cleaning up front, driven by heuristics
  • the data were organized as sequences so that main memory sufficed
  • Descriptive versus constructive view of algorithms: random graph models allow us to quantify the probability of finding some type of structure, or the resources needed by an algorithm to find it.
  • Observe the "smoke signal" effect in power-law plots, where the raw frequency distribution breaks up into sparse, noisy points in the tail: the solution is to plot the cumulative frequency distribution rather than the raw frequency distribution
  • Pruning criterion / property: useful for constraining search. Here, it is the idea that if an itemset does not have the property (minimum support), then no superset of it can have the property either (the anti-monotone, or Apriori, property).
  • read the CACM article "Discovering Shared Interests Using Graph Analysis," a paper well ahead of its time that addresses more or less the same ideas
  • Prelude to next class: observation-based systems. Can one figure out what a system (e.g., a search engine) is doing simply by issuing queries to it?
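
To make the bipartite core idea concrete, here is a brute-force sketch that lists every core with i fans and j centers (an "(i, j) core"). It assumes the link graph is a dict mapping each page to the set of pages it links to; the names find_cores and is_core are hypothetical, and keeping fans and centers disjoint is an assumption of this sketch. Brute force only works on toy graphs and is not how the paper scaled the search.

```python
from itertools import combinations


def is_core(graph, fans, centers):
    """True if every fan links to every center (a complete bipartite subgraph)."""
    return all(c in graph.get(f, set()) for f in fans for c in centers)


def find_cores(graph, i, j):
    """Brute-force enumeration of (i, j) cores, for i >= 1.

    graph: dict mapping each page to the set of pages it links to.
    Returns a list of (fans, centers) tuples of sorted node names.
    """
    cores = []
    # A fan must link to at least j pages to point at j centers.
    fan_pool = [u for u, out in graph.items() if len(out) >= j]
    for fans in combinations(sorted(fan_pool), i):
        # Candidate centers are the pages that all chosen fans link to.
        common = set.intersection(*(graph[f] for f in fans))
        common -= set(fans)  # assumption: fans and centers are disjoint
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores


if __name__ == "__main__":
    web = {
        "a": {"x", "y", "z"},
        "b": {"x", "y"},
        "c": {"x", "y", "w"},
        "d": {"z"},
    }
    print(find_cores(web, 3, 2))
```

Here pages a, b, and c all point to x and y, so they form the only (3, 2) core in the toy graph.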
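
The notes do not spell out the preprocessing heuristics. One pruning rule follows directly from the definition of a core: a fan in an (i, j) core needs out-degree at least j, and a center needs in-degree at least i. The sketch below (hypothetical name iterative_prune) applies that filter repeatedly, because deleting edges can push other nodes below threshold. It works on an in-memory edge set and does not reproduce the sequence-based, memory-bounded organization the notes mention.

```python
from collections import Counter


def iterative_prune(edges, i, j):
    """Repeatedly drop edges that cannot belong to any (i, j) core.

    An edge (src, dst) can only sit in a core if src could be a fan
    (out-degree >= j) and dst could be a center (in-degree >= i).
    Removing edges lowers other degrees, so loop until a fixed point.
    edges: iterable of (src, dst) pairs. Returns the surviving edge set.
    """
    edges = set(edges)
    while True:
        out_deg = Counter(src for src, _ in edges)
        in_deg = Counter(dst for _, dst in edges)
        kept = {(s, d) for s, d in edges if out_deg[s] >= j and in_deg[d] >= i}
        if kept == edges:  # nothing removed this round: done
            return kept
        edges = kept


if __name__ == "__main__":
    links = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("d", "z")]
    print(sorted(iterative_prune(links, 2, 2)))
```

The pruning is only a necessary condition: surviving edges still have to be checked for actual cores, but the graph that remains can be far smaller.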
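
As a worked instance of the descriptive view, assume a directed random graph G(n, p) in which each ordered pair of distinct nodes gets an edge independently with probability p (a model chosen for illustration, not one named in the notes). Picking i fans, then j centers from the remaining n - i nodes, and requiring all i * j fan-to-center edges gives an expected number of (i, j) cores of C(n, i) * C(n - i, j) * p^(i*j). The helper name expected_cores is hypothetical.

```python
from math import comb


def expected_cores(n, p, i, j):
    """Expected number of (i, j) cores in a directed G(n, p).

    Choose i fans and j distinct centers disjoint from them; all i * j
    fan->center edges are present independently with probability p ** (i * j).
    """
    return comb(n, i) * comb(n - i, j) * p ** (i * j)


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, expected_cores(1000, 0.01, size, size))
```

For sparse graphs the p ** (i * j) factor shrinks quickly as i and j grow, so such a model predicts few large cores; finding many of them in real data then points to structure rather than chance.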
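
The cumulative-distribution fix can be sketched as follows. The notes do not say which cumulative form was meant; this assumes the complementary one, counting items with value at least k, which is the usual choice for heavy-tailed data. Summing over the tail replaces many near-empty bins with a smooth, decreasing curve; for a power law with exponent alpha, the complementary cumulative curve has exponent alpha - 1 on a log-log plot. The function names are hypothetical.

```python
from collections import Counter


def frequency_distribution(values):
    """Raw distribution: value k -> number of items with exactly k."""
    return dict(sorted(Counter(values).items()))


def cumulative_distribution(values):
    """Complementary cumulative distribution: value k -> number of items with value >= k."""
    counts = Counter(values)
    result = {}
    running = 0
    # Walk from the largest value down, accumulating the tail count.
    for k in sorted(counts, reverse=True):
        running += counts[k]
        result[k] = running
    return dict(sorted(result.items()))


if __name__ == "__main__":
    degrees = [1, 1, 1, 1, 2, 2, 3, 10]
    print(frequency_distribution(degrees))   # {1: 4, 2: 2, 3: 1, 10: 1}
    print(cumulative_distribution(degrees))  # {1: 8, 2: 4, 3: 2, 10: 1}
```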
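
The pruning property on support is the one behind the textbook Apriori algorithm, sketched below. It is a generic illustration under the assumption that transactions are sets of hashable items; the function name apriori is hypothetical, and this is not necessarily how the paper applied the idea when growing cores.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return every itemset (as a frozenset) contained in >= min_support transactions.

    Anti-monotone pruning: a k-itemset can only be frequent if all of its
    (k-1)-subsets are frequent, so such candidates are discarded before
    their support is ever counted.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([x]) for x in items if support(frozenset([x])) >= min_support}
    frequent = set(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return frequent


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for s in sorted(apriori(baskets, 3), key=lambda s: (len(s), sorted(s))):
        print(sorted(s))
```

In this toy data every pair appears three times but {a, b, c} appears only twice, so the search stops at pairs without counting any larger itemset.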
