CS 6604: Fall 2013
Data Mining Large Networks
and Time-Series
Lecture Slides and Readings

08/26: Introduction [SLIDES]
Readings:

08/28: More Graph Properties and the Webgraph [SLIDES]
Readings:
 Chapter 2 from EK: Graphs
 A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J. Wiener. Graph Structure of the Web. WWW, 2000.
Optional Readings:

09/02: The Erdos-Renyi Model and Small Worlds [SLIDES]
Readings:
Optional Readings:
 P. Erdos, A. Renyi. On Random Graphs I. Publ. Math. Debrecen, 1959.
 B. Bollobas. Random Graphs. Cambridge University Press.
 M. E. J. Newman, S. H. Strogatz and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E 64, 026118, 2001.
 S. Milgram. The small world problem. Psychology Today 1(1967).
 J. Travers and S. Milgram. An experimental study of the small world problem. Sociometry 32, 1969.
 J. Kleinfeld. Could it be a Big World After All? The `Six Degrees of Separation' Myth. Society, 2002.
 P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301(2003), 827.

09/04: The Watts-Strogatz Model and Decentralized Search [SLIDES]
Readings:
Optional Readings:
 J. Kleinberg. The small-world phenomenon: An algorithmic perspective. STOC, 2000.
 M. E. J. Newman. Models of the Small World: A Review. J. Stat. Physics, 2000.
 D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002.
 L. A. Adamic, E. Adar. How to search a social network. Social Networks, 27(3), 187-203, 2005.
 L. A. Adamic, R. M. Lukose, A. R. Puniyani, B. A. Huberman. Search in Power-Law Networks. Phys. Rev. E, 64, 046135, 2001.
 D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins. Geographic routing in social networks. Proc. Natl. Acad. Sci., 102, 2005.
 H. Balakrishnan, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Looking up data in P2P systems. Communications of the ACM 46:43-48, February 2003.

09/09: Power Laws, Preferential Attachment and Fractals [SLIDES]
Readings:
Optional Readings:
 C. Faloutsos and I. Kamel. Beyond Uniformity and Independence: Analysis of R-Trees Using the Concept of Fractal Dimension. PODS, 1994.
 M. Schroeder. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. Dover Publications, 2009.
 C. Anderson. The Long Tail. WIRED Magazine, October 2004.
 A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009.
 M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics, vol 1, No. 2, pp. 226-251, 2004.
 M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46(5), 323-351, 2005.
 H. A. Simon. On a class of skew distribution functions. Biometrika 42, 425-440, 1955.
 D. de S. Price. A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci. 27: 292-306, 1976.
 R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. Stochastic models for the Web graph. In Proc. FOCS 2000.
 D. Pennock, G. Flake, S. Lawrence, E. Glover, C. Lee Giles. Winners don't take all: Characterizing the competition for links on the web. PNAS 99(8), 2002.
 N. Berger, C. Borgs, J. Chayes, R. D'Souza, R. Kleinberg. Competition-Induced Preferential Attachment. ICALP 2004.
 B. Bollobas, C. Borgs, J. Chayes, O. Riordan. Directed scale-free graphs. In Proc. SODA 2003.
 S. Goel, A. Broder, E. Gabrilovich, B. Pang. Anatomy of the Long Tail: Ordinary People with Extraordinary Tastes. WSDM, 2010.
 M. E. J. Newman. The first-mover advantage in scientific publication. European Physics Letters 86, 68001, 2009.

09/11: Previous Lecture Contd.

09/16: Hadoop, and Graph Analysis [SLIDES]
Readings:
Optional Readings:
 D. DeWitt and M. Stonebraker. MapReduce: A major step backwards. Blog post, 2008.
 U Kang, C. E. Tsourakakis, A. P. Appel, C. Faloutsos, and J. Leskovec. HADI: Mining Radii of Large Graphs. TKDD, 2011.
 G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. SIGMOD, 2010.
 Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Parallel Framework for Machine Learning. UAI, 2010.
 X. Hu, Y. Tao, C.-W. Chung. Massive Graph Triangulation. SIGMOD, 2013.
 U Kang and C. Faloutsos. Beyond `Caveman Communities': Hubs and Spokes for Graph Compression and Mining. ICDM, 2011.
 R. Xin, J. Rosen, M. Zaharia, M. Franklin, S. Shenker, and I. Stoica. Shark: SQL and Rich Analytics at Scale. SIGMOD, 2013.

09/18: Epidemics: Probabilistic Models [SLIDES]
Readings:
Optional Readings:
 H. W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review, 2000.
 J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM, 2007.
 D. Gruhl, R. Guha, D. Liben-Nowell, A. Tomkins. Information Diffusion through Blogspace. WWW, 2004.
 J. Ugander, L. Backstrom, C. Marlow, J. Kleinberg. Structural Diversity in Social Contagion. PNAS, 2012.
 P. S. Dodds and D. J. Watts. Universal behavior in a generalized model of contagion. Physical Review Letters, 2004.
 L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution. KDD, 2006.
 J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. SP, 1993.
 A. Anagnostopoulos, R. Kumar, M. Mahdian. Influence and correlation in social networks. SIGKDD, 2008.
 S. Aral, L. Muchnik, A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. PNAS, 2009.

09/23: Epidemics: Thresholds [SLIDES]
Readings:
Optional Readings:
 A. G. McKendrick. Applications of mathematics to medical problems. In Proceedings of Edin. Math. Society, volume 14, pages 98–130, 1926.
 M. E. J. Newman. Spread of epidemic disease on networks. In Phys. Rev. E, 66(1):016128, 2002.
 R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. In Physical Review Letters 86, 14, 2001.
 R. Pastor-Satorras and A. Vespignani. Epidemic dynamics in finite size scale-free networks. In Physical Review E, 65:035108, 2002.
 A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology on the spread of epidemics. In INFOCOM, 2005.
 D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. In ACM TISSEC, 10(4), 2008.
 N. Valler, B. A. Prakash, H. Tong, M. Faloutsos and C. Faloutsos. Epidemic Spread in Mobile Ad Hoc Networks: Determining the Tipping Point. In IFIP Networking, 2011.
 M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. In Nature, 2008.

09/25: Epidemics: Competing Viruses [SLIDES]
Readings:
Optional Readings:
 M. E. J. Newman. Threshold effects for two pathogens spreading on a network. Phys. Rev. Lett., 2005.
 N. Pathak, A. Banerjee, and J. Srivastava. A Generalized Linear Threshold Model for Multiple Cascades. ICDM, 2010.
 L. Weng, A. Flammini, A. Vespignani, and F. Menczer. Competition among memes in a world with limited attention. Nature, 2012.
 T. Antunovic, E. Mossel, and M. Z. Racz. Coexistence in preferential attachment networks. UC Berkeley Tech Report, 2013.
 S. Goyal, and M. Kearns. Competitive Contagion in Networks. STOC, 2012.
 D. Koutra, V. Koutras, B. A. Prakash, and C. Faloutsos. Patterns amongst Competing Task Frequencies: Super-Linearities, and the Almond-DG model. PAKDD, 2013.

09/30: Previous Lecture Contd.

10/02: Immunization in Networks [SLIDES]
Readings:
 R. Albert, H. Jeong and A.-L. Barabasi. Error and attack tolerance of complex networks. Nature, 2000.
 H. Tong, B. A. Prakash, T. Eliassi-Rad, M. Faloutsos and C. Faloutsos. Gelling, and Melting, Large Graphs through Edge Manipulation. CIKM, 2012.
 B. A. Prakash, L. Adamic, T. Iwashyna, H. Tong and C. Faloutsos. Fractional Immunization on Networks. SDM, 2013.
Optional Readings:
 R. Cohen, S. Havlin, and D. ben-Avraham. Efficient immunization strategies for computer networks and populations. Physical Review Letters, 2003.
 R. Pastor-Satorras, A. Vespignani. Immunization of complex networks. Physical Review E, 2002.
 N. Madar, T. Kalisky, R. Cohen, D. ben-Avraham, S. Havlin. Immunization and epidemic dynamics in complex networks. European Physical Journal B, 2004.
 R. Cohen, K. Erez, D. ben-Avraham, S. Havlin. Resilience of the internet to random breakdowns. Physical Review Letters, 2000.
 L. Briesemeister, P. Lincoln, P. Porras. Epidemic profiles and defense of scale-free networks. WORM 2003.
 J. Aspnes, K. L. Chang, A. Yampolskiy. Inoculation strategies for victims of viruses and the sum-of-squares partition problem. SODA, 2005.
 C. Budak, D. Agrawal, A. El Abbadi. Limiting the Spread of Misinformation in Social Networks. WWW, 2011.
 H. Tong, B. A. Prakash, C. Tsourakakis, T. Eliassi-Rad, C. Faloutsos, D. H. Chau. On the Vulnerability of Large Graphs. ICDM, 2010.

10/07: Previous Lecture Contd.

10/09: Previous Lecture Contd.

10/14: Finding Sources in Epidemics [SLIDES]
Readings:
Optional Readings:

10/16: Time Series Mining [SLIDES]
Readings:

10/21: Viral Marketing and Outbreak Detection [SLIDES]
Readings:
 D. Kempe, J. Kleinberg, E. Tardos. Maximizing the Spread of Influence through a Social Network. SIGKDD 2003.
 Y. Singer. How to win friends and influence people, truthfully: Influence maximization mechanisms for social networks. WSDM, 2012.
 J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. Cost-effective Outbreak Detection in Networks. SIGKDD, 2007.
Optional Readings:
 M. Richardson, P. Domingos. Mining the Network Value of Customers. SIGKDD, 2001.
 J. Goldenberg, B. Libai, E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 2001.
 M. Richardson, P. Domingos. Mining Knowledge-Sharing Sites for Viral Marketing. SIGKDD, 2002.
 S. Hill, F. Provost, C. Volinsky. Network-Based Marketing: Identifying Likely Adopters via Consumer Networks. Statistical Science, 2006.
 A. Ostfeld et al. The Battle of the Water Sensor Networks (BWSN): A Design Challenge for Engineers and Algorithms. Journal of Water Resources Planning and Management, 2009.
 J. Leskovec, L. Adamic and B. Huberman. The Dynamics of Viral Marketing. ACM TWEB, 2007.
 E. Mossel and S. Roch. On the Submodularity of Influence in Social Networks. STOC, 2007.
 S. Bharathi, D. Kempe, M. Salek. Competitive influence maximization in social networks. WINE, 2007.
 N. Agarwal, H. Liu, L. Tang, P. Yu. Identifying the Influential Bloggers in a Community. WSDM, 2008.
 W. Chen, Y. Wang, S. Yang. Efficient Influence Maximization in Social Networks. SIGKDD, 2009.
 M. Cha, H. Haddadi, F. Benevenuto, K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. ICWSM, 2010.
 E. Bakshy, J. M. Hofman, W. A. Mason, D. J. Watts. Everyone’s an influencer: quantifying influence on Twitter. WSDM, 2011.

10/23: Decision Models for Contagions [SLIDES]
Optional Readings:
 M. Granovetter. Threshold models of collective behavior. American Journal of Sociology 83(6):1420-1443, 1978.
 S. Morris. Contagion. Review of Economic Studies 67, 57-78, 2000.
 S. Bikhchandani, D. Hirshleifer, I. Welch. A theory of fads, fashion, custom and cultural change as information cascades. Journal of Political Economy. Vol. 100, pp. 992-1026, 1992.
 D. Centola. The Spread of Behavior in an Online Social Network Experiment. Science, 2010.
 M. Jackson, L. Yariv. Diffusion of Behavior and Equilibrium Properties in Network Games. American Economic Review, Vol 97, No. 2, 2007.
 P. Dodds and D. J. Watts. Universal Behavior in a Generalized Model of Contagion. Physical Review Letters, 2004.
 D. Centola, M. Macy. Complex Contagions and the Weakness of Long Ties. American Journal of Sociology, 2007.
 E. Lieberman, C. Hauert, M. A. Nowak. Evolutionary Dynamics on Graphs. Nature 433: 312-316, 2005.
 S. Bhagat, A. Goyal, L. V. S. Lakshmanan. Maximizing Product Adoption in Social Networks. WSDM 2012.
 F. Chierichetti, J. Kleinberg, A. Panconesi. How to schedule a cascade in an arbitrary graph. EC, 2012.

10/28: Agent Based Models [SLIDES]
Readings:
 S. Eubank, H. Guclu, V. S. Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai, N. Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 2004.
 W. Broeck, C. Gioannini, B. Goncalves, M. Quaggiotto, V. Colizza, A. Vespignani. The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale. BMC Infectious Diseases, 2011.
Optional Readings:
 C. Barrett, K. Bisset, S. Eubank, X. Feng, M. Marathe. EpiSimdemics: An efficient and scalable framework for simulating the spread of infectious disease on large social networks. SC, 2008.
 S. Meloni, N. Perra, A. Arenas, S. Gomez, Y. Moreno, A. Vespignani. Modeling human mobility responses to the large-scale spreading of infectious diseases. Nature Scientific Reports, 2011.
 S. Brown, J. Tai, R. Bailey, P. Cooley, W. Wheaton, M. Potter, R. Voorhees, M. LeJeune, J. Grefenstette, D. Burke, S. McGlone, B. Lee. Would school closure for the 2009 H1N1 influenza epidemic have been worth the cost?: A computational simulation of Pennsylvania. BMC Public Health, 2011.
 D. Bakken. Visualize It: Agent-Based simulations may help you make better marketing decisions. Marketing Research, 2007.

10/30: Centrality Measures [SLIDES]
Optional Readings:
 J. Kleinberg. Authoritative sources in a hyperlinked environment. SODA, 1998.
 S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW, 1998.
 T. H. Haveliwala. Topic-Sensitive PageRank. WWW, 2002.
 Z. Gyongyi, P. Berkhin, H. Garcia-Molina, J. Pedersen. Link Spam Detection Based on Mass Estimation. VLDB, 2006.
 U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 2001.
 L. Freeman. A Set of Measures of Centrality Based on Betweenness. Sociometry, 1977.
 L. Katz. A New Status Index Derived From Sociometric Analysis. Psychometrika, 1953.

11/04: Dynamics of Networks and Models [SLIDES]
Optional Readings:
 R. Kumar, J. Novak, A. Tomkins. Structure and evolution of online social networks. SIGKDD, 2006.
 J. Leskovec, J. Kleinberg, C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM TKDD, 2007.
 J. Kleinberg. Bursty and Hierarchical Structure in Streams. SIGKDD, 2002.
 R. Kumar, J. Novak, P. Raghavan, A. Tomkins. On the bursty evolution of Blogspace. WWW, 2003.
 D. Chakrabarti, Y. Zhan and C. Faloutsos. R-MAT: A Recursive Model for Graph Mining. SDM, 2004.
 L. Akoglu and C. Faloutsos. RTG: A Recursive Realistic Graph Generator using Random Typing. ECML/PKDD, 2009.
 C. Seshadhri, A. Pinar and T. G. Kolda. An In-Depth Analysis of Stochastic Kronecker Graphs. Journal of the ACM, 2013.

11/06: Link Prediction [SLIDES]
Optional Readings:
 D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. CIKM, 2003.
 B. Taskar, M.F. Wong, P. Abbeel, D. Koller. Link prediction in relational data. NIPS, 2006.
 M. Gomez-Rodriguez, J. Leskovec, A. Krause. Inferring Networks of Diffusion and Influence. SIGKDD, 2010.
 L. Backstrom, J. Leskovec. Supervised Random Walks: Predicting and Recommending Links in Social Networks. WSDM, 2011.
 P. D’haeseleer, S. Liang, R. Somogyi. Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics, 2000.
 K. Bleakley, G. Biau, J. Vert. Supervised reconstruction of biological networks with local models. Bioinformatics, 2007.

11/11: Graph Clustering and Community Detection [SLIDES]
Readings:
Optional Readings:
 R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Trawling the web for emerging cyber-communities. WWW, 1999.
 J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE PAMI, 2000.
 A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. NIPS, 2001.
 G. Karypis and V. Kumar. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel Distrib. Computing, 1998.
 I. Dhillon, Y. Guan, and B. Kulis. A Fast Kernel-based Multilevel Algorithm for Graph Clustering. SIGKDD, 2005.
 M. E. J. Newman. Modularity and community structure in networks. PNAS, 2006.
 J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, 2009.
 B. A. Prakash, A. Sridharan, M. Seshadri, S. Machiraju, and C. Faloutsos. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs. PAKDD, 2010.
 J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012.
 D. Chakrabarti, S. Papadimitriou, D. Modha, and C. Faloutsos. Fully Automatic Cross-Associations. SIGKDD, 2004.

11/13: Anomaly Detection [SLIDES]
Readings:
Optional Readings:
 L. Akoglu and C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. ECML/PKDD, 2010.
 S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE, 2003.
 A. Ghoting, M. E. Otey, and S. Parthasarathy. LOADED: Link-based outlier and anomaly detection in evolving data sets. ICDM, 2004.
 N. Katenka, Q. Ding, P. Barford, E. Kolaczyk, M. Crovella. Intrusion as (Anti)social Communication: Characterization and Detection. SIGKDD, 2012.
 V. Chandola, A. Banerjee, V. Kumar. Anomaly Detection: A Survey. ACM Computing Surveys, 2009.
 K. Henderson, T. Eliassi-Rad, C. Faloutsos, L. Akoglu, L. Li, K. Maruhashi, B. A. Prakash, H. Tong. Metric forensics: a multilevel approach for mining volatile graphs. SIGKDD, 2010.

11/18: Time Series Forecasting [SLIDES]
Readings:
 T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, E. Keogh. Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. SIGKDD 2012.
 B.-K. Yi, N. D. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, A. Biliris. Online Data Mining for Co-Evolving Time Sequences. ICDE, 2000.
Optional Readings:

11/20: Meme Tracking [SLIDES]
Readings:
Optional Readings: