Eunice E. Santos

Director, Laboratory of Computation, Information & Distributed Processing

Associate Professor, Department of Computer Science

Genetics, Bioinformatics and Computational Biology Program

 


Short Biography

 

I was born (in 1972) and raised in Ohio, and moved to California in 1990 to attend graduate school. I have B.S. and M.S. degrees in both Mathematics and Computer Science. In 1995, I received a Ph.D. in Computer Science from UC Berkeley. From 1995 to 2000, I was the Director of the Parallel and Distributed Processing Laboratory and a faculty member in the Department of Electrical Engineering and Computer Science at Lehigh University. In 2000, I moved to Virginia Polytechnic Institute & State University (Virginia Tech, or VT for short). I founded and direct the Laboratory for Computation, Information & Distributed Processing (LCID), and I am an Associate Professor in the Department of Computer Science at VT and in the Genetics, Bioinformatics and Computational Biology (GBCB) Program. I have been awarded an NSF CAREER grant, the Spira Award for Excellence in Teaching, and the Robinson Faculty Award. I have served on a number of panels for research, grants, and diversity, and I am currently a panel member for fellowships for the American Association of University Women (AAUW). I have been an invited or distinguished speaker at a number of workshops and conferences. I have also served on numerous program committees and editorial boards; I am on the editorial board of Scientific Programming and currently serve as the subject area editor for the Journal of Supercomputing in the area of Algorithms and Modeling. I have been a consultant/visiting scientist at Brookhaven National Laboratories, and I am currently a member of the IDA/DARPA Defense Science Study Group (DSSG).

Research Interests

 
Parallel & Distributed Processing
Computational Social Science -- Social Networks Analysis
Computational Biology and Bioinformatics
Computational Physics
High Performance Computing
Network-Centric Operations/NCW
Intelligent Systems
Information Retrieval
Networking & Scheduling
Modeling and Simulation
Algorithms & Complexity
Tools & Environments
 

Selected Awards

               IDA/DARPA Defense Science Study Group

               Robinson Faculty Award

               Spira Teaching Award

               NSF CAREER Grant

               DoD-NDSEG Fellowship

               Ford Foundation Fellowship (declined)

Current Research Projects

 

Models for Analysis and Performance of Heterogeneous Clusters

 

In this day and age of cluster computing, harnessing computing power is conceptually easy. Yet even when processors are homogeneous and there is one uniform network backbone, developing a general parallel model for performance, analysis, and design that realistically guides and predicts performance has been difficult. Some models have shown great success within the homogeneous framework, such as the LogP model, of which I am a founder. The need for large-scale parallelism spans fields throughout the sciences and engineering. Focusing on how researchers in these disciplines use parallelism, it is startling to realize how much they are re-inventing the "design wheel" every time they implement an algorithm or simulation. What was particularly surprising was that even though the problem domains are very different in nature (e.g. lattice-based protein folding or spin system simulations), the design techniques used to obtain efficient and effective code were often inherently similar. As such, I developed a problem classification and design framework that focuses on the structure of the problem, independent of the application domain, using graph theory, scheduling, and operations research as a foundation.
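
For readers unfamiliar with LogP, the following is a minimal Python sketch of how the model's four parameters (latency L, overhead o, gap g, and processor count P) can be used to estimate communication time. The class name, function names, and parameter values are illustrative assumptions, not code from this project, and the broadcast routine assumes a simple greedy schedule in which every processor that holds the item keeps forwarding it.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    L: float  # upper bound on network latency for one small message
    o: float  # processor overhead to send or to receive one message
    g: float  # minimum gap between consecutive sends at one processor
    P: int    # number of processors

    def point_to_point(self) -> float:
        """Send overhead + latency + receive overhead."""
        return self.L + 2 * self.o


def broadcast_time(m: LogP) -> float:
    """Time for one item to reach all P processors under a greedy schedule:
    every informed processor keeps sending to processors not yet informed."""
    if m.P < 1:
        raise ValueError("P must be at least 1")
    step = max(m.g, m.o)  # spacing between successive sends from one processor
    # Heap entries: (time the message is fully received, time its send began).
    pending = [(m.point_to_point(), 0.0)]
    informed, finish = 1, 0.0
    while informed < m.P:
        arrival, start = heapq.heappop(pending)
        informed += 1
        finish = arrival
        # The sender may issue its next message one step after the previous one.
        heapq.heappush(pending, (start + step + m.point_to_point(), start + step))
        # The newly informed processor can begin sending as soon as it has the item.
        heapq.heappush(pending, (arrival + m.point_to_point(), arrival))
    return float(finish)


model = LogP(L=6, o=2, g=4, P=8)  # illustrative values only
print(model.point_to_point())     # 10
print(broadcast_time(model))      # 24.0
```

The point of the sketch is that an algorithm's communication cost can be reasoned about from a handful of machine parameters, without reference to a specific network topology.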

 

With users now leveraging different clusters across a variety of network backbones, understanding what it takes to design and deploy software is even more complex. Compounding the problem, you often do not know in advance what resources you will get, and they may change from one set of runs to the next, so the code or algorithm cannot be fixed to one particular group of resources. The goal of this project is to research and develop formal models for performance and to validate them on a variety of testbeds. In order to obtain a firm grasp of the criteria and metrics needed for a general heterogeneous model, my group begins by focusing on hierarchical clusters. Such clusters are particularly relevant and realistic, since current technological trends support multiple processors on a motherboard and potentially multiple processors on a chip. I have focused on models for the design and analysis of algorithms and performance, and am now refining and validating the model on a variety of testbeds throughout science and engineering. In fact, many of our research projects below intersect with this project and provide validations for the models developed in it.

 

Supported by: NSF, ARO

 

Classifying Social Network Analysis Tools and Methods

 

Understanding and analyzing the information provided within a social network is an inexact science, often relying in large part on intuition, skill, and luck. This is due to the inherently complicated nature of the information and its relationships, as well as their significance in different contexts. Clearly, it is very difficult to determine how effective a particular analytical method will be. Moreover, what are the similarities from one technique to another? Given that new methods are introduced fairly regularly, how can one discern the significance and utility of these methods? Are they, in essence, producing results similar to those of other known methods, and will they therefore have similar levels of effectiveness? To answer these and other important questions, one of the fundamental goals of our work is the development and refinement of the SNA:CEM (Social Network Analysis: Classification, Evaluation, and Methodologies) Framework. Using the results and knowledge gained from SNA:CEM, answers to questions involving the solution quality, performance, and scalability of social network analysis can be obtained. This can potentially put a stop to the introduction of more and more tools that provide only a slight difference in solutions from existing methodologies, and can also potentially pinpoint which types of tools will provide the most significant results depending on the type of network. Furthermore, SNA:CEM can be used to develop much more efficient tools, both through re-design of the underlying methodology and by providing a design structure for parallelization.

 

Supported by: AFOSR, SAIC

 

Network-Centric Operations and Warfare

 

Advances in technology have led to a shift towards Network-Centric Operations (NCO) and Network-Centric Warfare (NCW) within the military. While much work in NCO/NCW has focused on developing the components based on information sharing and cross awareness, the lynchpin is a robust network infrastructure. However, little is known about the overall effectiveness and performance of NCO/NCW networks in general. Determining how robust or stable an existing network infrastructure will be, and pinpointing its weaknesses or faults, is a critical concern, especially since these networks must be deployable in an adaptive and dynamic environment. As such, the goal of this project is the design of a theoretical framework to assess and predict the effectiveness and performance of networks and their loads for deployment in NCO/NCW. The framework will be able to pinpoint bottlenecks and suggest corrections and modifications leading to more effective and deployable networks.

 

Supported by: AFOSR

 

Culturally-Infused Social Networks

 

While social network research deals with a conglomeration of issues, they can be broken down into two major thrust areas: social network construction and social network analysis. There is a natural feedback between the two, since analysis depends on the data used to construct the network, yet can also help pinpoint data that is relevant but potentially incomplete or missing altogether. The goal of this project is to research approaches that methodically represent culture within social networks in a way that is transportable from one cultural environment to another as well as cross-cultural. Transportability of culture will lead to the ability to generalize existing analysis tools and push the field forward. Furthermore, we will combine our results with social network analysis methodologies that have been designed to deal with dynamic issues. Lastly, we will use a classification framework of social network methodologies and structures that will allow us to accurately assess and determine the underlying cultural assumptions that tools have made within their analysis. Combining this framework with our cultural approach will allow for scenario testing within specific applications, in particular WMD/WME activities.

 

Supported by: DTRA

 

A Large-Scale Information Processing Framework for Intelligence Analysts

 

In the real world, the multitude of information that must be sifted through in order to answer a query is one of the significant bottlenecks in information processing. There are a variety of disparate foci and approaches within the research community to address such issues. Coupled with this is the need to develop a realistic model that can be deployed effectively and efficiently on distributed platforms. Such platforms are typically heterogeneous, so a range of performance metrics must be identified, integrated, and factored into both design and deployment. In order to deal with these important issues, we have introduced and developed I-FGM (Intelligent Foraging, Gathering and Matching) as a unifying architecture for dealing with massive and dynamic information spaces within large-scale distributed platforms. Given finite computational resources, I-FGM will proceed to explore the information space and, over time, continuously identify and update promising candidate information nuggets. My students and I are particularly focused on several important elements of I-FGM, including providing a formal computational model of this massive data-intensive problem, determining metrics and methodology for effective deployment, and performance evaluation and prediction on large-scale distributed platforms.

 

Supported by: NGA

 

Protein Structure Prediction

 

How a protein folds is at the heart of understanding biological phenomena and is one of the major computational problems in biology. There are many different models for ab initio protein folding, spanning both lattice and off-lattice representations and using energy function optimization. We focus on the problem of simulating and determining the native protein conformation. The goal of our work is a blend of efficiency and solution quality. The large space of potential native conformations has led us to focus on determining how to effectively utilize local minima-based exploration of the energy landscape using a variety of methodologies (such as Monte Carlo (MC), simulated annealing (SA), etc.) to maximize solution quality. We have been able to show that our methodologies provide results with significantly improved solution quality over many existing methods.

 

I am particularly interested in determining how to mitigate the computational load of the simulations, which is clearly a major issue. One of the insights obtained from working on the different lattice and off-lattice representations was that many of the energy functions are decomposable in nature. In fact, the overlap of the energy functions can be exploited to significantly enhance performance. In particular, we have been able to show that caching partial results of energy functions yields orders-of-magnitude performance savings while either maintaining or improving the quality of results. We have validated this both theoretically and empirically. We've had great success in serial processing, and are now focused on the problem of adapting caching techniques to parallel and distributed platforms in order to obtain even better performance.
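
To make the caching idea concrete, here is a minimal Python sketch under strong simplifying assumptions. It uses a toy 2-D HP lattice model, in which the energy is a sum of pairwise contact terms, and it stores every pair term so that moving one residue only requires recomputing the terms that involve that residue. The model choice and the names `pair_energy`, `CachedEnergy`, and `full_energy` are hypothetical and are not taken from the published method; the sketch also does not check that a move keeps the chain connected or self-avoiding.

```python
from itertools import combinations


def pair_energy(seq, coords, i, j):
    """HP-model contact term: -1 if two non-bonded H residues are lattice neighbours."""
    if abs(i - j) == 1 or seq[i] != "H" or seq[j] != "H":
        return 0
    (xi, yi), (xj, yj) = coords[i], coords[j]
    return -1 if abs(xi - xj) + abs(yi - yj) == 1 else 0


def full_energy(seq, coords):
    """Reference evaluation: recompute every pair term from scratch."""
    return sum(pair_energy(seq, coords, i, j)
               for i, j in combinations(range(len(seq)), 2))


class CachedEnergy:
    """Keeps all pair terms so a single-residue move costs O(n) instead of O(n^2)."""

    def __init__(self, seq, coords):
        self.seq = seq
        self.coords = list(coords)
        self.terms = {(i, j): pair_energy(seq, self.coords, i, j)
                      for i, j in combinations(range(len(seq)), 2)}
        self.total = sum(self.terms.values())

    def move(self, k, new_pos):
        """Move residue k and update only the cached terms that involve k."""
        self.coords[k] = new_pos
        for j in range(len(self.seq)):
            if j == k:
                continue
            key = (min(j, k), max(j, k))
            new_term = pair_energy(self.seq, self.coords, *key)
            self.total += new_term - self.terms[key]
            self.terms[key] = new_term
        return self.total


# A four-residue chain folded into a square: residues 0 and 3 are in contact.
energy = CachedEnergy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
print(energy.total)            # -1
print(energy.move(3, (2, 1)))  # 0, the contact is broken
```

The cached total always agrees with `full_energy` on the same coordinates, which is a convenient correctness check when experimenting with a scheme like this.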

 

Supported by: ARO, SUN

 

Using Networking and Intelligent Systems for Determining Effective Immunotherapy

 

Combating cancer is one of the major medical problems of our time. Clearly, there has been a great deal of focus on developing and refining conventional treatments (e.g. chemical, surgical, or radiation) for increased effectiveness. Another area of focus is in biological therapies, in particular immunotherapy, i.e. harnessing the body's immune system to combat cancer. There are a number of different tracks to take in immunotherapy. One of them is the development of a vaccine. Trying to find a vaccine is a very difficult undertaking. A lot can be learned from simulating the behaviour of a biological system in order to predict the effects of antigens and provide potential paths of medical investigations.

 

A number of models in this field use conventional approaches such as PDEs or ODEs. However, such approaches express only the 'average' behaviour and thus produce limited insight into the problem. Our approach is to view a biological system as an intelligent system, with each entity (e.g. cell) having its own role and decision-making. This is a more accurate depiction and can lead to a much deeper understanding of behaviours such as tumour evasion (tumour growth that evades knowledge of the immune system). We derive much inspiration for our models from classic computer science and engineering paradigms. For example, cell-cell communication has many similarities to communication in computer networks (in particular ad hoc wireless networks). We have developed a framework for representing key components of the immune system, tissue form, and communication pathways. In order to obtain results that can be examined and compared, one current focus of this project is on using our framework for simulating breast cancer, with a particular interest in the behaviour of the immune system in the presence of specific antigens (e.g. the HER-family of proteins).

 

Supported by: ARO

Undergraduate research support: CRA CREU program

 

Spin System Simulations in Distributed Environments

 

Spin systems lie at the heart of explaining and modeling physical phenomena. The large-scale lattice structures utilized in specialized spin systems such as Ising or Potts models or those in quantum chromodynamics (QCD) spotlight the need for parallelization in order to obtain timely simulation results. A number of research issues arise when determining how to decompose the lattice into sub-lattices for computations. Spins require energy results from neighbor spins residing at most a certain distance away from their location. Furthermore, properties such as detailed balance must be satisfied in order to provide an effective parallel solution. Coupled with these issues is the assignment of sub-lattice computations to processors and the communication needed to transmit computations between subproblems. This combination of structural data layout, communication, computing resources, and maintenance of physical properties provides a multitude of research issues, forming a particularly complex problem. My students and I are focused on generic parallel constructs for spin system simulations and their variations.
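
As a small illustration of why the decomposition matters, the following Python sketch performs a Metropolis sweep of a periodic 2-D Ising lattice using the standard checkerboard (two-colour) decomposition. Each site's four neighbours all have the opposite colour, so all sites of one colour could be updated concurrently. This is a generic textbook scheme, not the algorithms developed in this project; the function names are hypothetical, the coupling is fixed at J = 1 with no external field, and the lattice side must be even. The loop here runs serially and only marks where the parallelism would be. Each single-site update satisfies detailed balance with respect to the Boltzmann distribution.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep over an n x n periodic lattice, one colour at a time."""
    n = len(spins)
    if n % 2:
        raise ValueError("checkerboard colouring needs an even lattice side")
    for colour in (0, 1):
        # Sites of the same colour never neighbour each other, so this
        # double loop could be split across processors without conflicts.
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                neighbours = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                              + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                delta_e = 2 * spins[i][j] * neighbours  # energy change if flipped
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def lattice_energy(spins):
    """Total energy with J = 1: minus the sum over nearest-neighbour bonds."""
    n = len(spins)
    return -sum(spins[i][j] * (spins[(i + 1) % n][j] + spins[i][(j + 1) % n])
                for i in range(n) for j in range(n))


rng = random.Random(0)
lattice = [[1] * 4 for _ in range(4)]
print(lattice_energy(lattice))  # -32 for the all-up 4 x 4 lattice
checkerboard_sweep(lattice, beta=0.4, rng=rng)
print(lattice_energy(lattice))
```

In a distributed setting, each processor would own a block of the lattice and exchange only the boundary spins of the opposite colour between the two half-sweeps.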

 

Supported by: NSF, DOE

 

Modeling Culture and Belief for Adversary Intent

 

Understanding the actions of an adversary in real-world situations must take into account issues such as the culture and belief of the adversary. This is a multi-disciplinary project spanning psychology, computer science, mathematics, sociology, cognitive science, and computational science. My students and I are particularly interested in real-time resource management, and in the refinement of social networks.

 

Supported by: AFOSR

 

 

Selected Papers

 

.           Santos, Eunice E., Rickman, Jeffrey M., Muthukrishnan, Gayathri, and Feng, Shuangtong, Efficient Algorithms for the Parallelizing of Monte Carlo Simulations for 2-D Ising Spin Models, Journal of Supercomputing, in revision.

 

.           Santos, Eunice E., and Santos, Eugene Jr., Effective Computational Reuse for Energy Evaluations in Protein Folding, International Journal of Artificial Intelligence and Tools, vol. 15, no. 5, p. 725-740, 2006.

 

.           Santos, Eugene Jr., Santos, Eunice E., and Kim, Keum Joo, Satisfying Constraint Sets through Convex Envelope Approximations, Journal of Experimental and Theoretical Artificial Intelligence, vol. 15, no. 3, p. 413-432, 2006.

 

.           Santos, Eunice E., Pan, Long, Arendt, Dustin, and Pittkin, Morgan, An Effective Anytime Anywhere Approach for Centrality Measurements in Social Network Analysis, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2006.

 

.           Santos, Eugene, Jr., Santos, Eunice E., Nguyen, Hien, Pan, Long and Korah, John, Large Scale Distributed Foraging, Gathering, and Matching for Information Retrieval: Assisting the Geo-Spatial Intelligence Analyst, Proceedings of the SPIE Defense & Security Symposium, Orlando, FL, 2005.

 

.           Santos, Eunice E., and Muthukrishnan, Gayathri, Efficient Simulation Based on Sweep Selection for 2-D and 3-D Ising Spin Models on Hierarchical Clusters, Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2004.

 

.           Santos, Eunice E., Guo, Donghang, Santos Jr., Eugene, and Onesty, William, A Multi-Agent System Environment for Modelling Cell and Tissue Biology, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004.

 

.           Deng, Yuefan, Glimm, James, Davenport, James, Cai, X., and Santos, Eunice E., Performance Models on QCDOC for Molecular Dynamics with Coulomb Potentials, International Journal of High Performance Computing and Applications, vol. 18, no. 2, p. 183-195, 2004.

 

.           Santos, Eunice E., and Santos Jr., Eugene, Reducing the Computational Load of Energy Evaluations for Protein Folding, Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering, 2004.

 

.           Muthukrishnan, Gayathri, and Santos, Eunice E., On Simulating Hierarchical Clusters for Performance of Ising Spin Systems, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004.

 

.           Santos, Eunice E. Optimal and Efficient Parallel Tridiagonal Solvers using Direct Methods, Journal of Supercomputing, vol. 30, no. 2, 7-115, 2004.

 

.           Santos, Eunice E., and Chu, Pei-Yue, Efficient and Optimal Parallel Algorithms for Cholesky Decomposition, Journal on Mathematical Modelling and Algorithms, vol. 2, no. 3, p. 217-234, 2003.

 

.           Santos Jr., Eugene, Kim, Keum Joo, and Santos, Eunice E., Local Minima-Based Exploration for Off-Lattice Protein Folding, Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2003.

 

.           Santos, Eunice E. Parallel Complexity of Matrix Multiplication, Journal of Supercomputing, vol. 25, no. 2, p. 155-176, 2003.

 

.           Santos, Eunice E. Tridiagonal Solvers with Multiple Right Hand Sides for k-dimensional Mesh and Torus Interconnection Networks, Parallel Processing Letters, vol. 13, no. 4, p. 659-672, 2003.

 

.           Santos, Eunice E. Optimal and Efficient Algorithms for Summing and Prefix Summing on Parallel Machines, Journal of Parallel and Distributed Computing, vol. 64, no. 2, p. 517-543, 2002.

 

.           Santos, Eunice E., Feng, Shuangtong, and Rickman, Jeffrey M., Efficient Parallelization of 2-Dimensional Ising Spin Systems, Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2002.

 

.           Santos, Eunice E. A Classification Framework for Parallel & Distributed Algorithm Design, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2001.

 

.           Santos, Eunice E. On Designing Optimal Parallel Triangular Solvers, Information and Computation, vol. 161, no. 2, p. 172-210, 2000.

 

.           Santos, Eunice E. Optimal and Near-Optimal Algorithms for the k-item Broadcast Problem, Journal on Parallel and Distributed Computing, vol. 57, no. 2, p. 121-139, 1999.

 

.           Santos, Eunice E. On Lower Bounds on Running Time for General Numerical Computation Problems, Proceedings of the Ninth SIAM Conference on Parallel Processing and Scientific Computing, 1999.

 

.           Santos, Eunice E. Optimal Parallel Algorithms for Solving Tridiagonal Linear Systems, Springer-Verlag Lecture Notes in Computer Science #1300, pages 700-709, 1997 (Proceedings of Euro-Par97).

 

.           Santos, Eunice E. Optimal Parallel Algorithms for Matrix Multiplication, Proceedings of the Eighth SIAM Conference on Parallel Processing and Scientific Computing, 1997.

 

.           Santos, Eunice E. Optimal and Efficient Parallel Algorithms for Summing and Prefix Summing, Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996.

 

.           Culler, David E., Karp, Richard M., Patterson, David A., Sahay, Abhijit, Santos, Eunice E., Schauser, Klaus E., Subramonian, Ramesh, and von Eicken, Thorsten, LogP: A Practical Model of Parallel Computation, Communications of the Association for Computing Machinery, vol. 39, no. 11, p. 78-85, 1996.

 

(detailed publication list)

Funding

 

I have been supported by a number of federal and commercial organizations. Some of these agencies include:

 

.           Air Force Office of Scientific Research

.           Army Research Office

.           Defense Threat Reduction Agency

.           Department of Energy/Brookhaven National Laboratories

.           National Geospatial Intelligence Agency

.           National Science Foundation

.           Battelle

.           SAIC

.           AT&T

.           SUN Microsystems

 

Current funding:

 

.           Social Networks Analysis: Classification, Evaluation and Methodologies, Air Force Office of Scientific Research, $355k (sole-PI)  

.           A Framework for Adversarial Social Networks, Defense Threat Reduction Agency, $450k (PI)

.           Formulating a Theoretical Framework for Assessing Network Loads for Effective Deployment in Network-Centric Operations and Warfare, Air Force Office of Scientific Research, $275k (sole-PI)

.           On the Effects of Culture and Society on Adversarial Attitudes and Behaviour, Air Force Office of Scientific Research, $690k

.           Hierarchical Clusters for Computational Mathematics, Army Research Office, $202k (sole-PI)

.           On Effectively Handling Large Volumes of Geospatial Intelligence Information: A Formal Distributed Real-Time Processing Approach, National Geospatial-Intelligence Agency (via subcontract through Dartmouth College), $322k (sole-PI)

.           Research and Development Experimental Collaboration (Application of Advanced Technologies for Early Warning and Decision Making for Threat and Vulnerability Assessment), SAIC, $900k

.           Research Instrumentation: Establishing a Laboratory for Research in Parallel Computing and Signal Processing, National Science Foundation, $195k (with cost sharing) (PI)

.           Realistic and Efficient Modeling and Simulation on Parallel Platforms for Protein Folding, SUN Microsystems, $107k (equipment) (sole-PI)

 

 

Current Students:

 

Graduate:

 

.           Dustin Arendt

.           Donghang Guo

.           Keum Joo Kim

.           John Korah

.           Long Pan

.           Morgan Pittkin

.           Sreeram Ramalingam

.           Peter Scheffel

.           Huadong Xia

 

Undergraduate:

 

.           Jeff Alley

.           Sota Baba

.           Nick Brown

.           Landon Fraser (CRA CREU)