Eunice E. Santos
Director, Laboratory of Computation, Information & Distributed Processing
Associate Professor, Department of Computer Science
Genetics, Bioinformatics and Computational Biology Program
Short Biography
Research Interests
Current Research Projects
Selected Papers
Funding
Current Students
Contact Info
I was born (in 1972) and raised in
Parallel & Distributed Processing
Computational Social Science -- Social Networks Analysis
Computational Biology and Bioinformatics
Computational Physics
High Performance Computing
Network-Centric Operations/NCW
Intelligent Systems
Information Retrieval
Networking & Scheduling
Modeling and Simulation
Algorithms & Complexity
Tools & Environments
IDA/DARPA Defense Science Study Group
Robinson Faculty Award
Spira Teaching Award
NSF CAREER Grant
DoD-NDSEG Fellowship
Ford Foundation Fellowship (declined)
Models for Analysis and Performance of Heterogeneous Clusters
In this day and age of cluster computing, harnessing computing power is
conceptually easy. Yet even when processors are homogeneous and there is one
uniform network backbone, developing a general parallel model for performance,
analysis, and design that realistically guides and predicts performance has
been difficult. Some models have shown great success within the homogeneous
framework, such as the LogP model, of which I am a co-founder. The need for
large-scale parallelism spans fields throughout the sciences and engineering.
Focusing on how researchers in these disciplines use parallelism, it is
startling to realize how much they are re-inventing the design wheel every time
they implement an algorithm or simulation. What was particularly surprising was
that even though the problem domains are very different in nature (e.g.
lattice-based protein folding or spin system simulations), the design
techniques used to obtain efficient and effective code were often inherently
similar. As such, I developed a problem classification and design framework
that focuses on the structure of the problem, independent of the application
domain, using graph theory, scheduling, and operations research as a foundation.
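For readers unfamiliar with LogP, the model describes a homogeneous machine with four parameters: the network latency L, the per-message processor overhead o, the minimum gap g between consecutive messages at one processor, and the processor count P. The sketch below only illustrates how such parameters turn into cost estimates; the class, the function names, and the numeric values are assumptions made for illustration and are not taken from any of the projects on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP machine parameters, all times in the same unit (e.g. cycles)."""
    L: float  # network latency for a small message
    o: float  # send/receive overhead paid by a processor
    g: float  # minimum gap between consecutive sends at one processor
    P: int    # number of processors


def point_to_point(m: LogP) -> float:
    """One small message: send overhead + latency + receive overhead."""
    return m.o + m.L + m.o


def pipelined_sends(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back messages from one sender arrives.

    Successive sends are spaced by max(g, o); the last message then needs a
    full point-to-point time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) * max(m.g, m.o) + point_to_point(m)


machine = LogP(L=6, o=2, g=4, P=8)  # hypothetical values
print(point_to_point(machine))      # 10
print(pipelined_sends(machine, 5))  # 26
```

A heterogeneous or hierarchical model would need more than one such parameter set (for example, one per level of the hierarchy), which is part of what makes the general problem harder.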
With users now leveraging different clusters across a variety of network
backbones, understanding what it takes to design and deploy software is even
more complex. Compounding the problem, you often do not know in advance what
resources you will get, and they may change from one set of runs to the next,
so the code or algorithm cannot be fixed to one particular group of resources.
The goal of this project is to research and develop formal models for
performance and to validate them on a variety of testbeds. In order to obtain a
firm grasp of the criteria and metrics needed for a general heterogeneous
model, my group begins by focusing on hierarchical clusters. Such clusters are
particularly relevant and realistic, since current technological trends support
multiple processors on a motherboard and potentially multiple processors on a
chip. I have focused on models for the design and analysis of algorithms and
performance, and am now refining and validating the model on a variety of
testbeds throughout science and engineering. In fact, many of our research
projects below intersect with this project and provide validation for the
models developed in it.
Supported by: NSF, ARO
Classifying Social Network Analysis Tools and Methods
Understanding and analyzing the information provided within a social network is
an inexact science, often relying in large part on intuition, skill, and luck.
This is due to the inherently complicated nature of the information and its
relationships, as well as their significance in different contexts. Clearly, it
is very difficult to determine how effective a particular analytical method
will be. Moreover, what are the similarities between one technique and another?
Given that new methods are introduced fairly regularly, how can one discern the
significance and utility of these methods? Are they, in essence, producing
results similar to those of other known methods, and will they therefore have
similar levels of effectiveness? To answer these and other important questions,
one of the fundamental goals of our work is the development and refinement of
the SNA:CEM (Social Network Analysis: Classification, Evaluation, and
Methodologies) Framework. Using the results and knowledge gained from SNA:CEM,
questions involving the solution quality, performance, and scalability of
social network analysis can be answered. This can potentially put a stop to the
introduction of more and more tools that provide only a slight difference in
solutions from existing methodologies, and can also potentially pinpoint which
types of tools will provide the most significant results depending on the type
of network. Furthermore, SNA:CEM can be used to develop much more efficient
tools, both by re-designing the underlying methodology and by providing a
design structure for parallelization.
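One simple way to ask whether two analysis methods "produce similar results" is to compare the node rankings they induce on the same network. The sketch below does this for two textbook measures, degree centrality and closeness centrality, on a small made-up graph. It is not the SNA:CEM framework; the toy network and all function names are assumptions chosen for illustration.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours of each node, normalised by (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(n - 1) over the sum of shortest-path distances (connected graph assumed)."""
    n = len(graph)
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[source] = (n - 1) / sum(dist.values())
    return scores


def ranking(scores):
    """Nodes ordered from most to least central, ties broken by name."""
    return sorted(scores, key=lambda v: (-scores[v], v))


# Hypothetical toy network: a path a-b-c-d with an extra leaf e attached to b.
toy = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}

print(ranking(degree_centrality(toy)))     # ['b', 'c', 'a', 'd', 'e']
print(ranking(closeness_centrality(toy)))  # ['b', 'c', 'a', 'e', 'd']
```

On this toy network the two measures agree on the most central nodes and differ only near the bottom of the ranking, which is the kind of overlap a classification of methods would need to quantify.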
Supported by: AFOSR, SAIC
Network-Centric Operations and Warfare
Advances in technology have led to a shift towards Network-Centric Operations
(NCO) and Network-Centric Warfare (NCW) within the military. While much work in
NCO/NCW has focused on developing the components based on information sharing
and cross awareness, the lynchpin is a robust network infrastructure. However,
little is known about the overall effectiveness and performance of NCO/NCW
networks in general. Determining how robust or stable an existing network
infrastructure will be, and pinpointing its weaknesses or faults, is a critical
concern, especially since these networks must be deployable in an adaptive and
dynamic environment. As such, the goal of this project is the design of a
theoretical framework to assess and predict the effectiveness and performance
of networks and their loads for deployment in NCO/NCW. The framework will be
able to pinpoint bottlenecks and suggest corrections and modifications leading
to more effective and deployable networks.
Supported by: AFOSR
Culturally-Infused Social Networks
While social network research deals with a conglomeration of issues, these can
be broken down into two major thrust areas: social network construction and
social network analysis. There is a natural feedback between the two, since
analysis depends on the data used to construct the network yet can also help
pinpoint data that is relevant but potentially incomplete or completely
missing. The goal of this project is to research approaches that methodically
represent culture within social networks in a way that is transportable from
one cultural environment to another as well as cross-cultural. Transportability
of culture will make it possible to generalize existing analysis tools and push
the field forward. Furthermore, we will combine our results with social network
analysis methodologies that have been designed to deal with dynamic issues.
Lastly, we will use a classification framework of social network methodologies
and structures that will allow us to accurately assess and determine the
underlying cultural assumptions that tools have made within their analysis.
Combining this framework with our cultural approach will allow for scenario
testing within specific applications, in particular WMD/WME activities.
Supported by: DTRA
A Large-Scale Information Processing Framework for Intelligence Analysts
In the real world, the multitude of information that must be sifted through in
order to answer a query is one of the significant bottlenecks in information
processing. The research community addresses such issues through a variety of
disparate foci and approaches. Coupled with this is the need to develop a
realistic model that can be effectively and efficiently deployed on distributed
platforms. Such platforms are typically heterogeneous, which makes it necessary
to identify and integrate a span of performance metrics that must be factored
into design and effective deployment. In order to deal with these important
issues, we have introduced and developed I-FGM (Intelligent Foraging, Gathering
and Matching) as a unifying architecture for dealing with massive and dynamic
information spaces within large-scale distributed platforms. Given finite
computational resources, I-FGM explores the information space and, over time,
continuously identifies and updates promising candidate information nuggets. My
students and I are particularly focused on several important elements of I-FGM,
including providing a formal computational model of this massive data-intensive
problem, determining metrics and methodology for effective deployment, and
performance evaluation and prediction on large-scale distributed platforms.
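The idea of exploring an information space under a finite budget while keeping an up-to-date list of promising candidates can be illustrated with a toy "anytime" ranking loop. This is not the I-FGM architecture; it is a minimal sketch under the assumptions that each document is a list of per-chunk relevance scores, that reading one chunk costs one unit of work, and that a document's priority is the mean score of the chunks read so far. All names and data are hypothetical.

```python
import heapq


def anytime_rank(documents, budget):
    """Spend `budget` steps, always on the currently most promising document.

    `documents` maps a name to a list of per-chunk relevance scores in [0, 1].
    Returns (name, estimate) pairs, best first; it can be stopped at any budget.
    """
    seen = {name: [] for name in documents}  # chunks read so far
    # Max-heap via negated priority; unread documents start optimistic at 1.0.
    heap = [(-1.0, name) for name in sorted(documents) if documents[name]]
    heapq.heapify(heap)
    while budget > 0 and heap:
        _, name = heapq.heappop(heap)
        chunks = documents[name]
        seen[name].append(chunks[len(seen[name])])  # process one more chunk
        budget -= 1
        if len(seen[name]) < len(chunks):           # unread chunks remain
            estimate = sum(seen[name]) / len(seen[name])
            heapq.heappush(heap, (-estimate, name))
    estimates = {n: (sum(s) / len(s) if s else 0.0) for n, s in seen.items()}
    return sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical corpus of three partially relevant documents.
corpus = {
    "report_a": [0.9, 0.8, 0.7],
    "report_b": [0.1, 0.2],
    "report_c": [0.6, 0.9],
}
for name, estimate in anytime_rank(corpus, budget=4):
    print(name, round(estimate, 2))
```

With a budget of four steps, every document is sampled once and the remaining step goes to the document that looked best after its first chunk; a larger budget refines the estimates further.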
Supported by: NGA
Protein Structure Prediction
How a protein folds is at the heart of understanding biological phenomena and
is one of the major computational problems in biology. There are many different
models for ab initio protein folding, spanning both lattice and off-lattice
representations and using energy function optimization. We focus on the problem
of simulating and determining the native protein conformation. The goal of our
work is a blend of efficiency and quality of solution. The large-scale space of
potential native conformations has led us to focus on determining how to
effectively utilize local minima-based exploration of the landscape using a
variety of methodologies (such as Monte Carlo (MC) and simulated annealing
(SA)) to maximize solution quality. We have been able to show that our
methodologies provide results with significantly improved solution quality over
many existing methods.
I am particularly interested in determining how to mitigate the computational
load of the simulations, which is clearly a major issue. One of the insights
obtained from working on the different lattice and off-lattice representations
was that many of the energy functions are decomposable in nature. In fact, the
overlap of the energy functions can be effectively utilized to significantly
enhance performance. One technique we have demonstrated is caching partial
results of energy functions, which has yielded orders-of-magnitude performance
savings while either maintaining or improving the quality of results. We have
been able to validate this both theoretically and empirically. We have had
great success in serial processing, and are now focused on the problem of
adapting caching techniques to parallel and distributed platforms in order to
obtain even better performance.
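To make the caching idea concrete, consider an energy that decomposes into a sum of pairwise terms. When a single residue moves, only the terms involving that residue change, so the cached values of all other terms can be reused. The sketch below is a generic illustration of that principle, not the method from our papers; the Lennard-Jones-like pair term, the class name, and the coordinates are assumptions made for the example.

```python
import itertools
import math


def pair_energy(p, q):
    """Toy pairwise term (Lennard-Jones-like); an illustrative assumption."""
    r = math.dist(p, q)
    return 4.0 * (r ** -12 - r ** -6)


def full_energy(coords):
    """Reference: recompute all n(n - 1)/2 pairwise terms from scratch."""
    return sum(pair_energy(p, q) for p, q in itertools.combinations(coords, 2))


class CachedEnergy:
    """Caches every pairwise term so a one-residue move recomputes only n - 1."""

    def __init__(self, coords):
        self.coords = list(coords)
        self.terms = {
            (i, j): pair_energy(self.coords[i], self.coords[j])
            for i, j in itertools.combinations(range(len(self.coords)), 2)
        }
        self.total = sum(self.terms.values())

    def move(self, k, new_pos):
        """Move residue k and update the total using only the affected terms."""
        self.coords[k] = new_pos
        for other in range(len(self.coords)):
            if other == k:
                continue
            key = (min(k, other), max(k, other))
            fresh = pair_energy(self.coords[key[0]], self.coords[key[1]])
            self.total += fresh - self.terms[key]
            self.terms[key] = fresh
        return self.total


# Hypothetical four-residue chain; move residue 2 and compare with a full recompute.
chain = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.2, 0.0, 0.0), (3.3, 0.0, 0.0)]
cache = CachedEnergy(chain)
cache.move(2, (2.0, 0.5, 0.0))
print(math.isclose(cache.total, full_energy(cache.coords), abs_tol=1e-9))
```

Each move costs n - 1 pair evaluations instead of n(n - 1)/2, and the saving grows with chain length. Distributing the cache across processors raises the additional question of which processor owns which terms.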
Supported by: ARO, SUN
Using Networking and Intelligent Systems for Determining Effective Immunotherapy
Combating cancer is one of the major medical problems
of our time. Clearly, there has been a great deal of focus on developing and
refining conventional treatments (e.g. chemical, surgical, or radiation) for
increased effectiveness. Another area of focus is in biological therapies, in
particular immunotherapy, i.e. harnessing the body's immune system to combat cancer.
There are a number of different tracks to take in immunotherapy. One of them is
the development of a vaccine. Trying to find a vaccine is a very difficult
undertaking. A lot can be learned from simulating the behaviour
of a biological system in order to predict the effects of antigens and provide
potential paths of medical investigations.
A number of models in this field use conventional approaches such as PDEs or
ODEs. However, such approaches express only the 'average' behaviour and thus
produce limited insight into the problem. Our approach is to view a biological
system as an intelligent system, with each entity (e.g. a cell) having its own
role and decision-making. This is a more accurate depiction and can lead to a
much deeper understanding of behaviours such as tumour evasion (tumour growth
that evades detection by the immune system). We derive much inspiration for our
models from classic computer science and engineering paradigms. For example,
cell-cell communication has many similarities to communication in computer
networks (in particular ad hoc wireless networks). We have developed a
framework for representing key components of the immune system, tissue form,
and communication pathways. In order to obtain results that can be examined and
compared, one current focus of this project is on using our framework for
simulating breast cancer, with a particular interest in the behaviour of the
immune system in the presence of specific antigens (e.g. the HER-family of
proteins).
Supported by: ARO
Undergraduate research support: CRA CREU program
Spin System Simulations in Distributed Environments
Spin systems lie at the heart of explaining and modeling physical phenomena.
The large-scale lattice structures utilized in specialized spin systems such as
Ising or Potts models, or those in quantum chromodynamics (QCD), spotlight the
need for parallelization in order to obtain timely simulation results. A number
of research issues arise when determining how to decompose the lattice into
sub-lattices for computation. Spins require energy results from neighbor spins
residing within a certain distance of their location. Furthermore, properties
such as detailed balance must be satisfied in order to provide an effective
parallel solution. Coupled with these issues is the assignment of sub-lattice
computations to processors and the communication needed to transmit results
between subproblems. This combination of structural data layout, communication,
computing resources, and maintenance of physical properties provides a
multitude of research issues, forming a particularly complex problem. My
students and I are focused on generic parallel constructs for spin system
simulations and their variations.
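A standard textbook illustration of these decomposition issues is the checkerboard update for the 2-D nearest-neighbor Ising model: sites of one colour have no neighbors of the same colour, so all of them can be updated without conflicts. The sketch below is a serial version of that scheme using the usual Metropolis acceptance rule. It is not our sweep-selection method; it assumes an even lattice size, periodic boundaries, and a coupling of 1, and the function names are hypothetical.

```python
import math
import random


def delta_energy(spins, i, j, coupling=1.0):
    """Energy change from flipping spin (i, j), periodic nearest neighbours."""
    n = len(spins)
    neighbours = (
        spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
        + spins[i][(j - 1) % n] + spins[i][(j + 1) % n]
    )
    return 2.0 * coupling * spins[i][j] * neighbours


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep: update all 'black' sites, then all 'white' sites.

    For an even lattice size, sites of one colour share no nearest-neighbour
    bonds, so each half-sweep could be split across processors.
    """
    n = len(spins)
    for colour in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                d_e = delta_energy(spins, i, j)
                # Metropolis rule: always accept downhill, else with exp(-beta * dE).
                if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                    spins[i][j] = -spins[i][j]


lattice = [[1] * 4 for _ in range(4)]  # hypothetical 4x4 all-up start
print(delta_energy(lattice, 0, 0))     # 8.0
checkerboard_sweep(lattice, beta=0.4, rng=random.Random(0))
print(sum(sum(row) for row in lattice))  # magnetisation after one sweep
```

In a distributed setting each processor would own a block of the lattice and exchange only the boundary spins of the opposite colour between half-sweeps; longer-range interactions require wider boundaries and more colours.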
Supported by: NSF, DOE
Modeling Culture and Belief for Adversary Intent
Understanding the actions of an adversary in
real-world situations must take into account issues such as the culture and
belief of the adversary. This is a multi-disciplinary project spanning
psychology, computer science, mathematics, sociology, cognitive science, and
computational science. My students and I are particularly interested in
real-time resource management, and in the refinement of social networks.
Supported by: AFOSR
Santos, Eunice E., Pan, Long, Arendt, Dustin, and Pittkin, Morgan, An Effective Anytime Anywhere Approach for Centrality Measurements in Social Network Analysis, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2006.
Santos Jr., Eugene, Santos, Eunice E., Nguyen, Hien, Pan, Long, and Korah, John, Large Scale Distributed Foraging, Gathering, and Matching for Information Retrieval: Assisting the Geo-Spatial Intelligence Analyst, Proceedings of the SPIE Defense & Security Symposium, Orlando, FL, 2005.
Santos, Eunice E., and Muthukrishnan, Gayathri, Efficient Simulation Based on Sweep Selection for 2-D and 3-D Ising Spin Models on Hierarchical Clusters, Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2004.
Deng, Yuefan, Glimm, James, Davenport, James, Cai, X., and Santos, Eunice E., Performance Models on QCDOC for Molecular Dynamics with Coulomb Potentials, International Journal of High Performance Computing and Applications, vol. 18, no. 2, p. 183-195, 2004.
Santos, Eunice E., and Santos Jr., Eugene, Reducing the Computational Load of Energy Evaluations for Protein Folding, Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering, 2004.
Muthukrishnan, Gayathri, and Santos, Eunice E., On Simulating Hierarchical Clusters for Performance of Ising Spin Systems, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2004.
Santos, Eunice E., Optimal and Efficient Parallel Tridiagonal Solvers using Direct Methods, Journal of Supercomputing, vol. 30, no. 2, 7-115, 2004.
Santos, Eunice E., and Chu, Pei-Yue, Efficient and Optimal Parallel Algorithms for Cholesky Decomposition, Journal on Mathematical Modelling and Algorithms, vol. 2, no. 3, p. 217-234, 2003.
Santos Jr., Eugene, Kim, Keum Joo, and Santos, Eunice E., Local Minima-Based Exploration for Off-Lattice Protein Folding, Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2003.
Santos, Eunice E., Parallel Complexity of Matrix Multiplication, Journal of Supercomputing, vol. 25, no. 2, p. 155-176, 2003.
Santos, Eunice E., Tridiagonal Solvers with Multiple Right Hand Sides for k-dimensional Mesh and Torus Interconnection Networks, Parallel Processing Letters, vol. 13, no. 4, p. 659-672, 2003.
Santos, Eunice E., Optimal and Efficient Algorithms for Summing and Prefix Summing on Parallel Machines, Journal of Parallel and Distributed Computing, vol. 64, no. 2, p. 517-543, 2002.
Santos, Eunice E., Feng, Shuangtong, and Rickman, Jeffrey M., Efficient Parallelization of 2-Dimensional Ising Spin Systems, Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2002.
Santos, Eunice E., A Classification Framework for Parallel & Distributed Algorithm Design, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2001.
Santos, Eunice E., On Designing Optimal Parallel Triangular Solvers, Information and Computation, vol. 161, no. 2, p. 172-210, 2000.
Santos, Eunice E., Optimal and Near-Optimal Algorithms for the k-item Broadcast Problem, Journal of Parallel and Distributed Computing, vol. 57, no. 2, p. 121-139, 1999.
Santos, Eunice E., On Lower Bounds on Running Time for General Numerical Computation Problems, Proceedings of the Ninth SIAM Conference on Parallel Processing and Scientific Computing, 1999.
Santos, Eunice E., Optimal Parallel Algorithms for Solving Tridiagonal Linear Systems, Springer-Verlag Lecture Notes in Computer Science #1300, pages 700-709, 1997 (Proceedings of Euro-Par '97).
Culler, David E., Karp, Richard M., Patterson, David A., Sahay, Abhijit, Santos, Eunice E., Schauser, Klaus E., Subramonian, Ramesh, and von Eicken, Thorsten, LogP: A Practical Model of Parallel Computation, Communications of the Association for Computing Machinery, vol. 39, no. 11, p. 78-85, 1996.
(detailed publication list)
I have been supported by a number of federal and
commercial organizations. Some of these agencies include:
Air Force Office of Scientific Research
Army Research Office
Defense Threat Reduction Agency
Department of Energy/Brookhaven National Laboratories
National Geospatial-Intelligence Agency
National Science Foundation
Battelle
SAIC
AT&T
SUN Microsystems
Current funding:
Social Networks Analysis: Classification, Evaluation and Methodologies, Air Force Office of Scientific Research, $355k (sole-PI)
A Framework for Adversarial Social Networks, Defense Threat Reduction Agency, $450k (PI)
Formulating a Theoretical Framework for Assessing Network Loads for Effective Deployment in Network-Centric Operations and Warfare, Air Force Office of Scientific Research, $275k (sole-PI)
On the Effects of Culture and Society on Adversarial Attitudes and Behaviour, Air Force Office of Scientific Research, $690k
Hierarchical Clusters for Computational Mathematics, Army Research Office, $202k (sole-PI)
On Effectively Handling Large Volumes of Geospatial Intelligence Information: A Formal Distributed Real-Time Processing Approach, National Geospatial-Intelligence Agency (via subcontract through
Research and Development Experimental Collaboration (Application of Advanced Technologies for Early Warning and Decision Making for Threat and Vulnerability Assessment), SAIC, $900k
Research Instrumentation: Establishing a Laboratory for Research in Parallel Computing and Signal Processing, National Science Foundation, $195k (with cost sharing) (PI)
Realistic and Efficient Modeling and Simulation on Parallel Platforms for Protein Folding, SUN Microsystems, $107k (equipment) (sole-PI)
Graduate:
Dustin Arendt
Donghang Guo
Keum Joo Kim
John Korah
Long Pan
Morgan Pittkin
Sreeram Ramalingam
Peter Scheffel
Huadong Xia
Undergraduate:
Jeff Alley
Sota Baba
Nick Brown
Landon Fraser (CRA CREU)