Research
- Collaborating with Gal Oren, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, and Niranjan Hasabnis on applying LLMs to automatically predict the runtime performance characteristics of CUDA and OpenMP GPU kernels from source code.
- Recent publication accepted to HPDC 2025 AI4Sys Workshop.
- Collaborated with Giorgis Georgakoudis and Konstantinos Parasyris to compare convergence times of several automatic optimization strategies for hyperparameter tuning of CPU-based OpenMP programs.
- Paper published at IPDPS 2024 iWAPT Workshop.
- Focused on tuning OpenMP codes online/offline at the end-to-end and region levels for both CPU and GPU.
- Integrated PAPI performance counters, expanding Apollo's autotuner metrics to cache-based, memory, and FLOP/INTOP metrics.
- Implemented Bayesian Optimization (BO) modeling support, providing fast and strong autotuning models beyond decision trees.
- Worked with John Kalamatianos, Varun Agrawal, and Marko Scrbak on optimizing AMD’s DLRM Embedding Bag (EBag) implementation in PyTorch.
- DLRM EBag is a sparse parallel table reduction operation written in OpenMP; my strong initial analysis of its performance problems led to a joint publication in IEEE Micro (see Publications).
- Applied GRU and LSTM networks in PyTorch to predict Cholesky matrix factorization nonzero patterns for sparse matrices, achieving a 30.3x speedup over serial methods.
- Worked with Dr. Joshua Booth to create a proof-of-concept Cholesky matrix factorization-based hybrid solver combining direct and iterative methods for sparse linear systems.
Publications
Please email me if you'd like a copy of a paywalled publication ❤️
Work Experience
- (see LLNL under Research section)
- (see AMD under Research section)
- GTA for Dr. Wu Feng’s CS4234 Parallel Computing course.
- Created homework assignments, graded weekly homework, held weekly office hours (5 hours/week), and maintained participation counts.
- Taught parallel programming with Pthreads, OpenMP, CUDA, and OpenACC.
- Guided students in lab assignments to reinforce introductory Python programming concepts (control flow, data structures, algorithms).
- Collaborated with Mark Friedrich to create an SQL database, Node.js server, and Bootstrap web interface for energy systems asset cataloging and vulnerability notification (NIST NVD).
Mentoring
- Instrumented desktop hardware (CPU, GPU, motherboard) with shunt resistors to record power profiles of AI/ML workloads.
- Fall 2024 – Present: Mentoring Nitya Ganta and Bruno Zegada for their CS undergraduate degrees; guiding them in profiling/modeling full-system power usage of modern AI workloads.
- Fall 2023 – Spring 2024: Mentored Cesar Smokowski (master's) on profiling and comparing AI/ML workload power consumption on AMD and NVIDIA GPUs.
- SeeMore is a 15 ft tall cylindrical collection of 256 Raspberry Pi computers visualizing distributed MPI computation.
- Fall 2024: Worked with Mallika Pamula to design a 3D version of SeeMore in the Godot game engine, complete with animations so viewers could see the entire sculpture at a glance. Demoed and co-presented a small 30-node cluster with them at SC24.
- Fall 2023 – Spring 2024: Served as project manager, guiding a team of undergraduate students in upgrading the SeeMore kinetic sculpture by adding and programming an LED matrix on each node to visualize the computation done there. Trained Hayden Estes to take over as project manager.
- LACEY consisted of styluses attached to moving robotic arms that would interact with 30 tablets. LACEY demonstrated blockchain mining and distributed ledger concepts for general audiences.
- Fall 2021 – Summer 2022: Mentored Eles Jones (master's student) and Skylar Liang (undergrad) to design LACEY's control/visualization software.
- April 2022: Presented LACEY at the Smithsonian National Museum of Natural History in Washington, D.C. for the VT Accelerate Festival.
Posters
- iSeeMore: Design of a 256-Node RPi Cluster to Visualize LLM Computation Through Light and Movement for Mass Audiences
SC24 Student Research Competition, Nov 19, 2024
- Online Tuning of CUDA Kernels using Bayesian Optimization
IPDPS PhD Forum Poster Session, May 29, 2024
- Automagically Tuning the Execution of Parallel Programs
LLNL Summer Poster SLAM, Aug 03, 2021
- On Improving The Security of Industrial Control Systems at LBNL
F&M Fall Poster Session, Oct 25, 2019
LBNL Summer Poster Session, Aug 07, 2019
F&M Summer Research Fair, Apr 12, 2019
Presentations
- An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes
iWAPT Workshop (IPDPS 2024), May 24, 2024
- LACE: A Robotic Sculpture to Visualize Blockchain Computing
VT Accelerate Festival @ Smithsonian National Museum of Natural History, Apr 8–11, 2022
Virginia Tech Science Festival, Nov 10, 2021
- An Introduction To Cholesky Matrix Factorization
F&M Hackman Summer Scholar Meetup, Jun 06, 2018
- Troubling Transoms! Exploring Multiple Solutions to the Same Problem
EPaDel Mathematics Conference, Apr 01, 2017
- Drawing Perspective Letters Using Geometry
EPaDel Mathematics Conference, Feb 02, 2017
Journal / Conference Reviews
- Transactions on Parallel and Distributed Systems (TPDS)
Manuscript Reviewer, Sept 2022
- Journal of Parallel and Distributed Computing (JPDC)
Manuscript Reviewer, Nov 2021
Scholarships & Awards
- 2025: VT GPSS Travel Fund Grant (for HPDC)
- 2025: VT CS Department Student Conference Travel Grant (for HPDC)
- 2024: VT CS Department Student Conference Travel Grant (for IPDPS and SC24)
- 2024: IPDPS Student Conference Travel Grant
- 2022: Eleanor Davenport Leadership Fund Scholarship from VT
- 2020: New Horizons Graduate Scholarship to attend Virginia Tech (graduate studies)
- 2016: Gates Millennium and Posse Miami scholarships to attend Franklin & Marshall College
Volunteering
- Supercomputing Conference (SC)
- Student Volunteer, Nov 2024
- Student Volunteer, Nov 2023
- Student Volunteer, Nov 2022
- Student Volunteer, Nov 2021