About

I build reliable language and learning systems that connect research innovation with real-world impact. My work spans large language models (LLMs), retrieval-augmented generation (RAG), and applied machine learning at scale, with recent projects on memory-augmented reasoning, long-context robustness, metadata-aware retrieval, synthetic data generation, and information-guided fine-tuning. I enjoy designing, evaluating, and deploying end-to-end ML and GenAI solutions in collaboration with cross-disciplinary teams across academia, media, and industry.

Education

Honors & Awards

Selected Publications

Author order is as in the published papers; * marks workshop papers or preprints where applicable.

Research & Industry Experience

The Washington Post — Data Science & ML Intern 2023

  • Built and tested LLM-based GenAI pipelines on AWS to improve newsroom efficiency; this work led to a funded VT–Washington Post collaboration (“Ask the Post”).
  • Advised junior PhD students on follow-on efforts.

Virginia Tech — Graduate Research Assistant 2019–Present

  • Developed novel LLM architectures and evaluation frameworks to assess reasoning quality and factual grounding, including memory-augmented models for multi-document reasoning, investigations of long-context drift via graph induction, and metadata-aware retrieval for improved RAG grounding (a retrieval sketch follows this list).
  • Proposed Fisher-information-guided regularization and tabular data synthesis for low-data regimes (a regularization sketch also follows this list).
  • Developed and deployed Transformer-based NLP systems for automated research-entity extraction and narrative recommendation, leveraging encoder-decoder architectures, graph modeling, and prompt engineering to preserve expertise and contextual relevance.
  • Deployed ML/data-valuation models for Stable Isotope Ratio Analysis in the WFID platform for EU deforestation compliance; helped detect 260+ tons of illegal timber.
  • Constructed migration-flow forecasting pipelines for the Americas, used to generate live policy and security insights.
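
As a rough illustration of the metadata-aware retrieval mentioned above (not the deployed system), the sketch below combines dense similarity search with hard metadata filters and a soft metadata re-scoring term. The corpus schema, field names, and boost values are illustrative assumptions.

```python
import numpy as np

def metadata_aware_retrieve(query_vec, corpus, k=5, required=None,
                            recent_year=2022, recency_boost=0.1):
    """Rank documents by cosine similarity, constrained and re-weighted by
    metadata. `corpus` is a hypothetical list of dicts with keys 'vec'
    (np.ndarray), 'text', and 'meta' (e.g., {'source': ..., 'year': ...})."""
    q = query_vec / np.linalg.norm(query_vec)
    scored = []
    for doc in corpus:
        meta = doc["meta"]
        # Hard filter: drop documents whose metadata violates any constraint,
        # e.g., required={"source": "newsroom-archive"}.
        if required and any(meta.get(key) != val for key, val in required.items()):
            continue
        sim = float(np.dot(q, doc["vec"] / np.linalg.norm(doc["vec"])))
        # Soft signal: mildly prefer documents at or after `recent_year`.
        score = sim + recency_boost * float(meta.get("year", 0) >= recent_year)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```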
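
Similarly, Fisher-information-guided regularization can be sketched as an elastic-weight-consolidation-style penalty: estimate a diagonal Fisher on reference data, then penalize movement of high-information weights. This is a minimal PyTorch illustration under assumed interfaces, not the published method.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Estimate diagonal Fisher information as the mean squared gradient
    of the loss over a reference dataset (a common approximation)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    model.eval()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def fisher_penalty(model, fisher, ref_params, lam=0.1):
    """Quadratic penalty discouraging movement of high-Fisher weights away
    from reference values `ref_params` (detached copies); `lam` is tunable."""
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - ref_params[n]) ** 2).sum()
    return lam * penalty

# Hypothetical usage inside a training step:
#   total_loss = task_loss + fisher_penalty(model, fisher, ref_params)
```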

Teaching

Service & Outreach

Skills

Machine Learning & GenAI: Large Language Models (LLMs), Prompt Engineering, Fine-tuning (SFT, RLHF, LoRA/QLoRA), Agentic AI & LLM Agents, RAG (Retrieval-Augmented Generation), Embeddings & Vector Databases, Reward Modeling, LLM Evaluation & Quality Assessment, Synthetic Data Generation, Long-context Reasoning, Model Quantization, Supervised Learning, Feature Engineering & Model Explainability, Time-series Forecasting, Graph-based Modeling, Transfer Learning.
Frameworks & Tools: PyTorch, Hugging Face Transformers, PEFT, TRL, LangChain, OpenAI, scikit-learn, spaCy, NLTK, Pandas, NumPy, NetworkX, AWS (SageMaker, S3, EC2, Lambda), Docker, Git.
Programming: Python, Java, C++, MATLAB, R, SQL/NoSQL.