Successfully defended my PhD dissertation, titled Improving LLM Reasoning and Retrieval for Structured and Complex Information Spaces, at Virginia Tech. Many thanks to my advisor, Dr. Naren Ramakrishnan, and the committee members: Dr. Chang-Tien Lu, Dr. Chris North, Dr. Xuan Wang, and Dr. Sathappan Muthiah. Read my dissertation here.
Paul E. Torgersen Graduate Student Research Excellence Award finalist
Selected as a PhD finalist for Virginia Tech’s Paul E. Torgersen Graduate Student Research Excellence Award. See the awardees list for more information.
Metadata-aware RAG work accepted at ECIR 2026
Utilizing Metadata for Better Retrieval-Augmented Generation was accepted to the 48th European Conference on Information Retrieval. Read the paper here.
Product provenance verification paper accepted at AAAI 2026
Research on data valuation for product provenance verification was accepted to the AAAI Conference on Artificial Intelligence. Read the paper here.
Best Paper at IEEE BigData 2024
LLM Augmentations to Support Analytical Reasoning over Multiple Documents received the Best Paper Award.
Impact Highlights
70% average relative improvement over text-only retrieval baselines using metadata-aware dual-encoder methods for RAG.
86% schema-alignment reliability in a human-in-the-loop agentic data analysis system used across hundreds of newsroom sessions.
25% classification improvement and 50% generation improvement from memory-augmented LLM architectures for multi-document reasoning.
1–6% of advertised context capacity was enough to reveal memory-drift onset in a graph-based long-context LLM benchmark.
70% reduction in synthetic tabular data rule violations through permutation-aware generation and uncertainty-guided fine-tuning methods.
11% improvement over prior scientific information extraction baselines and 26% higher accuracy in salient task and method extraction from scholarly documents.
59 wood products assessed, 260+ tons of allegedly illegal timber identified, and 9+ enforcement investigations supported through ML-assisted provenance workflows.
Research & Technical Expertise
LLMs & Generative AI:
Large Language Models, Prompt Engineering, Retrieval-Augmented Generation, Model Context Protocol (MCP), Long-context Reasoning, LLM Evaluation & Quality Assessment, Reward Modeling, Synthetic Data Generation, Quantization, Model Serving & Deployment.
Model Adaptation & Learning:
Fine-tuning, SFT, RLHF, PEFT, LoRA/QLoRA, Representation Learning, Recommender Systems, Supervised Learning, Feature Engineering, Explainability, Graph-based Modeling, Transfer Learning, Data Valuation, Uncertainty-aware Learning.
Systems & Applications:
Search and Retrieval, Embeddings & Vector Databases, Information Extraction, Structured Data Reasoning, Forecasting, Spatiotemporal Modeling, Data Engineering, Evaluation Pipelines, Human-in-the-loop Systems, Decision-support Workflows.
Frameworks & Tools:
PyTorch, Transformers, PEFT, TRL, LangChain, OpenAI, Claude, Gemini, Elasticsearch, FastAPI, AWS SageMaker, AWS EC2, GCP Compute Engine, Azure, scikit-learn, spaCy, NetworkX, Pandas, NumPy, Streamlit, Docker, Git.
Programming:
Python, Java, C++, MATLAB, R, SQL/NoSQL.
Selected Projects
Selected work highlighting system design, evaluation, deployment context, and measurable outcomes.
RAGMate — Metadata-Aware Retrieval for RAG
Designed metadata-aware dual-encoder retrieval methods that incorporate structured disambiguation signals into embedding and ranking objectives.
Improved retrieval performance by 70% on average over text-only baselines in retrieval-augmented generation settings.
Developed in collaboration with Vectorize.io, with attention to practical retrieval concerns including schema ambiguity, metadata use, source grounding, ranking quality, and evaluation.
Speculatores: Memory-Augmented RAG for Multi-Document Reasoning
Built memory-augmented architectures for LLM-based reasoning across multiple documents, using persistent context representations and cross-document evidence linking.
Combined retrieval, structured memory, and generation to support analytical reasoning over long, distributed information spaces.
Achieved a 25% relative improvement in classification performance and a 50% improvement in generation quality for multi-document analytical reasoning tasks.
MemoryDrift: Benchmarking Long-Context Reliability in LLMs
Developed a graph-based benchmark to evaluate whether LLMs can maintain stable structured memory as context length increases.
Studied how models induce, update, and preserve graph-like representations over long contexts, exposing reliability failures that are difficult to detect with standard long-context tests.
Revealed memory-drift onset at only 1–6% of advertised context capacity, showing that nominal context length can substantially overstate reliable reasoning capacity.
DataWeave — Human–LLM System for Structured Data Analysis
Built an interactive human–LLM system for exploratory structured data analysis and analytical reasoning over complex datasets.
Designed workflows for schema understanding, user-guided analysis, grounded generation, and iterative refinement.
Enabled a human-in-the-loop agentic structured data analysis system with 86% schema-alignment reliability across hundreds of newsroom sessions.
Developed in collaboration with The Chronicle of Higher Education as a deployed system and research platform for studying AI-assisted data analysis.
Newsroom LLM Systems — The Washington Post
Fine-tuned and evaluated LLMs on AWS EC2 and SageMaker for newsroom applications including subheadline generation, summarization, and question answering.
Contributed to the early development of the Ask the Post chatbot, advising on RAG design, grounding, and evaluation in collaboration with newsroom stakeholders.
Presented LLM evaluation, fine-tuning, and model development work to engineering leadership and newsroom audiences.
Focused on editorial quality, reliability, answer grounding, and model behavior in newsroom use cases.
Product Provenance Verification — ML for Supply-Chain Traceability
Led development of ML systems for product provenance verification, combining probabilistic modeling, data valuation, and spatiotemporal reasoning for compliance workflows.
Supported assessment of 59 wood products, identification of 260+ tons of allegedly illegal timber, and 9+ enforcement investigations.
Developed in collaboration with World Forest ID on real-world traceability problems involving noisy data, uncertain labels, and regulatory decision-support needs.
Contributed to research on optimizing product provenance verification using data valuation methods.
Migration Forecasting — Policy-Relevant Predictive ML
Built large-scale forecasting pipelines for migration patterns and land border encounters using applied ML, statistical modeling, and spatiotemporal data.
Developed real-time and policy-facing predictive workflows designed for decision support under uncertainty.
Worked on projects funded by government and research partners, emphasizing scalable pipelines, evaluation, and decision-support relevance.
Scientific Information Extraction — Domain-Adapted Transformers
Designed a full-text scientific information extraction system using domain-adapted transformer models and task-specific representation learning objectives.
Achieved an 11% improvement over prior baselines and 26% higher accuracy in salient task and method extraction from scholarly documents.
Worked with collaborators at CSET, Georgetown University, to extract structured signals from scholarly documents for science-of-science analysis.
Introduced permutation-aware tabular data generation methods to reduce invalid synthetic data generation by LLMs.
Reduced synthetic table rule violations by 70% through structured generation constraints and evaluation.
Contributed to Fisher information-guided regularization for language model fine-tuning, improving generalization in low-data regimes across 9/10 GLUE tasks with no added computational overhead.
Software, Systems & Open Source
DataWeave — Interactive human–LLM system for exploratory structured data analysis.
Speculatores — Memory-augmented RAG framework for multi-document reasoning.
RAGMate — Metadata-aware retrieval methods for retrieval-augmented generation.
MemoryDrift — Benchmark for analyzing memory drift in long-context LLM reasoning.
Publications
Peer-reviewed papers, manuscripts under review, and extended abstracts. See Google Scholar for citation details and updates.
LLMs, RAG, Structured Reasoning & Information Extraction
Utilizing Metadata for Better Retrieval-Augmented Generation
Raquib Bin Yousuf, Shengzhe Xu, Mandar Sharma, Andrew Neeser, Chris Latimer, Naren Ramakrishnan.
Proceedings of the 48th European Conference on Information Retrieval (ECIR 2026). Accepted.
LLM Augmentations to Support Analytical Reasoning over Multiple Documents
Raquib Bin Yousuf, Nicholas Defelice, Mandar Sharma, Shengzhe Xu, Naren Ramakrishnan.
Proceedings of the IEEE International Conference on Big Data, 2024. Best Paper
Can an LLM Induce a Graph? Investigating Memory Drift and Context Length
Raquib Bin Yousuf, Aadyant Khatri, Shengzhe Xu, Mandar Sharma, Naren Ramakrishnan.
Proceedings of the IEEE International Conference on Knowledge Graph (ICKG), 2025.
Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan. Proceedings of the 1st Conference on Language Modeling (COLM), 2024.
Why LLMs Are Bad at Synthetic Table Generation (and what to do about it)
Shengzhe Xu, Cho-Ting Lee, Mandar Sharma, Raquib Bin Yousuf, Nikhil Muralidhar, Naren Ramakrishnan. Structured Knowledge for LLMs Workshop at ACM KDD, 2025.
DataWeave: Interactive Human–LLM Systems for Exploratory Structured Data Analysis
Raquib Bin Yousuf, et al. Under review, 2026.
Schema-Aware Harnesses for Tabular Reasoning with Language Models
Eunice Son, Raquib Bin Yousuf, et al. Under review, 2026.
What Should Search Retrieve Now?
Eunice Son, Raquib Bin Yousuf, et al. The 3rd Search Futures Workshop, ECIR 2026.
Lessons from Deep Learning Applied to Scholarly Information Extraction: What Works, What Doesn’t, and Future Directions
Raquib Bin Yousuf, Subhodip Biswas, Kulendra Kumar Kaushal, James Dunham, Rebecca Gelles, Sathappan Muthiah, Nathan Self, Patrick Butler, Naren Ramakrishnan. Data-driven Science of Science Workshop at ACM KDD, 2022.
Applied ML, Forecasting & Decision Support
Optimizing Product Provenance Verification using Data Valuation Methods
Raquib Bin Yousuf, Hoang Anh Just, Shengzhe Xu, Brian Mayer, Victor Deklerck, Jakub Truszkowski, John C. Simeone, Jade Saunders, Chang-Tien Lu, Ruoxi Jia, Naren Ramakrishnan. Proceedings of the AAAI Conference on Artificial Intelligence, 2026. Accepted.
Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation
Shailik Sarkar, Raquib Bin Yousuf, Linhan Wang, Brian Mayer, Thomas Mortier, Victor Deklerck, Jakub Truszkowski, John C. Simeone, Marigold Norman, Jade Saunders, Chang-Tien Lu, Naren Ramakrishnan.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025.
A Probabilistic Approach to Estimating Timber Harvest Location
Jakub Truszkowski, Roi Maor, Raquib Bin Yousuf, Subhodip Biswas, Caspar Chater, Peter Gasson, Scot McQueen, Marigold Norman, Jade Saunders, John Simeone, Naren Ramakrishnan, Alexandre Antonelli, Victor Deklerck. Ecological Applications, 35(1): e3077, 2025.
Forecasting Migration Patterns and Land Border Encounters
Raquib Bin Yousuf, Shengzhe Xu, Patrick Butler, Brian Mayer, Nathan Self, David Mares, Naren Ramakrishnan.
Proceedings of the IEEE International Conference on Big Data, 2024.
Mining Developer Questions about Major Web Frameworks
Zakaria Mehrab, Raquib Bin Yousuf, Ibrahim Asadullah Tahmid, Rifat Shahriyar. International Conference on Web Information Systems and Technologies (WEBIST), 2018.
Experience
Virginia Tech Transportation Institute — AI Researcher 2026–Present
Build data-grounding layers for heterogeneous traffic and mobility data by standardizing schemas, metadata, and data-access patterns.
Develop LLM-based traffic analysis tools that answer natural-language questions through grounded querying, analysis, and summaries.
Create benchmarks to evaluate and improve the reliability of AI tools over complex traffic datasets and external context.
The Washington Post — Machine Learning Intern 2023
Fine-tuned and evaluated LLMs on AWS EC2 and SageMaker to explore newsroom applications including subheadline generation, summarization, and question answering.
Contributed to early development of the Ask the Post chatbot, advising on RAG design and evaluation in collaboration with newsroom stakeholders.
Presented LLM evaluation, fine-tuning, and model development for newsroom applications to engineering leadership and newsroom audiences.
Virginia Tech, Sanghani Center for AI — Graduate Research Assistant 2019–2025
Built LLM, applied ML, NLP, retrieval, and forecasting systems for data-aware reasoning and decision support across funded research projects. Selected work:
Developed memory-augmented architectures for LLM-based multi-document reasoning with persistent context representation and cross-document evidence linking.
Proposed metadata-aware retrieval methods for RAG, incorporating structured disambiguation signals into embedding and ranking objectives in collaboration with Vectorize.io.
Built human-in-the-loop agentic data analysis workflows for real-world newsroom use cases in collaboration with The Chronicle of Higher Education.
Introduced permutation-aware tabular generation and Fisher information-guided regularization for LLM fine-tuning to improve data efficiency and generalization.
Designed full-text scientific information extraction systems using domain-adapted transformer models and task-specific representation learning.
Led development of ML systems for product provenance verification and large-scale migration forecasting, deploying regulatory and real-time pipelines used in policy and compliance settings, including provenance work in collaboration with World Forest ID.
Contributed to funded research proposals for projects supported by DARPA, NSF, and external partners, including technical section writing.
Eastern University Bangladesh — Lecturer 2018
Led advanced programming and digital logic design courses and managed labs, exams, student supervision, and administrative responsibilities.
Served on academic and administrative committees and contributed to departmental teaching and curriculum activities.
Talks & Presentations
LLM Augmentations to Support Analytical Reasoning over Multiple Documents — IEEE BigData 2024. Best Paper
Can an LLM Induce a Graph? Investigating Memory Drift and Context Length — IEEE ICKG 2025.
Lessons from Deep Learning Applied to Scholarly Information Extraction — Data-driven Science of Science Workshop at ACM KDD 2022.
AI-based Traffic Data Analysis Tool — Presented to City of Alexandria traffic authorities as part of a Virginia Tech Smart Mobility Lab user-story workshop, 2026.
LLM Evaluation, Fine-tuning, and Model Development for Newsroom Applications — Presented to engineering leadership and newsroom audiences at The Washington Post, 2023.
LLM Systems for Newsroom Applications — Internal technical presentation at The Washington Post.
Teaching & Communication
Lecturer, Eastern University Bangladesh — Advanced Programming and Digital Logic Design, 2018.
Graduate Teaching Assistant, Virginia Tech — Object-Oriented Programming, Software Design & Data Structures, and Social Media Analytics, 2019–2022.
Developed and delivered instructional materials on large language models and generative AI for undergraduate coursework at Virginia Tech.
Created instructional and presentation materials to support early-stage integration of generative AI into departmental curricula.
Presented research and AI-focused materials in meetings with collaborators and external stakeholders.
Awards & Service
Best Paper Award — IEEE International Conference on Big Data, 2024.
Paul E. Torgersen Graduate Student Research Excellence Award, PhD Finalist — Virginia Tech, 2026.
Dean’s List — Bangladesh University of Engineering and Technology, 2015–2017.
Reviewer — Conference on Language Modeling (COLM); IEEE Transactions on Big Data.
Mentored undergraduate and junior graduate researchers on ML and LLM projects, contributing to peer-reviewed publications.
Departmental service — Served on academic and administrative committees at Eastern University Bangladesh.
Education
Ph.D. in Computer Science — Virginia Tech 2026
Advisor: Naren Ramakrishnan
Dissertation: Improving LLM Reasoning and Retrieval for Structured and Complex Information Spaces
M.S. in Computer Science — Virginia Tech 2022
B.S. in Computer Science — Bangladesh University of Engineering and Technology (BUET) 2017
Bio
I am an AI/ML researcher and applied ML engineer working on language models, retrieval, representation learning, information extraction, forecasting, recommender systems, and data-centric learning. My work centers on building reliable systems that reason over structured and unstructured data, with attention to grounding, evaluation, adaptation, and practical deployment.
I recently completed my PhD in Computer Science at Virginia Tech, where I developed data-aware AI systems for reasoning and decision-making over complex real-world data, including multi-source, structured, unstructured, and noisy information, with particular attention to uncertainty.
I have worked on AI and ML systems for generative AI, search, newsroom applications, supply-chain traceability, agricultural product verification, migration forecasting, scientific information extraction, and structured-data analysis. Across these projects, I focus on system design, robust evaluation, scalable pipelines, and translating research ideas into useful tools for analysts, researchers, policy teams, and domain experts.