Chris Thomas at Virginia Tech

378 Data and Decision Sciences Building

727 Prices Fork Rd

Blacksburg, VA 24060

(540) 231-2993

I am an Assistant Professor in the Department of Computer Science at Virginia Tech. My research is at the intersection of computer vision, natural language processing, and multimedia. I am interested in many problems requiring reasoning across multimodal data, including cross-modal retrieval, information extraction, and knowledge representation. I am associated with the Sanghani Center for Artificial Intelligence and Data Analytics.

Prior to joining Virginia Tech, I was a postdoctoral researcher at Columbia University working with Professor Shih-Fu Chang.

Recent News

Feb 2026	Two papers accepted to CVPR 2026. One paper introduces a new approach for understanding and retrieving images with multiple meanings. The other develops new techniques to prevent misuse of models for harmful tasks.
Jan 2026	Received a NIFA-funded AgriProspects grant for developing AI-enhanced workforce training. Thanks to the AgriProspects team and the Extension Foundation for supporting this work.
Dec 2025	Excited to co-organize the first Generative AI for XR and Identity-based Applications (GenXR-ID) workshop at CVPR 2026. Our workshop brings together work at the intersection of multimodality, generative models, extended reality, and biometrics.
Nov 2025	Our paper on a new type of multi-image adversarial attack on multimodal large language models was accepted to AAAI 2026.
Aug 2025	Two papers accepted to the EMNLP 2025 main conference. One introduces a new architecture for flexible-length discrete diffusion LLM infilling; the other tackles question-answering bias in VLMs. We also had two Findings papers accepted, which introduce new methods for steering and fine-grained classification in VLMs.
Jun 2025	We received a research grant from the Commonwealth Cyber Initiative for an exciting project related to protecting embodied agents against adversarial attacks. Thanks CCI!
May 2025	Received a Google Research Scholar award for a project on making multimodal web agents safer. Thanks Google!

Selected Publications

LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, and Chris Thomas

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26), 2026

@inproceedings{ishmam2026lamp,
  title = {LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models},
  author = {Ishmam, Alvi Md and Sarker, Najibul Haque and Hakim, Zaber Ibn Abdul and Thomas, Chris},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26)},
  year = {2026},
  publisher = {AAAI Press},
}

Flexible-length Text Infilling for Discrete Diffusion Models

Andrew Zhang, Anushka Sivakumar, Chia-Wei Tang, and Chris Thomas

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

@inproceedings{zhang2025flexible,
  title = {Flexible-length Text Infilling for Discrete Diffusion Models},
  author = {Zhang, Andrew and Sivakumar, Anushka and Tang, Chia-Wei and Thomas, Chris},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year = {2025},
  address = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
}

Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models

Md. Atabuzzaman, Ali Asgarov, and Chris Thomas

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

@inproceedings{atabuzzaman2025mcqa,
  title = {Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models},
  author = {Atabuzzaman, Md. and Asgarov, Ali and Thomas, Chris},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year = {2025},
  address = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
}

Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval

Hani Alomari, Anushka Sivakumar, Andrew Zhang, and Chris Thomas

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025

@inproceedings{alomari2025maximal,
  title = {Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval},
  author = {Alomari, Hani and Sivakumar, Anushka and Zhang, Andrew and Thomas, Chris},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
}

M3D: MultiModal MultiDocument Fine-Grained Inconsistency Detection

Chia-Wei Tang, Ting-Chih Chen, Kiet Nguyen, Kazi Sajeed Mehrab, Alvi Ishmam, and Chris Thomas

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

@inproceedings{tang2024m3d,
  title = {M3D: MultiModal MultiDocument Fine-Grained Inconsistency Detection},
  author = {Tang, Chia-Wei and Chen, Ting-Chih and Nguyen, Kiet and Mehrab, Kazi Sajeed and Ishmam, Alvi and Thomas, Chris},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages = {22270--22293},
  year = {2024},
}

JourneyBench: A challenging one-stop vision-language understanding benchmark of generated images

Zhecan Wang, Junzhang Liu, Chia-Wei Tang, and 8 more authors

Advances in Neural Information Processing Systems, 2024

@article{wang2024journeybench,
  title = {JourneyBench: A challenging one-stop vision-language understanding benchmark of generated images},
  author = {Wang, Zhecan and Liu, Junzhang and Tang, Chia-Wei and Alomari, Hani and Sivakumar, Anushka and Sun, Rui and Li, Wenhao and Atabuzzaman, Md and Ayyubi, Hammad and You, Haoxuan and others},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  pages = {63110--63123},
  year = {2024},
}

MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking

Ting-Chih Chen, Chia-Wei Tang, and Christopher Thomas

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Fact-checking real-world claims often requires reviewing multiple multimodal documents in order to assess the claim’s truthfulness, a highly laborious and time-consuming task. In this paper, we present a summarization model crafted to generate claim-specific summaries useful for fact-checking from multimodal multi-document datasets. The model takes inputs in the form of documents, images, and a claim, with the objective of assisting in fact-checking tasks. We introduce a dynamic perceiver-based model that is able to handle inputs from multiple modalities of arbitrary lengths. To train our model, we leverage a novel reinforcement learning-based entailment objective in order to generate summaries that provide evidence distinguishing between different truthfulness labels. To assess the efficacy of our approach, we conduct experiments on both an existing benchmark as well as a new dataset of multi-document claims which we contribute. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset and demonstrates strong performance on our new Multi-News-Fact-Checking dataset.

@inproceedings{chen-etal-2024-metasumperceiver,
  title = {{M}eta{S}um{P}erceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking},
  author = {Chen, Ting-Chih and Tang, Chia-Wei and Thomas, Christopher},
  editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.acl-long.474},
  pages = {8742--8757},
}

Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment

Alvi Md Ishmam and Christopher Thomas

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

@inproceedings{ishmam2024semantic,
  title = {Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment},
  author = {Ishmam, Alvi Md and Thomas, Christopher},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {24820--24830},
  year = {2024},
}

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities

Hammad Ayyubi, Christopher Thomas, Lovish Chum, and 8 more authors

In Proceedings of the AAAI Conference on Artificial Intelligence, 2024

@inproceedings{ayyubi2024beyond,
  title = {Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities},
  author = {Ayyubi, Hammad and Thomas, Christopher and Chum, Lovish and Lokesh, Rahul and Chen, Long and Niu, Yulei and Lin, Xudong and Feng, Xuande and Koo, Jaywon and Ray, Sounak and others},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {38},
  number = {16},
  pages = {17664--17672},
  year = {2024},
}

Fine-Grained Visual Entailment

Christopher Thomas, Yipeng Zhang, and Shih-Fu Chang

In Proceedings of the European Conference on Computer Vision, 2022

@inproceedings{thomas2022fine,
  title = {Fine-Grained Visual Entailment},
  author = {Thomas, Christopher and Zhang, Yipeng and Chang, Shih-Fu},
  booktitle = {Proceedings of the European Conference on Computer Vision},
  pages = {398--416},
  year = {2022},
}

InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection

Yi Fung, Christopher Thomas, Revanth Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, and Avi Sil

In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021

@inproceedings{fung2021infosurgeon,
  title = {InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection},
  author = {Fung, Yi and Thomas, Christopher and Reddy, Revanth and Polisetty, Sandeep and Ji, Heng and Chang, Shih-Fu and McKeown, Kathleen and Bansal, Mohit and Sil, Avi},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2021},
}

Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval

Christopher Thomas and Adriana Kovashka

In Proceedings of the European Conference on Computer Vision (ECCV), 2020

@inproceedings{thomas2020preserving,
  title = {Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval},
  author = {Thomas, Christopher and Kovashka, Adriana},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year = {2020},
}

Predicting the politics of an image using webly supervised data

Christopher Thomas and Adriana Kovashka

In Advances in Neural Information Processing Systems (NeurIPS 2019), 2019

@inproceedings{thomas2019predicting,
  title = {Predicting the politics of an image using webly supervised data},
  author = {Thomas, Christopher and Kovashka, Adriana},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS 2019)},
  year = {2019},
}