Muhammad Ali Gulzar

Assistant Professor in Computer Science

I am an assistant professor in the Computer Science Department at Virginia Tech. I am also an Amazon Visiting Academic at Amazon Web Services. I received my Ph.D. in Computer Science from the University of California, Los Angeles, where I was a Google Ph.D. Fellow from 2017 to 2020.

My research vision is to build systems that improve developer productivity through automated debugging and testing for applications in emerging domains, including data-intensive software such as dataflow programs, ML/AI applications, and scientific analysis software such as computational notebooks. Under these broader goals, I redesign existing software productivity tools for emerging applications in three areas: (1) automated tracking-code localization techniques in web applications, (2) re-engineering testing and debugging for data-intensive applications, and (3) advancing current testing and debugging practices in federated learning applications.

My past work has focused on interactive and automated debugging for Apache Spark, symbolic execution based test generation for dataflow programs, and performance debugging in Apache Spark.

gulzar cs.vt.edu | Google Scholar | Github | LinkedIn

News

Our work on pathological non-executable notebooks is accepted to MSR 2025. Congrats, Tien!
Our work on semantic cache for LLMs is accepted to IPDPS. Congrats, Waris!
My student, Haddi, co-authored the 2024 Web Almanac’s Privacy Chapter.
Our papers on rare-path coverage and evidence-based tech hiring are accepted to SANER 2025.
Our work on using neuron provenance to identify responsible clients in FL is accepted to ICSE 2025. Congrats, Waris!
Our work on web ads decreasing the accessibility of web pages is accepted to ICSE 2025. Congrats, Haddi!
Our work on blocking JS tracking functions received the ACM CCS 2024 Distinguished Artifact Award. Congrats, Haddi!
I received the 2024-25 Amazon-VT Award for our work on Semantic Cache for LLMs.
Our project on transparency and accessibility issues in web ads engineering is funded by CCI.
Our work on auto-generating privacy-enhancing JS surrogates is accepted to CCS 2024!

Publications

2025

  1. [ICSE 2025] Accessibility Issues in Ad-Driven Web Applications
    Abdul Haddi Amjad, Muhammad Danish, Bless Jah, and Muhammad Ali Gulzar
    The 47th IEEE/ACM International Conference on Software Engineering. 2025
  2. [ICSE 2025] TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance
    Waris Gill, Ali Anwar, and Muhammad Ali Gulzar
    The 47th IEEE/ACM International Conference on Software Engineering. 2025
  3. [MSR 2025] Are the Majority of Public Computational Notebooks Pathologically Non-Executable?
    Tien Nguyen, Waris Gill, and Muhammad Ali Gulzar
    The 22nd IEEE/ACM International Conference on Mining Software Repositories. 2025
  4. [IPDPS 2025] MeanCache: User-Centric Semantic Caching for LLM Web Services
    Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ammar Ahmed, Ali Anwar, and Muhammad Ali Gulzar
    The 39th IEEE International Parallel & Distributed Processing Symposium. 2025
  5. [SANER 2025] A Metric for Measuring the Impact of Rare Paths on Program Coverage
    Leo St. Amour, Eli Tilevich, and Muhammad Ali Gulzar
    The IEEE International Conference on Software Analysis, Evolution and Reengineering. 2025
  6. [SANER 2025] Improving Evidence-Based Tech Hiring with GitHub-Supported Resume Matching
    Swanand Vaishampayan, Muhammad Ali Gulzar, and Chris Brown
    The IEEE International Conference on Software Analysis, Evolution and Reengineering. 2025

2024

  1. [CCS 2024] Blocking Tracking JavaScript at the Function Granularity (Distinguished Artifact Award)
    Abdul Haddi Amjad, Shaoor Munir, Zubair Shafiq, and Muhammad Ali Gulzar
    The 31st ACM Conference on Computer and Communications Security. 2024
  2. [FSE 2024] DeSQL: Interactive Debugging of SQL in DISC
    Sabaat Haroon, Chris Brown, and Muhammad Ali Gulzar
    The ACM International Conference on the Foundations of Software Engineering. 2024
  3. [FSE 2024] Natural Symbolic Execution-based Testing for Big Data Analytics
    Yaoxuan Wu, Ahmad Humayun, Muhammad Ali Gulzar, and Miryung Kim
    The ACM International Conference on the Foundations of Software Engineering. 2024
  4. [NAACL 2024] Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking
    Hong Jin Kang, Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, and Miryung Kim
    2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics 2024
  5. [CCS 2024] How Do Visually Impaired Users Navigate Accessibility Challenges in an Ad-Driven Web
    Abdul Haddi Amjad and Muhammad Ali Gulzar
    Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security. Poster Track. 2024

2023

  1. [ASE 2023] NaturalFuzz: Natural Input Generation for Big Data Analytics
    Ahmad Humayun, Yaoxuan Wu, Miryung Kim, and Muhammad Ali Gulzar
    The 38th IEEE/ACM International Conference on Automated Software Engineering. 2023
  2. [ESEC/FSE 2023] Co-Dependence Aware Fuzzing for Dataflow-based Big Data Analytics
    Ahmad Humayun, Miryung Kim, and Muhammad Ali Gulzar
    ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2023
  3. [ICSE 2023] FedDebug: Systematic Debugging for Federated Learning Applications
    Waris Gill, Ali Anwar, and Muhammad Ali Gulzar
    The ACM/IEEE 45th International Conference on Software Engineering 2023
  4. [PETS 2023] Blocking JavaScript without Breaking the Web: An Empirical Investigation
    Abdul Haddi Amjad, Zubair Shafiq, and Muhammad Ali Gulzar
    Proceedings on Privacy Enhancing Technologies Symposium 2023
  5. [SE4SafeML 2023] FedDefender: Backdoor Attack Defense in Federated Learning
    Waris Gill, Ali Anwar, and Muhammad Ali Gulzar
    Proceedings of the 1st International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components 2023

2022

  1. [ASE 2022] Detecting Build Conflicts in Software Merge for Java Programs via Static Analysis
    Sheikh Towqir, Bowen Shen, Muhammad Ali Gulzar, and Na Meng
    The 37th IEEE/ACM International Conference on Automated Software Engineering 2022
  2. [TOSEM 2022] A Characterization Study of Merge Conflicts in Java Projects
    Bowen Shen, Muhammad Ali Gulzar, Fei He, and Na Meng
    ACM Transactions on Software Engineering and Methodology. 2022
  3. [ACL 2022] Sibylvariant Transformations for Robust Text Classification
    Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, and Miryung Kim
    In 60th Annual Meeting of the Association for Computational Linguistics 2022
    16 Pages.

2021

  1. [SOCC 2021] OptDebug: Fault-Inducing Operation Isolation for Dataflow Applications
    Muhammad Ali Gulzar and Miryung Kim
    In The 12th ACM Symposium on Cloud Computing 2021
    13 Pages. 30% Acceptance Rate
  2. [IMC 2021] TrackerSift: Untangling Mixed Tracking and Functional Web Resources
    Abdul Haddi Amjad, Muhammad Saleem, Muhammad Ali Gulzar*, Zubair Shafiq*, and Fareed Zaffar*
    In Proceedings of the 2021 ACM Internet Measurement Conference 2021
    8 Pages. 27.9% Acceptance Rate
  3. [HiPS 2021] Towards a Serverless Bioinformatics Cyberinfrastructure Pipeline
    Shunyu David Yao, Muhammad Ali Gulzar, Liqing Zhang, and Ali R. Butt
    In Proceedings of the 1st Workshop on High Performance Serverless Computing 2021
    8 Pages. Workshop Paper.

2020

  1. [SOCC 2020] Influence-Based Provenance for Dataflow Applications with Taint Propagation
    Jason Teoh, Muhammad Ali Gulzar, and Miryung Kim
    In The 11th ACM Symposium on Cloud Computing 2020
    12 Pages. Full Paper. 24.4% Acceptance Rate
  2. [ASE 2020] BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction
    Qian Zhang, Jiyuan Wang, Muhammad Ali Gulzar, Rohan Padhye, and Miryung Kim
    In The 35th IEEE/ACM International Conference on Automated Software Engineering 2020
    12 Pages. Full Paper. 22.5% Acceptance Rate
  3. [ESEC/FSE 2020] Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks?
    Fabrice Harel-Canada, Lingxiao Wang, Muhammad Ali Gulzar, Quanquan Gu, and Miryung Kim
    In The 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2020
    12 Pages. Full Paper. 28.0% Acceptance Rate
  4. [ICSE 2020] HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA
    Jason Lau*, Aishwarya Sivaraman*, Qian Zhang*, Muhammad Ali Gulzar, Jason Cong, and Miryung Kim
    In 2020 IEEE/ACM 42nd International Conference on Software Engineering 2020
    13 Pages. Full Paper. 20.9% Acceptance Rate
  5. [ICSE Demo 2020] BigTest: Symbolic Execution Based Systematic Test Generation Tool for Apache Spark
    Muhammad Ali Gulzar, Madan Musuvathi, and Miryung Kim
    In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings 2020
    4 Pages. Demonstration Paper. 33.3% Acceptance Rate

2019

  1. [ESEC/FSE 2019] White-box Testing of Big Data Analytics with Complex User-defined Functions
    Muhammad Ali Gulzar, Shaghayegh Mardani, Madanlal Musuvathi, and Miryung Kim
    In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2019
    12 Pages. Full Paper. 24.4% Acceptance Rate
  2. [SoCC 2019] PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems
    Jason Teoh, Muhammad Ali Gulzar, Harry Xu, and Miryung Kim
    In Proceedings of the 2019 Symposium on Cloud Computing 2019
    12 Pages. Full Paper. 24.8% Acceptance Rate
  3. [ICSE SEIP 2019] Perception and Practices of Differential Testing
    Muhammad Ali Gulzar, Yongkang Zhu, and Xiaofeng Han
    In Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice 2019
    10 Pages. Full Paper. 22.2% Acceptance Rate

2018

  1. [ICDCS 2018] LogLens: A Real-Time Log Analysis System
    Biplob Debnath, Mohiuddin Solaimani, Muhammad Ali Gulzar, Nipun Arora, Cristian Lumezanu, Jianwu Xu, Bo Zong, Hui Zhang, Guofei Jiang, and Latifur Khan
    In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS) 2018
    11 Pages. Full Paper. 20.6% Acceptance Rate
  2. [VLDB Journal 2018] Adding Data Provenance Support to Apache Spark
    Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad Ali Gulzar, Sai Deep Tetali, Miryung Kim, Todd Millstein, and Tyson Condie
    The VLDB Journal 2018
    21 Pages. VLDB Journal Paper.
  3. [ESEC/FSE Demo 2018] BigSift: Automated Debugging of Big Data Analytics in Data-intensive Scalable Computing
    Muhammad Ali Gulzar, Siman Wang, and Miryung Kim
    In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2018
    4 Pages. Demonstration Paper. 38.8% Acceptance Rate
  4. [ICSE ACM Student Research Competition 2018] Interactive and Automated Debugging for Big Data Analytics (ACM Student Research Competition Gold Medal Winner)
    Muhammad Ali Gulzar
    In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings 2018
    3 Pages. Short Paper.

2017

  1. [SoCC 2017] Automated Debugging in Data-intensive Scalable Computing
    Muhammad Ali Gulzar, Matteo Interlandi, Xueyuan Han, Mingda Li, Tyson Condie, and Miryung Kim
    In Proceedings of the 2017 Symposium on Cloud Computing 2017
    15 Pages. Full Paper. 23.6% Acceptance Rate
  2. [SIGMOD Demo 2017] Debugging Big Data Analytics in Spark with BigDebug
    Muhammad Ali Gulzar, Matteo Interlandi, Tyson Condie, and Miryung Kim
    In Proceedings of the 2017 ACM International Conference on Management of Data 2017
    4 Pages. Demonstration Paper. 34% Acceptance Rate

2016

  1. [ICSE 2016] BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
    Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Tetali, Tyson Condie, Todd Millstein, and Miryung Kim
    In 2016 IEEE/ACM 38th International Conference on Software Engineering 2016
    12 Pages. Full Paper. 19.1% Acceptance Rate
  2. [SoCC 2016] Optimizing Interactive Development of Data-Intensive Applications
    Matteo Interlandi, Sai Deep Tetali, Muhammad Ali Gulzar, Joseph Noor, Tyson Condie, Miryung Kim, and Todd Millstein
    In Proceedings of the Seventh ACM Symposium on Cloud Computing 2016
    13 Pages. Full Paper. 25.1% Acceptance Rate
  3. [VLDB 2016] Titian: Data Provenance Support in Spark (The "Best of VLDB" Paper)
    Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Miryung Kim, Todd Millstein, and Tyson Condie
    Proc. VLDB Endow. 2016
    12 Pages. Full Paper. 21.2% Acceptance Rate
  4. [HotCloud 2016] Interactive Debugging for Big Data Analytics
    Muhammad Ali Gulzar, Xueyuan Han, Matteo Interlandi, Shaghayegh Mardani, Sai Deep Tetali, Todd Millstein, and Miryung Kim
    In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16) 2016
    7 Pages. Workshop Paper. 30.8% Acceptance Rate
  5. [ESEC/FSE Demo 2016] BigDebug: Interactive Debugger for Big Data Analytics in Apache Spark
    Muhammad Ali Gulzar, Matteo Interlandi, Tyson Condie, and Miryung Kim
    In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering 2016
    5 Pages. Demonstration Paper. 40.1% Acceptance Rate

2015

  1. [PACIS 2015] A Classification Based Framework to Predict Viral Threads
    Hashim Sharif, Saad Ismail, Shehroze Farooqi, Mohammad Taha Khan, Muhammad Ali Gulzar, Hasnain Lakhani, Fareed Zaffar, and Ahmed Abbasi
    In The Pacific Asia Conference on Information Systems (PACIS) 2015
    13 Pages. Full Paper.
* Student authors contributed equally.
* Senior authors are alphabetically arranged.

Funding

No news so far...