I am an assistant professor in the Computer Science Department at Virginia Tech. I received my Ph.D. in Computer Science from the University of California, Los Angeles, where I was a Google Ph.D. Fellow from 2017 to 2020.
My research vision is to build systems that improve developer productivity through automated debugging and testing for applications in emerging domains, including data-intensive software such as dataflow programs, ML/AI applications, and scientific analysis software such as computational notebooks. Toward these goals, I redesign existing software productivity tools for emerging applications in three areas: (1) automated tracking-code localization in web applications, (2) re-engineering testing and debugging for data-intensive applications, and (3) advancing testing and debugging practices for federated learning applications.
|Our work on generating natural inputs during fuzzing is accepted to ASE 2023. Congrats Ahmad!|
|Our work on Co-Dependence Aware Fuzzing is accepted to ESEC/FSE 2023. Congrats Ahmad!|
|Our work on a novel debugging paradigm for Federated Learning systems has been accepted to ICSE 2023. Congrats Waris!|
|Our paper on statically detecting build conflicts in Java programs is accepted to ASE 2022. Congrats Sheikh!|
|Our empirical analysis of merge conflicts in Java projects is accepted to TOSEM 2022. Congrats Bowen!|
|I received the Rising Star Faculty Award from the Computer Science Department at Virginia Tech.|
|Our paper on text data augmentation is accepted to ACL 2022. Congratulations to Fabrice!|
|Our work on isolating fault-inducing operations in dataflow applications is accepted to SoCC 2021!|
|Our work on untangling tracking and functional resources in web apps is accepted to IMC 2021. Congrats Hadi!|
- [ASE 2023] NaturalFuzz: Natural Input Generation for Big Data Analytics. The 38th IEEE/ACM International Conference on Automated Software Engineering, 2023.
- [ESEC/FSE 2023] Co-Dependence Aware Fuzzing for Dataflow-based Big Data Analytics. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023.
- [SE4SafeML 2023] FedDefender: Backdoor Attack Defense in Federated Learning. Proceedings of the 1st International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components, 2023.
- [ASE 2022] Detecting Build Conflicts in Software Merge for Java Programs via Static Analysis. The 37th IEEE/ACM International Conference on Automated Software Engineering, 2022.
- [TOSEM 2022] A Characterization Study of Merge Conflicts in Java Projects. 2022.
- [ACL 2022] Sibylvariant Transformations for Robust Text Classification. In 60th Annual Meeting of the Association for Computational Linguistics, 2022. 16 Pages.
- [SoCC 2021] OptDebug: Fault-Inducing Operation Isolation for Dataflow Applications. In The 12th ACM Symposium on Cloud Computing, 2021. 13 Pages. 30% Acceptance Rate
- [IMC 2021] TrackerSift: Untangling Mixed Tracking and Functional Web Resources. In Proceedings of the 2021 ACM Internet Measurement Conference, 2021. 8 Pages. 27.9% Acceptance Rate
- [HiPS 2021] Towards a Serverless Bioinformatics Cyberinfrastructure Pipeline. In Proceedings of the 1st Workshop on High Performance Serverless Computing, 2021. 8 Pages. Workshop Paper.
- [SoCC 2020] Influence-Based Provenance for Dataflow Applications with Taint Propagation. In The 11th ACM Symposium on Cloud Computing, 2020. 12 Pages. Full Paper. 24.4% Acceptance Rate
- [ASE 2020] BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction. In The 35th IEEE/ACM International Conference on Automated Software Engineering, 2020. 12 Pages. Full Paper. 22.5% Acceptance Rate
- [ICSE Demo 2020] BigTest: Symbolic Execution Based Systematic Test Generation Tool for Apache Spark. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, 2020. 4 Pages. Demonstration Paper. 33.3% Acceptance Rate
- [ESEC/FSE 2019] White-box Testing of Big Data Analytics with Complex User-defined Functions. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019. 12 Pages. Full Paper. 24.4% Acceptance Rate
- [SoCC 2019] PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems. In Proceedings of the 2019 Symposium on Cloud Computing, 2019. 12 Pages. Full Paper. 24.8% Acceptance Rate
- [ICSE SEIP 2019] Perception and Practices of Differential Testing. In Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, 2019. 10 Pages. Full Paper. 22.2% Acceptance Rate
- [ICDCS 2018] LogLens: A Real-Time Log Analysis System. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018. 11 Pages. Full Paper. 20.6% Acceptance Rate
- [VLDB Journal 2018] Adding Data Provenance Support to Apache Spark. The VLDB Journal, 2018. 21 Pages. VLDB Journal Paper.
- [ESEC/FSE Demo 2018] BigSift: Automated Debugging of Big Data Analytics in Data-intensive Scalable Computing. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018. 4 Pages. Demonstration Paper. 38.8% Acceptance Rate
- [ICSE ACM Student Research Competition 2018] Interactive and Automated Debugging for Big Data Analytics (ACM Student Research Competition Gold Medal Winner). In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018. 3 Pages. Short Paper.
- [SoCC 2017] Automated Debugging in Data-intensive Scalable Computing. In Proceedings of the 2017 Symposium on Cloud Computing, 2017. 15 Pages. Full Paper. 23.6% Acceptance Rate
- [SIGMOD Demo 2017] Debugging Big Data Analytics in Spark with BigDebug. In Proceedings of the 2017 ACM International Conference on Management of Data, 2017. 4 Pages. Demonstration Paper. 34% Acceptance Rate
- [ICSE 2016] BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark. In 2016 IEEE/ACM 38th International Conference on Software Engineering, 2016. 12 Pages. Full Paper. 19.1% Acceptance Rate
- [SoCC 2016] Optimizing Interactive Development of Data-Intensive Applications. In Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016. 13 Pages. Full Paper. 25.1% Acceptance Rate
- [VLDB 2016] Titian: Data Provenance Support in Spark (The "Best of VLDB" Paper). Proc. VLDB Endow., 2016. 12 Pages. Full Paper. 21.2% Acceptance Rate
- [HotCloud 2016] Interactive Debugging for Big Data Analytics. In 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16), 2016. 7 Pages. Workshop Paper. 30.8% Acceptance Rate
- [ESEC/FSE Demo 2016] BigDebug: Interactive Debugger for Big Data Analytics in Apache Spark. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016. 5 Pages. Demonstration Paper. 40.1% Acceptance Rate
- [PACIS 2015] A Classification Based Framework to Predict Viral Threads. In The Pacific Asia Conference on Information Systems (PACIS), 2015. 13 Pages. Full Paper.
Ministry of Education, Culture, Research, and Technology
SHF: Medium: Reinventing Fuzz Testing for Data and Compute Intensive Systems
2017 Google Ph.D. Fellowship
NSF I-Corps Grant for Technology Transfer