May 2021: I successfully defended my Ph.D.!
May 2020: Internship work on Machine Translation covered by Slator.
Mar 2020: Gave a talk on 'Detecting Population-Level Societal Events from News Articles' at MASC-SLL, University of Maryland, College Park.
Feb 2020: Check out my DAC student spotlight!
Jan 2020: Selected as a recipient of Twitch (Amazon) Research Fellowship 2020!
Nov 2019: Paper about adapting generic Black-Box MT systems to specific domains accepted to AAAI 2020!
May 2019: I'll be spending another summer at Netflix working on Neural Machine Translation.
Jan 2019: 1 paper accepted to The Web Conference 2019!
Jan 2019: Best paper award at IUI 2019!
May 2018: I'll be spending the summer at Netflix working on Neural Machine Translation hosted by Ritwik Kumar, Boris Chen and Vinith Misra.
Aug 2017: Our submission for automatic narrative generation from a large collection of documents was selected as a finalist in the ODNI Xpress Challenge (13 out of 387 submissions from 42 countries were finalists)!
April 2017: I'll be joining Discovery Analytics Centre as a Ph.D. student this summer, advised by Dr. Naren Ramakrishnan!
April 2017: The undergraduate students that I supervised and worked with on the PhotoSleuth project won 1st place at VTURCS Spring Symposium.
Nov 2016: Presented our paper and poster on understanding human performance in image geolocation at the GroupSight workshop at HCOMP 2016!


I am a Ph.D. candidate in the Computer Science department at Virginia Tech's College of Engineering, advised by Prof. Naren Ramakrishnan.
My research falls under the umbrella of Natural Language Processing, Deep Learning and Data Mining. General topics in ML that interest me are information extraction, domain adaptation, low-resource ML, transfer learning and knowledge graphs.

Currently, I am affiliated with the Discovery Analytics Centre (DAC). Before joining DAC, I worked at the Crowd Intelligence Lab at Virginia Tech, where I was involved in building hybrid human-AI systems combining crowdsourcing and face recognition.

I was a Twitch Research Fellow for 2020 and worked on entity extraction with Saad Ali. I've spent two wonderful summers ('18 & '19) at Netflix, where we conceived a novel machine learning solution for improving deep neural machine translation models.

In the past I've worked at Dreamworks Animation and GlobalLogic.


  • Reviewer: ACL 2021, Rep4NLP @ ACL 2021, NILLI workshop EMNLP 2021, NAACL-HLT 2021, ACM Transactions on Knowledge Discovery from Data (TKDD) Journal, Rep4NLP @ ACL 2020, ACL 2020, KDD 2019.

  • Program Committee: NAACL-HLT 2021, Rep4NLP at ACL 2020

  • Student Volunteer: CVPR 2016

  • Selected Publications

    Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan
    Coming soon...
    Simplify-then-Translate: Automatic Preprocessing for Black-Box Translation
    Sneha Mehta, Bahareh Azarnoush, Boris Chen, Avneesh Saluja, Vinith Misra, Ballav Bihani, Ritwik Kumar
    In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI '20)
    Event Detection Using Hierarchical Multi-Aspect Attention
    Sneha Mehta, M. Raihanul, Huzefa Rangwala, Naren Ramakrishnan
    In Proceedings of the 30th Web Conference 2019 (WWW '19)
    Low Rank Factorization for Compact Multi-Head Self-Attention
    Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan
    arXiv preprint
    PhotoSleuth: Combining Human Expertise and Face Recognition to Identify Historical Portraits
    Vikram Mohanty, David Thames, Sneha Mehta, Kurt Luther
    In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI '19)
    An Exploratory Study of Human Performance in Image Geolocation
    Sneha Mehta, Chris North, Kurt Luther
    Workshop on Human Computation for Image and Video Analysis @ The Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP '16)


    Mehta, S., Kumar, R., Bihani, B., Chen, B., Misra, V., Semeniakin, M., Saluja, A., Bonaci, V. Automatic Preprocessing for Black Box Translation. (pending)


    Research Fellow, Twitch

    Researching methods for multi-modal information extraction from e-sports streams for enhancing search.

    Feb - Dec 2020

    Machine Learning Research Intern, Netflix

    Researched and trained deep neural network models to simplify colloquial language, making it more conducive to machine translation for multiple low-resource languages. Assisted in developing predictive models for estimating MT confidence scores.

    Summer 2018 & 2019

    R&D Engineer/Technical Director, Dreamworks Animation

    Developer on a scrum-based agile team that developed and maintained a task-graph-based distributed execution framework providing an interface for designing and executing complex workflows.
    Supported Layout Department on the feature film Penguins of Madagascar (2014).

    Jan 2014 - Jul 2015

    Software Development Intern, GlobalLogic

    Developed an Android application to remotely view and interact with a desktop screen using a mobile device. Developed separate protocols for the different gesture types used to interact with the desktop over the network.
    Developed a learning-based OCR application for recognizing computer fonts from images. Developed a novel approach for data augmentation.

    Summer 2011 & 2012

    Selected Projects

    Siamese Network for Binary Visual Question Answering
    We study the performance of a Siamese-network-based deep neural architecture on the task of binary (Yes/No) visual question answering. Comparing a Siamese-network-based VQA model with a non-Siamese VQA model, we find that training with a pairwise loss performs better than training with the loss of the non-Siamese network.
    Sneha Mehta, Yash Goyal (Project advisor: Prof. Devi Parikh).
    Extracting Topics from Tweets and Webpages
    We develop the topic analysis component of a robust information retrieval system for large-scale search and retrieval of tweets and webpages, built on top of Solr, a general-purpose open-source search engine. Our contribution enables semantic search and retrieval of tweets and webpages based on topics.
    Sneha Mehta, Radha Krishnan Vinayagam (Project advisor: Prof. Edward Fox).
    Birds in a Forest
    We employ a random forest approach for bird species identification. First, we train 25 SVMs to predict 25 bird features for our dataset images. Then we train a random forest to identify the specific bird species. We use the CUB-200-2011 bird dataset for this task.
    Sneha Mehta, Aditya Pratapa, Phillip Summers (Project advisor: Prof. Bert Huang).
    Targeted Summary Generation
    Given a question, our narrative generation pipeline generates a one-page natural language response to that question from a given dataset of articles and the associated metadata. (This was a submission to the ODNI Xpress Challenge, where it was a finalist.)
    Sneha Mehta, Rupen Paul Khandpur
    TD Transfer
    The aim of this tool is to simplify data transfer between studio locations around the world. Features include a directory view indicating synced and out-of-sync files, transfer scheduling, transfer status, weather, analytics, etc. It is built on top of Python Twisted (an asynchronous event framework) and the XML-RPC protocol. (This was work done during an internship at Dreamworks Animation.)


    Automatic Question Generation
    Given a sentence, automatically generate reading-comprehension-style factual questions whose answers are contained in that sentence.


    Invited to give a talk at the VT Alumni Event, Arlington, Virginia, Nov 2019 (Story).

    Copyright © Sneha Mehta (Me) 2015. All rights reserved.