Elaheh Raisi

E-mail: elaheh[at]vt[dot]edu

Welcome to my website! My name is Elaheh Raisi, and I completed my Ph.D. in the computer science department at Virginia Tech, where I worked under Prof. Bert Huang's supervision in the Machine Learning Laboratory.

My research interests lie in the broad areas of machine learning, data mining, and social network analysis.
Here are my CV, LinkedIn profile, Google Scholar citations, and BibTeX listing.

Here is my research statement.

Research Area:
My advisor and I work on cyberbullying identification on social media. Detrimental online behavior such as harassment and bullying is becoming a serious problem as people communicate over the Internet more than ever before. We created an automated, data-driven algorithm for detecting cyberbullying incidents using weak supervision. Starting with a small, user-supplied seed dictionary of high-precision bullying indicators, we simultaneously discover instigators, victims, and a vocabulary of words that indicate bullying. We observed interesting results on three social media datasets: Twitter, Instagram, and Ask.fm. We are working on creating a more formal probabilistic model for bullying that robustly accounts for noise and uncertainty.
We are also working on learning in probabilistic graphical models where the available data are incomplete. Including latent variables in graphical models adds computational cost because their values are unknown. Many methods for learning models with latent variables are expensive because they require repeated inference, which itself needs many iterations to converge. We are trying to speed up learning in discrete Markov random fields by applying the dualization of belief propagation inference. We examine different inference methods (Belief Propagator, TRW BP, Convex BP, Max-Margin BP, MPLP). We test this method on image segmentation tasks using the horse and scene understanding datasets.
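The seed-dictionary idea can be illustrated with a deliberately simplified toy. The sketch below is not the model described above: the function name `expand_seed_vocabulary`, the co-occurrence heuristic, the thresholds `min_ratio` and `min_count`, and the example messages are all hypothetical. It only shows how a few high-precision seed words can flag messages, from which candidate instigators, victims, and new indicator words are read off.

```python
from collections import defaultdict


def expand_seed_vocabulary(messages, seed_words, min_ratio=0.6, min_count=2):
    """One bootstrapping round over (sender, receiver, text) triples.

    Hypothetical heuristic for illustration only; it is not the
    weakly supervised model described in the text.
    """
    seeds = {w.lower() for w in seed_words}
    flagged_pairs = []               # (sender, receiver) of flagged messages
    word_total = defaultdict(int)    # messages containing each word
    word_flagged = defaultdict(int)  # flagged messages containing each word

    for sender, receiver, text in messages:
        words = set(text.lower().split())
        is_flagged = bool(words & seeds)  # message contains a seed word
        if is_flagged:
            flagged_pairs.append((sender, receiver))
        for w in words:
            word_total[w] += 1
            if is_flagged:
                word_flagged[w] += 1

    instigators = {s for s, _ in flagged_pairs}
    victims = {r for _, r in flagged_pairs}
    # Candidate indicators: non-seed words that mostly co-occur with seeds.
    new_words = {
        w
        for w, total in word_total.items()
        if w not in seeds
        and total >= min_count
        and word_flagged.get(w, 0) / total >= min_ratio
    }
    return instigators, victims, new_words


# Made-up messages for illustration.
messages = [
    ("ann", "bob", "you loser creep"),
    ("eve", "bob", "what a loser creep"),
    ("cat", "dan", "see you at lunch"),
    ("dan", "cat", "what a great lunch"),
    ("ann", "bob", "creep"),
]
instigators, victims, new_words = expand_seed_vocabulary(messages, {"loser"})
```

With the single seed word "loser", this toy flags two messages, marks their senders and receiver, and proposes "creep" as a new indicator because it appears mostly in flagged messages.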
Honors and Awards:
  • Tapia travel award from Virginia Tech, Fall 2018.
  • Best graduate student poster presentation award at the Virginia Tech SAIC Integrated Security Colloquium, 2018.
  • Received stipend bonus from Computer Science department at Virginia Tech for excellent progress, 2017.
  • Best paper award at Learning with Limited Labeled data workshop at NIPS for the paper titled "Co-trained Ensemble Models for Weakly Supervised Cyberbullying Detection", 2017.
  • Best paper award for the paper titled "Cyberbullying Detection with Weakly Supervised Machine Learning", ASONAM 2017.
  • Women in Machine Learning (WiML) workshop Travel Award, 2017.
  • Grace Hopper Celebration Student Scholarship, 2017. Sponsored by RetailMeNot.
  • Best poster presentation at the ISVT cultural showcase and poster symposium, 2016.
  • CRA-Women Grad Cohort Workshop Scholarship, 2016.
  • Runner up for the best project prize, Machine Learning course, 2015.
  • Broadening Participation in Data Mining Travel Scholarship, 2014.
  • Third-ranked Artificial Intelligence student in the department among Master's degree graduates, 2008.
  • Ranked in the top 1% of the Bachelor's degree participants in the National Universities Entrance Exam in summer 2001.
  • Ranked top student during three years of high school education in Mathematics and Physics.
  • Reviewer for Advances in Social Networks Analysis and Mining (ASONAM) 2019
  • Reviewer for Student Research Workshop (SRW) at ACL 2019
  • Reviewer for ICML 2019
  • Reviewer for CyberSafety 2019
  • Reviewer for CyberSafety 2018
  • Reviewer for data science track of Grace Hopper Celebration of Women in Computing 2018
  • Reviewer for Women in Machine Learning 2017
Work Experience:
  • Data Science intern at PayPal, summer 2018
  • Machine Learning intern at Cadence Design Systems, summer 2017
  • Web developer, Analyzer, and Database Designer, SADAD Informatics Corp., a company related to the largest commercial retail bank in Iran and in the Middle East, 2011-2014.
  • Web developer, Analyzer, and Database Designer, PadiD Pardaz Engineering Corp., 2009-2011.
  • Web and Windows developer, Atisun Engineering Corp., 2007-2009.
Teaching Assistantship:
Ph.D. Dissertation:
Weakly Supervised Machine Learning for Cyberbullying Detection: In this research, we develop automated, data-driven methods for harassment-based cyberbullying detection. The availability of such tools can enable technologies that reduce the harm and toxicity created by these detrimental behaviors. Our general framework is based on the consistency of two detectors that co-train one another. One learner identifies bullying incidents by examining the language content of the message; the other considers social structure to discover bullying. When designing the general framework, we address three tasks. First, we use machine learning with weak supervision, which significantly alleviates the need for human experts to perform tedious data annotation. Second, we incorporate the efficacy of distributed representations of words and nodes, such as deep, nonlinear models, into the framework to improve the predictive power of the models. Finally, we decrease the sensitivity of the framework to language describing particular social groups, including race, gender, religion, and sexual orientation. Here is my dissertation.
Master's Thesis:
Incremental Nonparametric Weighted Feature Extraction for online Subspace Pattern Classification: In this study, a new online method based on nonparametric weighted feature extraction (NWFE) is proposed. NWFE was introduced to combine the strengths of linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA) while rectifying their drawbacks. It emphasizes points near the decision boundary by putting greater weights on them and deemphasizes other points. Incremental nonparametric weighted feature extraction (INWFE) is the online version of NWFE. INWFE retains the advantages of NWFE, such as extracting more than L-1 features (where L is the number of classes), in contrast to LDA. It is independent of the class distribution and performs well on data with complex distributions. The effects of outliers are reduced due to the nature of its nonparametric scatter matrix. Furthermore, new samples can be added asynchronously, i.e., whenever a new sample becomes available, it can be added to the algorithm. This is useful for many real-world applications, since not all data are available in advance. The method is applied to synthetic data, a number of UCI datasets, and the Indian Pine dataset. Results are compared with NWFE in terms of classification accuracy and execution time. With a nearest-neighbour classifier, the technique is shown to converge to NWFE by the end of the learning process. In addition, execution time is reduced in comparison with NWFE. Here is the link.
Course Projects:
Efficient Training of MRFs with Latent Variable using Paired-Dual Learning (Spring 2016): In this project, we propose a framework that quickly trains Markov random fields with latent variables by avoiding repeated inference. We use a variational learning objective that substitutes belief propagation dual problems for two corresponding inference problems, augmented with Bethe entropy. We evaluate gradients using incomplete dual inference optimization to avoid repeated, full inference. We demonstrate the effectiveness of the proposed method on image segmentation with the Scene Understanding dataset, showing that our approach trains faster than EM and subgradient methods, converging more quickly to the optimal solution. Here is the report.
Digit Recognition (Spring 2015): In this project, we performed optical character (specifically digit) recognition using various machine learning techniques. We worked on two datasets. One is the standard MNIST dataset, a benchmark used in many papers for evaluation. The other is the Street View House Numbers (SVHN) dataset, a real-world image dataset obtained from house numbers. We studied and applied various feature extraction and classification methods on these two datasets, evaluated the results, and compared them to the results published in the respective papers. The feature extraction techniques we explored were sparse coding and principal component analysis (PCA). The classification methods we used were a nearest-neighbor classifier, a quadratic classifier, support vector machines (SVM), multilayer neural networks, and convolutional neural networks (CNN).
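As a small illustration of one of the listed classifiers, here is a minimal sketch of a 1-nearest-neighbor rule with Euclidean distance. The function name `nearest_neighbor_predict` and the tiny 4-pixel "images" are hypothetical; this is not the project's code, and real MNIST or SVHN inputs would be much larger feature vectors.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),  # Euclidean distance
    )
    return train_labels[best_index]


# Made-up 4-pixel "images" and digit labels (not MNIST or SVHN data).
train_points = [
    (0.0, 0.0, 1.0, 1.0),
    (1.0, 1.0, 0.0, 0.0),
    (0.1, 0.0, 0.9, 1.0),
]
train_labels = [1, 7, 1]

prediction = nearest_neighbor_predict(
    train_points, train_labels, (0.9, 1.0, 0.1, 0.0)
)
```

The query is closest to the second training point, so the sketch predicts the label 7. Feature extraction such as PCA would be applied to the points before this step.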
Purchase Prediction Using Online Trading Transactions (Fall 2014): In this project, we predict buyers' needs and their trading behavior in order to design an efficient recommendation system. We focus on the Allstate Insurance company dataset on Kaggle. Each day, people receive many advertisements via online shopping websites. Each advertised item has a certain cost, brand, rating, etc. If the eventual purchase can be predicted sooner, the seller is less likely to lose its customers. In the insurance case, if the eventual purchase can be predicted sooner in the shopping window, the quoting process is shortened and the issuer is less likely to lose the customer's business. The inputs of our system are user/buyer information (e.g., age, gender, marital status), product/seller information (e.g., cost, ranking, brand, category of the items provided), and the quote history of the user/buyer. The output is a set of products that are most likely to be purchased by the user/buyer.
Probabilistic Optimal Pathing in Dynamic, Time-Sensitive Routing Networks (Fall 2014): Link prediction is fundamental in forecasting how a graph might change over time. We propose various classification approaches that attempt to model the optimality of paths in graphs over time. This project includes large-scale data cleaning and preprocessing on adjacency lists for various internet traffic graphs, followed by complex analysis for feature extraction. Once a solid set of features was isolated, we applied classification techniques, including modifications to standard logistic regression, to a set of paths. These paths were optimal on some of the graphs at one point in time. Our goal was then to produce a distribution over optimality at some future time step. Our results indicate strong predictive ability for logistic regression with various feature approaches. We were able to beat our standard baselines, which always predict absent or always predict present for each of the paths. The internet traffic dataset comes from the Kaggle competition hosted by Facebook.
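The two constant baselines mentioned above can be sketched as follows. The labels and the classifier predictions are made up for illustration (1 = path present among the optimal paths at the future time step, 0 = absent), and the helper names are hypothetical. No result from the project is reproduced here.

```python
def accuracy(labels, predictions):
    """Fraction of paths whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def constant_baseline_accuracy(labels, constant):
    """Accuracy of predicting the same label for every path."""
    return accuracy(labels, [constant] * len(labels))


# Made-up ground truth and classifier output for eight paths.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
model_predictions = [1, 0, 0, 1, 0, 1, 0, 1]

always_absent = constant_baseline_accuracy(labels, 0)   # 5 of 8 correct
always_present = constant_baseline_accuracy(labels, 1)  # 3 of 8 correct
model_accuracy = accuracy(labels, model_predictions)    # 7 of 8 correct
```

A classifier is only informative if it beats the better of the two constants, since on imbalanced data one of them can already score well.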