Elaheh Raisi

E-mail: elaheh[at]vt[dot]edu

Welcome to my website! My name is Elaheh Raisi and I am currently a Ph.D. student in the computer science department at Virginia Tech. I am working under Prof. Bert Huang's supervision in the Machine Learning Laboratory.

My research interests lie in the broad areas of machine learning, data mining, and social network analysis.
Here are my CV, LinkedIn profile, and Google Scholar citations.

Research Area:
My advisor and I work on cyberbullying identification on social media. Detrimental online behavior such as harassment and bullying is becoming a serious problem as people communicate over the Internet more than ever before. We created an automated, data-driven algorithm for detecting cyberbullying incidents using weak supervision. Starting with a small, user-supplied seed dictionary of high-precision bullying indicators, we simultaneously discover instigators, victims, and a vocabulary of words that indicate bullying. We observed interesting results on three social media datasets: Twitter, Instagram, and Ask.fm. We are now working on a more formal probabilistic model of bullying that robustly accounts for noise and uncertainty.
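To give a flavor of how a seed dictionary can bootstrap the other quantities, here is a minimal sketch in Python. It is not the model described above: it is a simple alternating-averaging heuristic written for illustration, and the function name, the message format (sender, receiver, list of words), and the toy data are all assumptions.

```python
from collections import defaultdict


def bootstrap_bullying_scores(messages, seed_words, n_iters=5):
    """Illustrative heuristic only (not the published method).

    messages: list of (sender, receiver, words) tuples; this format is assumed.
    seed_words: small set of high-precision bullying indicators.
    Returns (word_score, bully_score, victim_score) dictionaries.
    """
    messages = [m for m in messages if m[2]]  # skip messages with no words
    word_score = {w: 1.0 for w in seed_words}

    def score_messages():
        # A message's score is the mean score of its words (unknown words count as 0).
        return [
            sum(word_score.get(w, 0.0) for w in words) / len(words)
            for _, _, words in messages
        ]

    for _ in range(n_iters):
        msg_scores = score_messages()
        totals, counts = defaultdict(float), defaultdict(int)
        for (_, _, words), s in zip(messages, msg_scores):
            for w in set(words):
                totals[w] += s
                counts[w] += 1
        # A word's score is the mean score of the messages that contain it.
        word_score = {w: totals[w] / counts[w] for w in totals}
        for w in seed_words:
            word_score[w] = 1.0  # seed words stay fixed as trusted indicators

    bully_score, victim_score = defaultdict(float), defaultdict(float)
    for (sender, receiver, _), s in zip(messages, score_messages()):
        bully_score[sender] = max(bully_score[sender], s)
        victim_score[receiver] = max(victim_score[receiver], s)
    return word_score, dict(bully_score), dict(victim_score)


# Toy data, invented for this example.
toy_messages = [
    ("a", "b", ["you", "loser"]),
    ("a", "b", ["ugly", "loser"]),
    ("c", "d", ["nice", "day"]),
]
words, bullies, victims = bootstrap_bullying_scores(toy_messages, {"loser"})
```

In this toy run, words that co-occur with the seed word (such as "ugly") end up with higher scores than words that never do, and user "a" receives a higher instigator score than user "c".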
We are also working on learning in probabilistic graphical models where the available data are incomplete. Including latent variables in graphical models adds computational cost because their values are unknown. Many methods for learning models with latent variables are expensive because they need repeated inference, which is itself costly since it requires many iterations to converge. We are trying to speed up learning in discrete Markov random fields by applying the dualization of belief propagation inference. We examine different inference methods (Belief Propagator, TRW BP, Convex BP, Max-Margin BP, MPLP). We test this method on image segmentation tasks using the horse and scene understanding datasets.
Honors and Awards:
  • Grace Hopper Celebration Student Scholarship, 2017.
  • Best poster presentation at the ISVT cultural showcase and poster symposium, 2016.
  • CRA-Women Grad Cohort Workshop Scholarship, 2016.
  • Runner up for the best project prize, Machine Learning course, 2015.
  • Broadening Participation in Data Mining Travel Scholarship, 2014.
  • Ranked third among the department's Artificial Intelligence Master's degree graduates, 2008.
  • Ranked in the top 1% of the Bachelor's degree participants in the National Universities Entrance Exam in summer 2001.
  • Ranked the top student in Mathematics and Physics during three years of high school education.
Work Experience:
  • Machine Learning intern at Cadence Design Systems, summer 2017.
  • Web developer, Analyzer, and Database Designer, SADAD Informatics Corp., a company related to the largest commercial retail bank in Iran and in the Middle East, 2011-2014.
  • Web developer, Analyzer, and Database Designer, PadiD Pardaz Engineering Corp., 2009-2011.
  • Web and Windows developer, Atisun Engineering Corp., 2007-2009.
Teaching Assistantship:
Master's Thesis:
Incremental Nonparametric Weighted Feature Extraction for Online Subspace Pattern Classification: In this study, we propose a new online method based on nonparametric weighted feature extraction (NWFE). NWFE was introduced to combine the strengths of linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA) while rectifying their drawbacks. It emphasizes points near the decision boundary by assigning them greater weights and de-emphasizes other points. Incremental nonparametric weighted feature extraction (INWFE) is the online version of NWFE. INWFE keeps the advantages of NWFE: unlike LDA, it can extract more than L-1 features (where L is the number of classes), it is independent of the class distribution, and it performs well on data with complex distributions. The effects of outliers are reduced due to the nature of its nonparametric scatter matrix. Furthermore, new samples can be added asynchronously, i.e., whenever a new sample becomes available, it can be added to the algorithm. This is useful for many real-world applications, where not all data are available in advance. We evaluate the method on synthetic data, a number of UCI datasets, and the Indian Pine dataset. Results are compared with NWFE in terms of classification accuracy and execution time. With a nearest-neighbor classifier, we show that this technique converges to NWFE at the end of the learning process. In addition, it requires less execution time than NWFE. Here is the link.
Course Projects:
Efficient Training of MRFs with Latent Variable using Paired-Dual Learning (Spring 2016): In this project, we propose a framework that quickly trains Markov random fields with latent variables by avoiding repeated inference. We use a variational learning objective that substitutes belief propagation dual problems for the two corresponding inference problems, augmented with Bethe entropy. We evaluate gradients using incomplete dual inference optimization to avoid repeated, full inference. We demonstrate the effectiveness of the proposed method on image segmentation with the Scene Understanding dataset, showing that in terms of training time our approach is superior to EM and subgradient methods, converging faster to the optimal solution. Here is the report.
Digit Recognition (Spring 2015): In this project, we performed optical character (specifically digit) recognition using various machine learning techniques. We worked on two datasets. The first is the standard MNIST dataset, a benchmark used for evaluation in many papers. The second is the Street View House Numbers (SVHN) dataset, a real-world image dataset obtained from house numbers. We applied various feature extraction and classification methods to these two datasets and compared our results with those published in the respective papers. The feature extraction techniques we explored were sparse coding and principal component analysis (PCA). The classification methods we used were a nearest-neighbor classifier, a quadratic classifier, support vector machines (SVM), multilayer neural networks, and a convolutional neural network (CNN).
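As a small illustration of the simplest classifier in that list, the sketch below implements a 1-nearest-neighbor rule in plain Python. It is not the project code: the function name and the tiny 2x2 "images" (flattened to four pixel values) are invented for this example, and real MNIST or SVHN images would be much larger.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query` (Euclidean distance)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


# Toy flattened 2x2 "images": label 1 has a bright right column, label 0 a bright top row.
train_x = [
    [0.0, 1.0, 0.0, 1.0],
    [0.1, 0.9, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
]
train_y = [1, 1, 0, 0]

prediction = nearest_neighbor_predict(train_x, train_y, [0.0, 0.8, 0.1, 0.9])
```

Feature extraction such as PCA would be applied to the vectors before this step; the classifier itself only needs a distance between feature vectors.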
Purchase Prediction Using Online Trading Transactions (Fall 2014): In this project, we predict buyers' needs and their trading behavior in order to design an efficient recommendation system. We focus on the Allstate Insurance company dataset on Kaggle. Each day, people receive many advertisements via online shopping websites. Each advertised item has a certain cost, brand, rating, etc. If the eventual purchase can be predicted sooner, the seller is less likely to lose its customers. In the insurance case, if the eventual purchase can be predicted sooner in the shopping window, the quoting process is shortened and the issuer is less likely to lose the customer's business. The inputs of our system are: user/buyer information (e.g., age, gender, marital status), product/seller information (e.g., cost, ranking, brand, category of the items provided), and the quote history of the user/buyer. The output is a set of products that the user/buyer is most likely to purchase.
Probabilistic Optimal Pathing in Dynamic, Time-Sensitive Routing Networks (Fall 2014): Link prediction is fundamental in forecasting how a graph might change over time. We propose various classification approaches that attempt to model the optimality of paths in graphs over time. This project includes large-scale data cleaning and preprocessing of adjacency lists for various internet traffic graphs, followed by complex analysis for feature extraction. Once a solid set of features was isolated, we applied classification techniques, including modifications to standard logistic regression, to a set of paths. These paths were optimal at one point in time on some of the graphs. Our goal was then to produce a distribution over optimality at some future time step. Our results indicate strong predictive ability for logistic regression with various feature approaches. We were able to beat our standard baselines, which always predict absent or always predict present for each path. The internet traffic dataset comes from a Kaggle competition hosted by Facebook.
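The comparison against the two constant baselines can be sketched as follows. This is a toy illustration, not the project code: the single centered feature, the labels, and all function names are assumptions, the logistic regression is a plain gradient-descent version without the modifications mentioned above, and accuracy is measured on the training data only to keep the example short.

```python
import math


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def constant_baselines(labels):
    """Accuracy of always predicting 'absent' (0) or always 'present' (1)."""
    n = len(labels)
    return {
        "always_absent": accuracy([0] * n, labels),
        "always_present": accuracy([1] * n, labels),
    }


def train_logistic_regression(xs, ys, lr=0.5, epochs=500):
    """Batch gradient descent on the mean logistic loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical one-dimensional path feature (centered) and optimality labels.
xs = [[-1.0], [-0.6], [-0.2], [0.2], [0.6], [1.0]]
ys = [0, 0, 0, 1, 1, 1]

baselines = constant_baselines(ys)
w, b = train_logistic_regression(xs, ys)
model_accuracy = accuracy([predict(w, b, x) for x in xs], ys)
```

On balanced labels like these, both constant baselines score 0.5, so any model that uses the feature at all has room to improve on them.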