Password Project

Leaked passwords from data breaches can pose a serious threat if users reuse or slightly modify them for other services. Although more and more online services are being breached, there is still no large-scale quantitative understanding of the risks of password reuse and modification. In this paper, we perform the first large-scale empirical analysis of password reuse and modification patterns using a ground-truth dataset of 28.8 million users and their 61.5 million passwords across 107 services over 8 years.

Findings

We find that password reuse and modification are very common behaviors (observed for 52% of the users). More surprisingly, sensitive online services such as shopping websites and email services received the most reused and modified passwords. We also observe that users continue to reuse already-leaked passwords on other online services for years after the initial data breach. Finally, to quantify the security risks, we develop a new training-based guessing algorithm. Extensive evaluations show that more than 16 million password pairs (30% of the modified passwords and all the reused passwords) can be cracked within just 10 guesses. These results suggest that more proactive mechanisms are needed to protect user accounts after major data breaches.

Data Sharing

To facilitate future research, we want to share the password dataset with the research community. At the same time, to prevent the dataset from being misused, we follow a conservative data sharing policy. First, we remove the email addresses and replace them with pseudo identifiers. Second, we remove the names of the breached online services from the dataset.
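The two anonymization steps can be illustrated with a minimal Python sketch. This is not the script used to prepare the dataset: the input layout of (email, service, password) tuples, the function name pseudonymize, and the use of random hexadecimal tokens as pseudo identifiers are all assumptions made for illustration.

```python
import secrets


def pseudonymize(rows):
    """Replace emails with pseudo identifiers and drop service names.

    rows: iterable of (email, service, password) tuples (assumed layout).
    Returns a list of (pseudo_id, password) tuples in the same order.
    """
    pseudo_ids = {}  # normalized email -> random pseudo identifier
    output = []
    for email, _service, password in rows:
        key = email.strip().lower()  # treat case variants as one user
        if key not in pseudo_ids:
            # Random token, so the ID cannot be derived from the email.
            pseudo_ids[key] = secrets.token_hex(8)
        # The service name is intentionally not copied to the output.
        output.append((pseudo_ids[key], password))
    return output
```

A random token is used here rather than a hash of the email address because an unsalted hash of a known address could be recomputed by anyone; whether the released dataset was built this way is not stated.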

This sample file contains 1,000 randomly selected users and their password records across different services. Each line represents one user record, with tab-separated fields: [userID]\t[password_1]\t[password_2]...\t[password_n]
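As a rough sketch of how a file in this format could be read, the Python snippet below splits each line on tabs and maps the user ID to its list of passwords. The function names parse_record, load_records, and has_exact_reuse are hypothetical helpers, and the exact-match reuse check is only an illustration, not the analysis method of the paper.

```python
def parse_record(line):
    """Split one tab-separated line into (user_id, [passwords])."""
    # Strip only line endings: passwords may contain spaces.
    fields = line.rstrip("\r\n").split("\t")
    return fields[0], fields[1:]


def load_records(lines):
    """Build a dict mapping each user ID to its list of passwords."""
    records = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user_id, passwords = parse_record(line)
        records[user_id] = passwords
    return records


def has_exact_reuse(passwords):
    """True if the same password appears more than once for a user."""
    return len(set(passwords)) < len(passwords)
```

Because load_records accepts any iterable of lines, an open file object can be passed directly, for example `with open(path, encoding="utf-8") as f: records = load_records(f)`, where `path` is wherever the sample file was saved.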

Download Data Sample

We are currently cleaning up the full dataset, which will be available after the conference. If you wish to access more data, please email us at hanghu[AT]vt.edu and include the following information:
1. Briefly describe your position and research institution.
2. Briefly describe how you plan to use our dataset.

If you would like to use the dataset for your research, please cite the paper below:

The Next Domino To Fall: Empirical Analysis of User Passwords across Online Services

Chun Wang, Steve T.K. Jan, Hang Hu, Douglas Bossart, Gang Wang.

In Proceedings of The ACM Conference on Data and Applications Security and Privacy (CODASPY). Tempe, AZ, March 2018.

PDF Bibtex