Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning

Mark Vainer

doi:10.3846/ntcs.2023.17639

Abstract

This article proposes a method for generating a multi-purpose password dataset suitable for machine learning and other research related, directly or indirectly, to passwords. Existing password datasets are not suitable for machine learning or decision-driven password cracking. Most are plain password dictionaries that contain only leaked and common passwords and no other information; others are small and include only weak, previously leaked passwords. The literature offers many password-cracking methods based on password datasets, but these methods mainly focus on generating more password candidates similar to those in the training dataset. The proposed method combines statistical analysis of leaked passwords with randomness to ensure diversity in the dataset. An experiment with the generated dataset showed a significant time improvement when performing a dictionary attack, but not when performing a brute-force attack.
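The idea of mixing statistics learned from leaked passwords with randomness can be illustrated with a minimal sketch. This is not the author's method: the sample list `LEAKED`, the four character classes, the `random_ratio` parameter, and the per-row features (`n_classes`, a pool-based `entropy_bits` estimate, and an `in_leaked` flag) are all hypothetical choices made for illustration. Statistically guided candidates reproduce the length and character-class mix of the leaked sample, while a fraction of fully random candidates adds diversity beyond leaked patterns.

```python
import math
import random
import string
from collections import Counter

# Hypothetical sample standing in for a real leaked-password corpus.
LEAKED = ["password1", "123456", "qwerty", "iloveyou",
          "dragon99", "Monkey!", "letmein", "abc123"]

# Hypothetical character classes used for both statistics and generation.
CLASSES = {
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "digit": string.digits,
    "symbol": string.punctuation,
}


def char_class(c):
    """Return the class name of a character; anything else counts as a symbol."""
    for name in ("lower", "upper", "digit"):
        if c in CLASSES[name]:
            return name
    return "symbol"


def learn_stats(passwords):
    """Empirical length distribution and character-class frequencies."""
    lengths = Counter(len(p) for p in passwords)
    classes = Counter(char_class(c) for p in passwords for c in p)
    return lengths, classes


def weighted_choice(rng, counter):
    """Draw one key from a Counter, weighted by its counts."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys], k=1)[0]


def pool_entropy(pw):
    """Pool-based estimate: length * log2(size of the classes actually used)."""
    pool = sum(len(CLASSES[name]) for name in {char_class(c) for c in pw})
    return len(pw) * math.log2(pool) if pool else 0.0


def generate(passwords, n, random_ratio=0.3, seed=0):
    """Generate n candidates; random_ratio of them are fully random."""
    rng = random.Random(seed)
    lengths, classes = learn_stats(passwords)
    full_pool = "".join(CLASSES.values())
    out = []
    for _ in range(n):
        if rng.random() < random_ratio:
            # Fully random candidate: uniform length and characters.
            length = rng.randint(min(lengths), max(lengths) + 4)
            pw = "".join(rng.choice(full_pool) for _ in range(length))
        else:
            # Statistically guided candidate: mimics the leaked sample.
            length = weighted_choice(rng, lengths)
            pw = "".join(rng.choice(CLASSES[weighted_choice(rng, classes)])
                         for _ in range(length))
        out.append(pw)
    return out


def build_dataset(passwords, n, random_ratio=0.3, seed=0):
    """Attach simple features to each generated candidate."""
    leaked = set(passwords)
    rows = []
    for pw in generate(passwords, n, random_ratio, seed):
        rows.append({
            "password": pw,
            "length": len(pw),
            "n_classes": len({char_class(c) for c in pw}),
            "entropy_bits": pool_entropy(pw),
            "in_leaked": pw in leaked,  # would a dictionary attack find it?
        })
    return rows


if __name__ == "__main__":
    for row in build_dataset(LEAKED, 5, seed=42):
        print(row)
```

Fixing the seed makes the generated dataset reproducible. Features such as `in_leaked` and `entropy_bits` are only examples of the kind of extra information a plain password dictionary lacks; a real dataset would derive its features and labels from the method described in the article.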

Keywords: passwords, password cracking, password dataset, password strength, machine learning

How to Cite

Vainer, M. (2023). Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning. New Trends in Computer Sciences, 1(1), 1–18. https://doi.org/10.3846/ntcs.2023.17639

Published in issue: Apr 11, 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.