Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning
Abstract
This article proposes a method for generating multi-purpose password datasets suitable for machine learning and other research related, directly or indirectly, to passwords. Existing password datasets are poorly suited to machine learning or decision-driven password cracking. Most are plain password dictionaries that contain only leaked and common passwords, with no additional information. Others are small and include only weak passwords that have previously been leaked. The literature offers many password cracking methods based on password datasets, but these focus mainly on generating more password candidates similar to those in the training dataset. The proposed method combines statistical analysis of leaked passwords with randomness to ensure diversity in the dataset. An experiment with the generated dataset showed a significant improvement in the time required for a dictionary attack, but not for a brute-force attack.
Keywords: passwords, password cracking, password dataset, password strength, machine learning
This work is licensed under a Creative Commons Attribution 4.0 International License.