DETECTION OF PHISHING WEB PAGES BY COMBINING SEMANTICAL AND VISUAL INFORMATION
Date: 2024-04-17
Author: Almakhamreh, Ahmad Hani Abdalla
Access: Open access
Abstract
The increased frequency and sophistication of cybercrimes have resulted in severe monetary losses for individuals and organizations, increasing the demand for robust and sustainable countermeasures. Although countless anti-phishing solutions exist, cybercriminals exploit their weaknesses and bypass them with zero-day attacks. In this dissertation, a new end-to-end deep learning model called CrossPhire is proposed, which combines semantic and visual features to classify web pages as phishing or legitimate. CrossPhire extracts distinctive features from three data sources obtained from web pages, namely the URL, the source code, and a screenshot, and is trained jointly on all of them. In this work, we present the following novelties: (1) an end-to-end deep learning model capable of capturing semantic and visual features from a page's URL, plain textual content, and screenshot; (2) a language-independent analysis approach that leverages state-of-the-art sentence transformers and convolutional neural networks and requires no third-party services; (3) Phish360, a new, highly diverse multimodal dataset compiling real-world examples of legitimate and phishing web pages; (4) statistical reports based on extensive analysis of Phish360 and other multimodal datasets in the literature; and (5) comprehensive experiments, including in-data and cross-data validation across five different datasets, to evaluate the generalization performance of the proposed model.
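The abstract does not spell out how the three branches are fused; as a rough illustration only, the sketch below shows one plausible late-fusion arrangement in PyTorch, in which sentence-transformer embeddings of the URL and of the parser-extracted page text are concatenated with CNN features of the screenshot and passed to a shared classification head. The encoder choice (all-MiniLM-L6-v2), feature dimensions, and head layout are assumptions for illustration, not the configuration evaluated in the dissertation.

# Hypothetical late-fusion sketch; the actual CrossPhire architecture may differ.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer
from torchvision import models

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=384, num_classes=2):
        super().__init__()
        # Visual branch: ResNet50 backbone with its classification layer removed.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.visual = nn.Sequential(*list(backbone.children())[:-1])  # (B, 2048, 1, 1)
        # Fusion head over [URL embedding | text embedding | image features].
        self.head = nn.Sequential(
            nn.Linear(2 * text_dim + 2048, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, url_emb, text_emb, screenshot):
        img_feat = self.visual(screenshot).flatten(1)            # (B, 2048)
        fused = torch.cat([url_emb, text_emb, img_feat], dim=1)  # (B, 2*384 + 2048)
        return self.head(fused)

# Encode the URL and the parser-extracted page text with a sentence transformer.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim
url_emb = torch.tensor(encoder.encode(["http://example.com/login"]))
text_emb = torch.tensor(encoder.encode(["Sign in to your account ..."]))
screenshot = torch.rand(1, 3, 224, 224)  # placeholder for a rendered page screenshot

logits = LateFusionClassifier()(url_emb, text_emb, screenshot)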
A comprehensive series of experiments was conducted to identify the most effective model configuration by exploring combinations of (a) HTML parsers (BeautifulSoup and Trafilatura), (b) sentence transformers (Sentence-BERT and multilingual XLM-R), and (c) convolutional image classifiers (ResNet50 and DenseNet121). In these experiments, CrossPhire demonstrated outstanding performance, achieving 99.21% accuracy on the Phish360 dataset and maintaining an average accuracy of 99.26% across the four benchmark datasets. Additionally, we fine-tuned the CLIP model on the available benchmark datasets by integrating a two-hidden-layer MLP; CrossPhire consistently outperformed this CLIP baseline across all employed datasets. These findings establish CrossPhire as a highly effective solution across datasets of various scales, surpassing existing approaches.
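For the CLIP baseline, the abstract states only that CLIP was fine-tuned by integrating a two-hidden-layer MLP; the CLIP variant, hidden-layer sizes, and textual input are not given. The following is a minimal sketch under assumed settings (ViT-B/32, 256-unit hidden layers, screenshot and page text as inputs), not necessarily the exact baseline used in the dissertation.

# Hypothetical CLIP + two-hidden-layer MLP baseline; all settings are assumed.
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn as nn
from PIL import Image

class ClipMLPClassifier(nn.Module):
    def __init__(self, clip_model, embed_dim=512, hidden=256, num_classes=2):
        super().__init__()
        self.clip_model = clip_model  # gradients flow, so the backbone is fine-tuned too
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, image, text_tokens):
        img = self.clip_model.encode_image(image).float()
        txt = self.clip_model.encode_text(text_tokens).float()
        return self.mlp(torch.cat([img, txt], dim=-1))

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone, preprocess = clip.load("ViT-B/32", device=device)  # 512-dim embeddings
model = ClipMLPClassifier(backbone).to(device)

image = preprocess(Image.open("screenshot.png")).unsqueeze(0).to(device)  # hypothetical path
tokens = clip.tokenize(["Sign in to continue to your account"]).to(device)
logits = model(image, tokens)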