Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages

Authors: 

Yun Lin and Ruofan Liu, National University of Singapore; Dinil Mon Divakaran, Trustwave; Jun Yang Ng and Qing Zhou Chan, National University of Singapore; Yiwen Lu, Yuxuan Si, and Fan Zhang, Zhejiang University; Jin Song Dong, National University of Singapore

Abstract: 

Recent years have seen the development of phishing detection and identification approaches to defend against phishing attacks. Phishing detection solutions often report binary results, i.e., phishing or not, without any explanation. In contrast, phishing identification approaches identify phishing webpages by visually comparing webpages with predefined legitimate references and report phishing along with its target brand, thereby having explainable results. However, there are technical challenges in visual analyses that limit existing solutions from being effective (with high accuracy) and efficient (with low runtime overhead), to be put to practical use.

In this work, we design a hybrid deep learning system, Phishpedia, to address two prominent technical challenges in phishing identification, i.e., (i) accurate recognition of identity logos on webpage screenshots, and (ii) matching logo variants of the same brand. Phishpedia achieves both high accuracy and low runtime overhead. And very importantly, different from common approaches, Phishpedia does not require training on any phishing samples. We carry out extensive experiments using real phishing data; the results demonstrate that Phishpedia significantly outperforms baseline identification approaches (EMD, PhishZoo, and LogoSENSE) in accurately and efficiently identifying phishing pages. We also deployed Phishpedia with CertStream service and discovered 1704 new real phishing websites within 30 days, way more than other solutions; moreover, 1113 of them are not reported by any engines in VirusTotal.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {272200,
author = {Yun Lin and Ruofan Liu and Dinil Mon Divakaran and Jun Yang Ng and Qing Zhou Chan and Yiwen Lu and Yuxuan Si and Fan Zhang and Jin Song Dong},
title = {Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages},
booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {3793--3810},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/lin},
publisher = {USENIX Association},
month = aug
}

Presentation Video