Detecting Phishing Sites by Analyzing Similarities across Sites

IP.com Disclosure Number: IPCOM000247198D
Publication Date: 2016-Aug-16
Document File: 2 page(s) / 45K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to detect phishing sites. Because new phishing websites are often a variation of old phishing websites, the novel technique is to capture the similarities between the web pages that contain FORMS from the phishing websites, and then use that information to detect other phishing websites.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Creators of phishing websites either modify and republish an existing phishing website or use tools that generate new phishing websites. Every phishing website has at least one form that the user is required to populate; this is the mechanism that attackers use to steal data from the user.

Detecting phishing sites is a well-known task; however, an improved method is needed for automatic phishing site detection.

Because new phishing websites are often a variation of old phishing websites, the novel technique is to capture the similarities between the web pages that contain forms from the phishing websites, and then use that information to detect other phishing websites. The approach is to build a knowledge base of the appearances of the existing phishing websites (considering just the page that contains the form) and then use that stored information to detect new phishing websites.

Pseudocode

TRAINING PHASE (offline)

1. Build a database of phishing websites.

2. For each phishing website:

    A. Locate the page that contains at least one FORM element.

    B. Count the number of occurrences of each Hypertext Markup Language (HTML) tag on that page, and combine the result as a vector ordered alphabetically by tag name.

    C. Save the vector as the signature of the current phishing website.

3. Perform clustering on all the signatures of the phishing websites.

4. Save all the clusters; training is done.
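
The training steps above can be sketched in Python as follows. This is only an illustrative sketch, not the disclosed implementation: the disclosure does not name a clustering algorithm or a distance measure, so the greedy threshold-based grouping, the Euclidean distance, and the max_distance parameter are assumptions, as are all function and class names. It also assumes every signature is built over one shared, alphabetically sorted tag vocabulary so that the vectors are comparable.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag occurs on a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.counts[tag] += 1


def tag_counts(html):
    """Step 2B (first half): count the occurrences of each tag."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def has_form(counts):
    """Step 2A: the page qualifies only if it holds at least one FORM element."""
    return counts["form"] > 0


def build_vocabulary(all_counts):
    """Alphabetically ordered list of every tag seen (assumed shared vocabulary)."""
    return sorted({tag for counts in all_counts for tag in counts})


def signature(counts, vocabulary):
    """Steps 2B-2C: the count vector, ordered alphabetically by tag name."""
    return [counts[tag] for tag in vocabulary]


def cluster(signatures, max_distance):
    """Step 3, under an assumed algorithm: greedy grouping around a leader.

    A signature joins the first cluster whose first member lies within
    max_distance (Euclidean); otherwise it starts a new cluster.
    """
    clusters = []
    for sig in signatures:
        for members in clusters:
            if math.dist(sig, members[0]) <= max_distance:
                members.append(sig)
                break
        else:
            clusters.append([sig])
    return clusters
```

A caller would run tag_counts on each form page, keep the pages for which has_form is true, build the vocabulary once over all of them, and pass the resulting signatures to cluster. The threshold has to be tuned on real data; any standard clustering method could replace the greedy grouping shown here.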

TESTING PHASE (online)

5. For each new testing website:

    A. Locate the page that contains at least one FORM element.

    B. Count the number of occurrences of each HTML tag on that page.

    C. Combine the result as a vector ordered alphabetically by tag name.

    D....