An improvement to Honeypot CAPTCHAs to stop subversion by image recognition

IP.com Disclosure Number: IPCOM000236653D
Publication Date: 2014-May-07
Document File: 4 page(s) / 74K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a method of improving Honeypot CAPTCHAs to help detect bots filling out forms. Currently, bots can get around Honeypot CAPTCHAs by using image recognition to determine which fields they should fill out. Described is a method that prevents this by changing the way in which Honeypot CAPTCHAs are used.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 41% of the total text.

Honeypot CAPTCHAs are designed to beat spam bots by adding hidden text boxes to forms. This works because spam bots typically do not interpret a form's CSS, and so they enter information into every text box without knowing whether it is hidden or visible to a human user. When the form is submitted, the server checks whether the hidden fields are filled in; if they are, it knows a spam bot has filled in the form and can disregard the data.
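The server-side check described above can be sketched as follows. This is a minimal illustration only: the field names in `HONEYPOT_FIELDS` and the function name `is_spam_submission` are hypothetical and do not come from the disclosure.

```python
from typing import Iterable, Mapping

# Hypothetical names for the hidden honeypot inputs (assumption, not from the disclosure).
HONEYPOT_FIELDS = ("website", "nickname")


def is_spam_submission(form: Mapping[str, str],
                       honeypot_fields: Iterable[str] = HONEYPOT_FIELDS) -> bool:
    """Return True if any hidden honeypot field was submitted with a value."""
    return any(form.get(name, "").strip() for name in honeypot_fields)


# A human never sees the hidden fields, so they arrive empty.
human = {"name": "Ada", "email": "ada@example.com", "website": "", "nickname": ""}
# A naive bot fills every text box it finds in the HTML.
bot = {"name": "x", "email": "x@example.com", "website": "http://spam.example", "nickname": "x"}

print(is_spam_submission(human))
print(is_spam_submission(bot))
```

Missing honeypot keys are treated as empty here, so a submission that omits them is not flagged; a stricter server might choose to reject such submissions instead.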

To overcome this, a spam bot can utilise image recognition and Optical Character Recognition (OCR) technology. The bot takes a screenshot of the form it intends to fill and processes it with an OCR engine. The OCR step generates an internal representation of the form as key/value pairs, where the key is an on-screen label and the value is its related User Input Element (UIE). This internal representation is then analysed and compared to the actual HTML of the webpage.

The comparison takes each label's on-screen value, locates it in the HTML, and then searches for the next viable UIE. It detects the next viable UIE by analysing which input fields lie between the current label and the next label. There are two potential outcomes:

1. If it only finds one UIE, it knows it must be the correct element for the user data and fills it in.

2. If it finds multiple fields, it can assume that at least one is a honeypot and will do focussed CSS analysis to determine the correct one. This is a more advanced technique, as it only has to locate and compare the relevant CSS styling for the elements it is analysing, rather than building a full internal picture.
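The label-to-UIE comparison and its two outcomes can be illustrated with a small sketch. It assumes the OCR step has already produced a list of visible labels (`OCR_LABELS`), and it uses Python's standard `html.parser` to walk the page; the sample form, the field names, and the helper names are all hypothetical, and the focussed CSS analysis itself is not shown.

```python
from html.parser import HTMLParser


class FormScanner(HTMLParser):
    """Records text nodes and <input> elements in document order."""

    def __init__(self):
        super().__init__()
        self.events = []  # ("text", str) or ("input", name)

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.events.append(("input", dict(attrs).get("name", "")))

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.events.append(("text", text))


def candidates_per_label(html, ocr_labels):
    """Map each OCR label to the inputs found before the next OCR label."""
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    wanted = set(ocr_labels)
    result = {label: [] for label in ocr_labels}
    current = None
    for kind, value in scanner.events:
        if kind == "text" and value in wanted:
            current = value
        elif kind == "input" and current is not None:
            result[current].append(value)
    return result


def classify(candidates):
    """Apply the two outcomes: one UIE is filled, several need CSS analysis."""
    decisions = {}
    for label, names in candidates.items():
        if len(names) == 1:
            decisions[label] = "fill"
        elif len(names) > 1:
            decisions[label] = "css_analysis"
        else:
            decisions[label] = "no_input_found"
    return decisions


# Hypothetical form: "email_confirm" stands in for a CSS-hidden honeypot.
HTML = """
<form>
  <label>Name</label><input name="name">
  <label>Email</label><input name="email"><input name="email_confirm" class="hp">
  <label>Comment</label><input name="comment">
</form>
"""
OCR_LABELS = ["Name", "Email", "Comment"]

found = candidates_per_label(HTML, OCR_LABELS)
print(classify(found))
```

In this sketch only the "Email" label yields more than one candidate, so that is the only place the bot would need to inspect CSS.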

Figure 1: The process of a spam bot running image recognition on a Honeypot protected form

This beats the honeypot because the bot only fills in form components that are visible to humans, and so its spam input will be processed as if it came from a human. This is a serious problem for websites that use honeypot CAPTCHAs because it compromises the validity and security of data submitted to their servers.

One possible way to defend against this is to abandon the honeypot CAPTCHA approach and use another CAPTCHA technique such as reCAPTCHA. This is not a desirable outcome, as the reason for using the Honeypot CAPTCHA is to create a very low-friction user experience whilst still validating that the user is human. Adding reCAPTCHA defeats the bots but introduces high-friction verification, a problem that the honeypot is designed to avoid.

This problem is solved by dynamically altering the form after a bot has performed a screen capture: a new, visible field is added. The screen capture event can be detected at a programmatic level. The new field acts as a visible honeypot: it does not exist in the key/value data generated from the capture/OCR combination, and so the bot will think it is a hidden field (honeypot) that it should not fill out. A h...
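A rough sketch of how a server might act on the visible honeypot is given below. It rests on two assumptions that are not confirmed by the text available here: that the server is told whether a capture event was detected (`capture_detected`), and that a human, who can see the new field, would fill it in. The field name `confirm_email` and the function name are hypothetical, and the capture detection itself is not shown.

```python
from typing import Mapping


def classify_submission(form: Mapping[str, str],
                        capture_detected: bool,
                        decoy_field: str = "confirm_email") -> str:
    """Classify a submission using the dynamically added visible field.

    Assumption: a human sees and fills the added field, whereas a bot
    working from a stale screenshot treats it as a honeypot and skips it.
    """
    if not capture_detected:
        # The form was never altered, so the visible honeypot does not apply.
        return "not_applicable"
    if form.get(decoy_field, "").strip():
        return "likely_human"
    return "likely_bot"


print(classify_submission({"email": "a@example.com", "confirm_email": "a@example.com"}, True))
print(classify_submission({"email": "a@example.com"}, True))
```

This check would complement, not replace, the ordinary hidden-field test, since naive bots that fill every field are still caught by the original honeypot.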