Black-Box Automated Detection of Malicious Content in Web Applications

IP.com Disclosure Number: IPCOM000192678D
Original Publication Date: 2010-Jan-28
Included in the Prior Art Database: 2010-Jan-28
Document File: 2 page(s) / 30K

Publishing Venue

IBM

Abstract

Disclosed is a technique to automatically identify malicious content being served on or linked off legitimate web pages. The technique includes automatically traversing these websites using a web scanner, passing all the downloaded content through an antivirus or similar tool to identify malicious content, and matching all discovered links against a database of black-listed locations.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 49% of the total text.

Disclosed is a technique to automatically identify malicious content being served on or linked from legitimate web pages. The disclosed technique includes automatically traversing websites associated with the legitimate web pages using a web scanner, passing all the downloaded content through an antivirus tool or similar tool to identify malicious content, and matching all discovered links against a database of black-listed locations.
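The three steps named above (traverse, scan downloaded content, match links against a blacklist) can be sketched roughly as follows. This is a minimal illustration and not the disclosed implementation: `scan_site`, `fetch`, `is_malicious`, and `blacklisted_hosts` are hypothetical names, the antivirus tool is reduced to a caller-supplied function, and the blacklist database is assumed to be a simple set of host names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values from every tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def scan_site(start_url, fetch, is_malicious, blacklisted_hosts, max_pages=100):
    """Breadth-first crawl of one site; returns a list of (kind, url) findings.

    fetch(url) -> bytes and is_malicious(bytes) -> bool are hypothetical
    hooks: fetch stands in for the web scanner's HTTP client, and
    is_malicious stands in for an antivirus or similar tool.
    """
    site_host = urlparse(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}
    findings = []
    pages = 0

    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        body = fetch(url)

        # Step 1: pass everything that was downloaded through the scanner.
        if is_malicious(body):
            findings.append(("malicious-content", url))

        # Step 2: extract links and match each one against the blacklist.
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        for raw in parser.links:
            link = urljoin(url, raw)
            host = urlparse(link).hostname
            if host in blacklisted_hosts:
                findings.append(("blacklisted-link", link))
            # Step 3: keep traversing, but only within the site under test.
            elif host == site_host and link not in seen:
                seen.add(link)
                queue.append(link)

    return findings
```

Passing `fetch` and `is_malicious` in as parameters keeps the sketch independent of any particular HTTP client or antivirus product; a real scanner would also need to execute scripts and submit forms to reach dynamically generated content, which this sketch does not attempt.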

One of the biggest and fastest growing delivery methods for malicious software (malware) is distribution via Web applications. These Web applications serve both malicious binaries that require some form of user acceptance, for example, executables or browser add-ons, and malicious JavaScript or page content that tries to exploit security holes in the browser, ActiveX, the operating system or other environments. Attackers then attempt to lure users into browsing the pages holding the malicious software, which starts the attacks.

This "luring" activity may be performed in various ways, such as spam emails and sites with dubious content, but a primary method is to link to the malicious pages from legitimate websites. Ideally, the linking occurs in a way that does not require any user action, such as embedding an inline frame on a page, or injecting an image tag. In some instances a hacker may compromise legitimate websites and post the malware on the legitimate website itself, using the posting as an attack delivery vehicle.
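To illustrate the kind of link that loads without user action, the following sketch lists embedded resources on a page that point to a different host than the page itself. It is only an assumption about how such links might be surfaced for later blacklist matching: `offsite_embeds`, `EmbedFinder`, and the choice of tags in `AUTO_LOADING_TAGS` are hypothetical and not taken from the disclosure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose src is fetched by the browser without any click (assumed set).
AUTO_LOADING_TAGS = {"iframe", "img", "script", "embed"}


class EmbedFinder(HTMLParser):
    """Record auto-loading tags whose src resolves to another host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.offsite = []

    def handle_starttag(self, tag, attrs):
        if tag not in AUTO_LOADING_TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative URLs against the page before comparing hosts.
        target = urljoin(self.page_url, src)
        if urlparse(target).hostname != self.page_host:
            self.offsite.append((tag, target))


def offsite_embeds(page_url, html):
    """Return (tag, absolute_url) pairs for embeds served from other hosts."""
    finder = EmbedFinder(page_url)
    finder.feed(html)
    return finder.offsite
```

An off-site embed is not malicious in itself, since many legitimate pages load images and scripts from other hosts; the output is only a candidate list to check against a blacklist or to download and scan.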

Two types of solutions used today are runtime protection devices and virus scanners. Runtime protection devices, such as intrusion prevention systems (IPS), attempt to detect malicious software being returned from a server. A problem with this solution is that runtime protection devices can afford to spend very little time on each component, forcing the runtime protection device to limit analysis to known signatures. Any behavioral analysis, execution simulation or other deep analysis techniques are much too slow to be useful in this mode of operation. Another problem is that runtime protection devices do not parse the page fully due to the same time and resource constraints. Therefore runtime protection devices do not see all the links on a site. The runtime protection devices cannot determine whether the links are blacklisted, nor can the links be followed to determine whether the links lead to malicious content.

Virus scanners on the server side attempt to discover malicious software being hosted on the server. Scanners often do not have access to data stored in non-standard ways, such as data being stored in custom database tables and possibly encrypted, data created in real time from multiple components as in a mash-up style, or retrieved from remote locations. Virus scanners cannot analyze data that cannot be accessed, and therefore are limited in coverage.

In Web 2...