How to Track Botnets

In this section we introduce our methodology for tracking and observing botnets with the help of honeypots. Tracking botnets is clearly a multi-step operation: first one needs to gather some data about an existing botnet. This data can, for example, be obtained by analyzing captured malware. Afterwards one can hook a client into the network and gather further information. In the first part of this section we therefore introduce our techniques for retrieving the necessary information with the help of honeypots. Thereafter we present our approach to observing botnets.

Getting information with the help of honeynets

As stated before, we need some sensitive information from each botnet that enables us to place a fake bot into it. The required information includes:

  • DNS name or IP address of the IRC server and its port number
  • (optional) password to connect to the IRC server
  • Nickname of the bot and ident structure
  • Channel to join and (optional) channel password
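
To make the list concrete, here is a minimal Python 3.7+ sketch that stores these fields in a record and renders them as the IRC commands a client sends to register (PASS, NICK and USER in the RFC 2812 form) and then to join the channel. The names BotnetInfo, registration_lines and join_line and the default real name "x" are hypothetical illustrations, not drone's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BotnetInfo:
    """Connection details for one botnet (hypothetical record, not drone's API)."""
    server: str                            # DNS name or IP address of the IRC server
    port: int                              # TCP port of the IRC server
    nick: str                              # nickname the bot uses
    ident: str                             # ident (user name) sent with USER
    channel: str                           # channel the bot joins
    server_password: Optional[str] = None  # optional server password
    channel_key: Optional[str] = None      # optional channel password (key)


def registration_lines(info: BotnetInfo, realname: str = "x") -> List[str]:
    """Hypothetical helper: registration commands in RFC 2812 order (PASS, NICK, USER)."""
    lines = []
    if info.server_password:
        lines.append(f"PASS {info.server_password}")  # must precede NICK/USER
    lines.append(f"NICK {info.nick}")
    lines.append(f"USER {info.ident} 0 * :{realname}")  # user, mode, unused, real name
    return [line + "\r\n" for line in lines]  # IRC lines end with CR LF


def join_line(info: BotnetInfo) -> str:
    """Hypothetical helper: JOIN command, with the channel key when one is known."""
    if info.channel_key:
        return f"JOIN {info.channel} {info.channel_key}\r\n"
    return f"JOIN {info.channel}\r\n"
```

JOIN is kept separate because a client normally waits for the server's 001 welcome reply before joining channels.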

Using a GenII Honeynet containing some Windows honeypots and snort_inline enables us to collect this information. We deployed a typical GenII Honeynet with some small modifications as depicted in the next figure:

Tracking Botnets - Honeynet Setup Figure

The Windows honeypot is an unpatched version of Windows 2000 or Windows XP and is thus very vulnerable to attacks. It is located within a dial-in network of a German ISP. On average, the honeypot survives less than ten minutes before it is successfully exploited by automated malware. The shortest compromise time was only a few seconds: once we plugged in the network cable, an SDBot compromised the machine via an exploit against TCP port 135 and installed itself.

As explained in the previous section, once a bot successfully attacks one of the honeypots, it tries to connect to an IRC server to obtain further commands. This is where the Honeywall comes into play: its Data Control facilities let us control the outgoing traffic. We use snort_inline for Data Control and rewrite the traffic of all suspicious outgoing connections. A connection is suspicious if it contains typical IRC messages such as " 332 ", " TOPIC ", " PRIVMSG " or " NOTICE ". This prevents the bot from accepting valid commands from the master channel, so it cannot harm others: we have caught a bot inside our Honeynet.

As a side effect, we can also derive all the necessary sensitive information about a botnet from the data we have obtained up to that point: the Data Capture capability of the Honeywall allows us to determine the DNS name or IP address the bot wants to connect to, as well as the corresponding port number. From the Data Capture logs we can also derive the nickname and ident information, the server password, the channel name and the channel password. Having collected all the necessary information, the honeypot can go on to catch further malware. Since we do not care about the captured malware for now, we rebuild the honeypots every 24 hours so that we have "clean" systems every day. The German Honeynet Project is also working on another project, capturing the incoming malware and analyzing its payload, but more on this in a later section.
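
The Data Capture step can be sketched as a scan over the lines the bot sent to the IRC server. The following Python 3.8+ helper is hypothetical (extract_botnet_info is our name, not part of the Honeywall tools); it assumes the client-to-server IRC lines have already been pulled out of the capture and that the server address and port are taken from the connection metadata. It records the first PASS, NICK, USER and JOIN it sees.

```python
import re
from typing import Iterable

# Client-to-server commands that carry the login details (RFC 2812 syntax).
_PASS = re.compile(r"^PASS\s+:?(\S+)", re.IGNORECASE)
_NICK = re.compile(r"^NICK\s+:?(\S+)", re.IGNORECASE)
_USER = re.compile(r"^USER\s+(\S+)", re.IGNORECASE)
_JOIN = re.compile(r"^JOIN\s+:?(\S+)(?:\s+(\S+))?", re.IGNORECASE)


def extract_botnet_info(server: str, port: int, client_lines: Iterable[str]) -> dict:
    """Hypothetical helper: recover login details from a captured bot's IRC lines.

    server and port come from the connection metadata; only the first
    PASS, NICK, USER and JOIN seen are recorded.
    """
    info: dict = {"server": server, "port": port, "server_password": None,
                  "nick": None, "ident": None, "channel": None, "channel_key": None}
    for raw in client_lines:
        line = raw.strip()
        if (m := _PASS.match(line)) and info["server_password"] is None:
            info["server_password"] = m.group(1)
        elif (m := _NICK.match(line)) and info["nick"] is None:
            info["nick"] = m.group(1)
        elif (m := _USER.match(line)) and info["ident"] is None:
            info["ident"] = m.group(1)  # first USER parameter is the ident
        elif (m := _JOIN.match(line)) and info["channel"] is None:
            # A comma-separated channel list is kept verbatim.
            info["channel"] = m.group(1)
            info["channel_key"] = m.group(2)  # None when no key was sent
    return info
```

A real tool would also split multi-channel JOINs and handle bots that change their nickname after connecting.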

Observing Botnets

Now the second step in tracking botnets takes place: we want to reconnect to the botnet. Since we have all the necessary data, this is not very hard. As a first approach, you can simply set up irssi (a console-based IRC client) or some other IRC client and try to connect to the network. If the network is relatively small (fewer than 50 clients), there is a chance that your client will be identified, since it does not respond to valid commands. In this case, the operators of the botnet tend to ban the suspicious client, launch a DDoS attack against it, or both.
To avoid detection, you can try to hide yourself. Disabling all commands in your client that trigger automatic responses helps a bit: if your client replies to a "CTCP VERSION" message with "irssi 0.89 running on openbsd i386", then the attacker who sent the Client-To-Client Protocol (CTCP) request will get suspicious. If the operators of the botnet do not notice you, you can enable logging of all commands and thus observe what is happening.

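A CTCP request is an ordinary PRIVMSG whose text is wrapped in 0x01 (Ctrl-A) bytes, so a tracking client can recognize such requests and deliberately send nothing back. The following Python sketch assumes RFC 2812 message framing; parse_ctcp and ctcp_log_entry are hypothetical names, and logging instead of replying is our illustration of disabling auto responses, not drone's documented behavior.

```python
import re
from typing import Optional, Tuple

# [":" prefix SPACE] "PRIVMSG" SPACE target SPACE ":" text   (RFC 2812 framing)
_PRIVMSG = re.compile(r"^(?::(\S+)\s+)?PRIVMSG\s+(\S+)\s+:(.*)$")


def parse_ctcp(line: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: (sender prefix, target, CTCP command) or None."""
    m = _PRIVMSG.match(line.rstrip("\r\n"))
    if m is None:
        return None
    prefix, target, text = m.group(1) or "", m.group(2), m.group(3)
    # CTCP payloads are delimited by 0x01 bytes, e.g. "\x01VERSION\x01".
    if len(text) >= 2 and text[0] == "\x01" and text[-1] == "\x01":
        body = text[1:-1]
        command = body.split(" ", 1)[0].upper()
        return prefix, target, command
    return None


def ctcp_log_entry(line: str) -> Optional[str]:
    """Hypothetical helper: describe a CTCP request for the log; never build a reply."""
    ctcp = parse_ctcp(line)
    if ctcp is None:
        return None
    prefix, target, command = ctcp
    # Deliberately no VERSION/PING/TIME answer: any reply could reveal the client.
    return f"ctcp {command} from {prefix or '?'} to {target} ignored"
```

Ignoring every CTCP request is itself a weak signal to an attentive operator, which is one reason hiding only "helps a bit".
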
But there are many problems if you start with this approach: some botnets use heavily stripped-down IRC daemons (IRCds) that are not RFC compliant, so that a normal IRC client cannot connect to the network. A possible way around this is to find out what the operator has stripped out and modify the source code of your favorite client to work around it. However, almost all current IRC clients lack well-written code or have other disadvantages, so you will probably end up writing your own IRC client to track botnets. Welcome to the club: ours is called drone. There are some pitfalls that you should consider when you write your own IRC client. Here are some features that we found useful in our dedicated botnet tracking IRC client:

  • SOCKS v4 support
  • Multi-server support:
    If you don't want to start an instance of your software for each botnet you
    find, this is a very useful feature.
  • No threading: threaded software is hard to debug.
  • Non-blocking connects and DNS resolution
  • poll(): since we use non-blocking I/O, we needed a multiplexer that waits for events on a set of file descriptors; select() could have done the job, too
  • libadns: an asynchronous DNS resolution library. Looking up hostnames does not block your code, even if the lookup takes some time. This is necessary if one decides not to use threads.
  • Written in C++, since object-oriented programming offers many advantages when writing a multi-server client
  • Modular interface, so you can load and unload (C++) modules at runtime
  • libcurl: curl is a command line tool for transferring files with URL syntax, supporting many different protocols; libcurl is a library offering the same features as the command line tool.
  • Perl Compatible Regular Expressions (PCRE): the PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE enables our client to guess the meaning of commands and, in some cases, to interact in a "native" way.
  • Extensive debug-logging interface, so that RFC non-compliance issues can be spotted very quickly and fixed in the client (side note: one day of logging 50 botnets can produce more than 500 MB of debug information).
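
The "no threading", non-blocking I/O and poll() items fit together as one single-threaded event loop. Drone is written in C++; the following is only a Python sketch of the same pattern, using the standard select.poll() wrapper (available on Unix-like systems only). The Multiplexer class is hypothetical, and the sketch leaves out non-blocking connects, asynchronous DNS (libadns in drone) and splitting the byte stream into IRC lines.

```python
import select
import socket
from typing import Callable, Dict


class Multiplexer:
    """Hypothetical single-threaded event loop over non-blocking sockets using poll()."""

    def __init__(self) -> None:
        self._poller = select.poll()
        self._socks: Dict[int, socket.socket] = {}
        self._handlers: Dict[int, Callable[[bytes], None]] = {}

    def add(self, sock: socket.socket, on_data: Callable[[bytes], None]) -> None:
        """Register a connection and the callback that receives its raw bytes."""
        sock.setblocking(False)
        fd = sock.fileno()
        self._socks[fd] = sock
        self._handlers[fd] = on_data
        self._poller.register(fd, select.POLLIN)

    def remove(self, fd: int) -> None:
        """Stop watching a connection and close it."""
        self._poller.unregister(fd)
        self._handlers.pop(fd)
        self._socks.pop(fd).close()

    def run_once(self, timeout_ms: int = 1000) -> int:
        """Wait up to timeout_ms for events, dispatch them, return the event count."""
        events = self._poller.poll(timeout_ms)
        for fd, _mask in events:
            if fd not in self._socks:
                continue  # already removed earlier in this round
            try:
                data = self._socks[fd].recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, nothing to read yet
            except OSError:
                self.remove(fd)  # connection error (e.g. reset by peer)
                continue
            if data:
                self._handlers[fd](data)
            else:
                self.remove(fd)  # empty read: the peer closed the connection
        return len(events)
```

Because everything runs in one thread, each handler must return quickly; a slow handler stalls every connection, which is why blocking DNS lookups have to be avoided.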

Drone is capable of using SOCKS v4 proxies, so we do not run into problems if an attacker in a botnet notices its presence. The SOCKS v4 proxies are on dial-in accounts in different networks, so that we can easily change the IP addresses. Drone itself runs on an independent machine that we maintain ourselves. We want to thank all the people contributing to our project by donating shells and/or proxies.
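
A SOCKS v4 CONNECT request is small enough to build by hand: version 4, command 1, the destination port and IPv4 address in network byte order, a user ID and a terminating NUL byte; the proxy answers with 8 bytes, where code 90 means the request was granted. The following Python sketch encodes this; the function names are hypothetical, and because plain SOCKS v4 only carries IPv4 addresses, the IRC server's name has to be resolved before the request is sent.

```python
import ipaddress
import struct

# Hypothetical helpers for illustration; not drone's API.
_REPLY_CODES = {
    90: "request granted",
    91: "request rejected or failed",
    92: "rejected: proxy cannot reach identd on the client",
    93: "rejected: identd reported a different user ID",
}


def socks4_connect_request(dest_ip: str, dest_port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request: VN=4, CD=1, DSTPORT, DSTIP, USERID, NUL."""
    packed_ip = ipaddress.IPv4Address(dest_ip).packed  # 4 bytes, network order
    return struct.pack("!BBH", 4, 1, dest_port) + packed_ip + user_id + b"\x00"


def socks4_parse_reply(reply: bytes) -> bool:
    """Return True if the 8-byte SOCKS4 reply grants the connection, else raise."""
    if len(reply) != 8:
        raise ValueError("SOCKS4 reply must be exactly 8 bytes")
    version, code = reply[0], reply[1]
    if version != 0:
        raise ValueError(f"unexpected reply version {version}")
    if code != 90:
        raise ConnectionError(_REPLY_CODES.get(code, f"unknown code {code}"))
    return True
```

In practice, the client opens a TCP connection to the proxy, sends the request, reads exactly 8 bytes, and only then starts the IRC registration over the same socket.
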
Some anti-virus vendors publish data about botnets. While useful, this information is at times not enough to track botnets effectively, as we demonstrate in Botnet Vendors.

Sometimes the owner of a botnet issues commands to instruct the bots. We present the more commonly used commands in the last section. Using our approach, we are able to monitor the issued commands and learn more about the motives of the attackers. To further enhance our methodology, we tried to write a PCRE-based emulation of a bot so that our dummy client could even reply correctly to a given command. But we soon scaled back our design goals here, because there is no standardization of botnet commands and the attackers tend to change their commands very often. In many cases, command replies are even translated into the attackers' native language.
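
Monitoring the issued commands can start with a table of regular expressions that labels each channel message. The following Python sketch uses the re module, whose syntax is close to, though not identical with, PCRE; the command names in COMMAND_PATTERNS and the classify function are hypothetical examples only, since, as noted above, botnet command syntax is not standardized and changes often.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical command syntax for illustration only: real botnets use their
# own, frequently changing commands, so this table must be kept per botnet.
COMMAND_PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    ("login", re.compile(r"^[.!]login\s+\S+", re.IGNORECASE)),
    ("download", re.compile(r"^[.!](?:download|dl)\s+(\S+)", re.IGNORECASE)),
    ("scan", re.compile(r"^[.!]scan\b", re.IGNORECASE)),
    ("ddos", re.compile(r"^[.!]ddos\b", re.IGNORECASE)),
]


def classify(message: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Hypothetical helper: first matching label plus captured groups, else None."""
    text = message.strip()
    for label, pattern in COMMAND_PATTERNS:
        m = pattern.match(text)
        if m:
            return label, m.groups()
    return None
```

Logging the label together with the raw message keeps unmatched or renamed commands visible for later review.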

When you monitor more than a couple of networks, begin to check whether some of them are linked, and group them if possible. Link checking is easy: just join a specific channel on all networks and see whether more than one of your clients shows up there. It is surprising how many networks are linked. People tend to set up a DNS name and channel for every bot version they check out. To learn more about the attacker, try putting the attacker's nickname into a Google search; often you will be surprised how much information you can find. Finally, check the server's entry at the responsible Regional Internet Registry (RIR) (RIPE NCC, ARIN, APNIC, or LACNIC) to learn even more about the attacker.
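
Once the link check has been run, grouping linked networks is a union-find problem. The following Python sketch assumes that our client uses a different nickname on every network and that we have recorded the nicknames visible in the test channel on each one; if the channel on network A shows the nickname we use on network B, the two are treated as linked. The function name and input format are hypothetical.

```python
from typing import Dict, List, Set


def group_linked_networks(own_nicks: Dict[str, str],
                          seen_in_channel: Dict[str, Set[str]]) -> List[Set[str]]:
    """Hypothetical helper: group networks whose test channels reveal each other.

    own_nicks: network -> nickname our client uses on that network
    seen_in_channel: network -> nicknames seen in the test channel there
    """
    parent = {net: net for net in own_nicks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    nick_to_net = {nick: net for net, nick in own_nicks.items()}
    for net, nicks in seen_in_channel.items():
        if net not in parent:
            continue  # ignore networks we have no client on
        for nick in nicks:
            other = nick_to_net.get(nick)
            if other is not None and other != net:
                parent[find(net)] = find(other)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for net in own_nicks:
        groups.setdefault(find(net), set()).add(net)
    return list(groups.values())
```

Union-find keeps the grouping transitive: if A is linked to B and B to C, all three end up in one group even if A and C were never observed together directly.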