Google Summer of Code 2014 Project Ideas

This page contains a list of potential project ideas that we are keen to develop during GSoC 2014 (we also have additional project ideas currently undergoing internal review, which will be added here once project deliverables and available mentors have been confirmed). You can view our previous GSoC 2009, GSoC 2010, GSoC 2011, GSoC 2012 and GSoC 2013 project ideas pages if you are looking for inspiration, or you might like to work on one of our existing tools, rather than working on something new.

We are also always interested in hearing ideas for additional relevant computer security and honeynet-related R&D projects (although remember that to qualify for GSoC funding from Google, your project deliverables need to fit into GSoC's 3-month project timescale!). If you have a suitable and interesting project, we'll always try to find the right resources to mentor it and support you. Please note: even if you aren't an eligible GSoC student, we are always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.

Each sponsored GSoC 2014 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We'll also provide supporting hosted svn/trac/git/redmine/mailman/IRC/etc project infrastructure, if required.

For all questions about the Honeynet Project, the GSoC program or our projects, please contact us on #gsoc-honeynet on irc.freenode.net, subscribe to our public mailing list for people interested in GSoC at https://public.honeynet.org/mailman/listinfo/gsoc or email us directly at project@honeynet.org.

To learn more about the Google Summer of Code event, see the GSoC 2014 Website.

R&D Focus Areas

In previous years our internal honeynet R&D focus was primarily directed into a number of priority areas, which were:

  • Mobile device honeypots
  • Virtualization honeypots / monitoring / attacks
  • Topical malware (e.g. Stuxnet and SCADA, attacks against mobile platforms such as Android, etc.)
  • Active defense research (e.g. botnet take down in an ethical manner)
  • IPv6 honeynets
  • Distributed data collection, analysis and visualisation

Unsurprisingly, a number of our suggested project ideas fall into these research areas. However, we are also interested in receiving project proposals and tool updates/new tool developments outside these focus areas, so hopefully this gives potential students a wide variety of exciting topics to contribute to and engage with once again this summer.

GSoC 2014 Project Ideas

(more project ideas and mentors to follow, once internal review is complete)


Name: Project 1 - Wire’n’Sics Plugins (aka: WireShnork reloaded)
Mentor: Guillaume Arcas (FR)
Backup mentor: Sébastien Larinier (FR)
Skills required: TCP/IP, C, Python, Lua
Project type: Re-engineering old tools
Project goal: Extending Wireshark Network Forensics capabilities with plugins
Description: Wireshark (and its CLI counterpart, tshark) is a great network analyzer that is often useful during network security and forensics operations. A successful GSoC 2011 project was dedicated to extending and enhancing Wireshark with additional plugins such as WireShnork. Since then, Wireshark internals have evolved, so the plugins written for GSoC 2011 are now outdated and incompatible with current releases.

The goal of this project is to re-engineer some of these plugins and make them compatible with current and future Wireshark releases. This could be done:
- by integrating the plugins into the Wireshark core engine, or by rewriting them in the Lua scripting language;
- by taking advantage of the PCAPNG format, which supports adding tags directly to packets in a capture file that will later be opened with Wireshark. For example, PCAPNG support could be added to Snort so that Snort writes its output as tags directly into the pcap file.
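To make the PCAPNG option more concrete, here is a minimal sketch of the file-format side only: it writes a Section Header Block, one Ethernet Interface Description Block, and Enhanced Packet Blocks whose `opt_comment` option carries an alert string, which Wireshark can display as a packet comment. It assumes a single interface and microsecond timestamps; all function names are hypothetical, and how Snort itself would be patched to call something like this is not shown.

```python
import struct

# Block and option codes from the pcapng format specification.
SHB_TYPE = 0x0A0D0D0A  # Section Header Block
IDB_TYPE = 0x00000001  # Interface Description Block
EPB_TYPE = 0x00000006  # Enhanced Packet Block
OPT_ENDOFOPT = 0
OPT_COMMENT = 1
LINKTYPE_ETHERNET = 1


def _pad32(data: bytes) -> bytes:
    # Variable-length fields are padded to a 32-bit boundary.
    return data + b"\x00" * (-len(data) % 4)


def _option(code: int, value: bytes) -> bytes:
    # Option = 16-bit code, 16-bit unpadded length, padded value.
    return struct.pack("<HH", code, len(value)) + _pad32(value)


def _block(block_type: int, body: bytes) -> bytes:
    # Every block starts and ends with its total length in bytes.
    total = 12 + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)


def section_header() -> bytes:
    # Byte-order magic, version 1.0, section length -1 ("not specified").
    return _block(SHB_TYPE, struct.pack("<IHHq", 0x1A2B3C4D, 1, 0, -1))


def interface_description(linktype: int = LINKTYPE_ETHERNET, snaplen: int = 65535) -> bytes:
    return _block(IDB_TYPE, struct.pack("<HHI", linktype, 0, snaplen))


def enhanced_packet(packet: bytes, ts_us: int, comment: str = "") -> bytes:
    # Interface 0; microsecond timestamp split into high and low 32-bit halves.
    body = struct.pack("<IIIII", 0, ts_us >> 32, ts_us & 0xFFFFFFFF, len(packet), len(packet))
    body += _pad32(packet)
    if comment:
        # The alert text is attached to this packet as an opt_comment.
        body += _option(OPT_COMMENT, comment.encode("utf-8"))
        body += _option(OPT_ENDOFOPT, b"")
    return _block(EPB_TYPE, body)


def write_tagged_pcapng(fileobj, packets) -> None:
    """packets: iterable of (raw_bytes, timestamp_us, comment) tuples."""
    fileobj.write(section_header())
    fileobj.write(interface_description())
    for raw, ts_us, comment in packets:
        fileobj.write(enhanced_packet(raw, ts_us, comment))
```

Used as `with open("tagged.pcapng", "wb") as f: write_tagged_pcapng(f, [(frame, ts_us, "alert text")])`, this should yield a capture in which the alert text travels with the matching frame instead of living in a separate log.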

Name: Project 2 - Droidbox native introspection
Mentor: Patrik Lantz (SE)
Backup mentor: Felix Leder (DE/NO)
Skills required: C, Java internals/JNI knowledge, Android malware
Project type: Improve existing tool
Project goal: The goal of the project is to add native introspection and monitoring capabilities to Droidbox. With these techniques, it is possible to generically monitor malware across Android versions without having to patch the actual APKs.
Description:
Droidbox is a dynamic malware analysis system (a.k.a. sandbox) for Android-based malware that was originally developed during GSoC 2011. One of the initial challenges of Droidbox has been that it required adjustments for each new Android OS version. A later GSoC project added a degree of portability by patching Android apps before they are run in Droidbox, an approach that is also followed by other Android sandbox solutions. However, attackers can detect that an app has been patched and use this as an evasion technique.

One solution that both keeps compatibility and avoids detection is to move the introspection functionality further down, into the virtual machine. The Dalvik VM exposes most of its functionality via the “JNI” interfaces. In this project we want to “harden” Droidbox with such introspection, hopefully making it more difficult for attackers to prevent malicious Android applications from being monitored.

References:
[1] http://www.mulliner.org/android/feed/mulliner_ddi_summercon2013.pdf

Name: Project 3 - Droid-BOT
Mentor: Felix Leder (DE/NO)
Backup mentor: Patrik Lantz (SE)
Skills required: Python, C
Project type: New tool to improve existing tool
Project goal: The Droid-BOT is a virtual user for Android devices. Its goal is to interact well enough with potentially malicious apps that they show their “real face”.
Description:
Malicious Android apps increasingly hide their malicious payload behind user actions. These can be simple “OK” dialogues or fake video players. The motivation is that malicious apps want to avoid detection in sandbox/malware analysis environments, which traditionally provide only passive observation and instrumentation but no real user activity. By checking for simple user behavior, a malicious Android app can tell whether it is interacting with a human or idling inside a sandbox, which potentially allows it to alter its behaviour and avoid detection.
With the Droid-BOT project, we want to create a virtual user that interacts with malicious Android apps, so that the apps are encouraged to start executing their true malicious behavior. The initial goal is to implement this for Droidbox, a dynamic malware analysis system (a.k.a. Sandbox) for Android based malware that was originally developed during GSoC 2011 and updated in GSoC 2012. By designing Droid-BOT as a framework, it should also be possible to use this approach in other analysis environments too.
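As a starting point, the sketch below generates a random stream of user actions and replays them through `adb shell input` (tap, swipe, text and keyevent are real `input` sub-commands). It is only a naive baseline, not the Droid-BOT design: the 480x800 screen size, the sample text and the function names are assumptions, and a real virtual user would choose actions from the current UI state rather than at random.

```python
import random
import subprocess
import time

# Android key codes used below (KEYCODE_BACK and KEYCODE_ENTER).
KEYCODE_BACK = 4
KEYCODE_ENTER = 66
SAMPLE_TEXT = ["hello", "test", "1234"]  # hypothetical text inputs


def random_actions(n, width=480, height=800, seed=None):
    """Return n `input` sub-commands simulating a (very naive) user."""
    rng = random.Random(seed)
    actions = []
    for _ in range(n):
        kind = rng.choice(["tap", "swipe", "text", "key"])
        if kind == "tap":
            actions.append(["input", "tap", str(rng.randrange(width)), str(rng.randrange(height))])
        elif kind == "swipe":
            coords = [rng.randrange(width), rng.randrange(height),
                      rng.randrange(width), rng.randrange(height)]
            actions.append(["input", "swipe"] + [str(c) for c in coords])
        elif kind == "text":
            actions.append(["input", "text", rng.choice(SAMPLE_TEXT)])
        else:
            actions.append(["input", "keyevent", str(rng.choice([KEYCODE_BACK, KEYCODE_ENTER]))])
    return actions


def play(actions, serial=None, delay=1.0, dry_run=True):
    """Send each action to the device via `adb shell`, or just print it."""
    base = ["adb"] + (["-s", serial] if serial else []) + ["shell"]
    for action in actions:
        cmd = base + action
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(delay)  # give the app time to react
```

Seeding the generator makes a session reproducible, which helps when comparing the behaviour of the same sample across analysis runs.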

References:
http://copperdroid.isg.rhul.ac.uk/copperdroid/about.php

Name: Project 4 - mitmproxy
Mentor: Maximilian Hils (DE)
Backup mentor: TBC (TBC)
Skills required: Python, TCP/IP, HTTP, HTML5/JS/CSS3
Project type: Improve existing tool
Project goal: Improve mitmproxy, in particular its web interface (formerly HoneyProxy)
Description:
mitmproxy is a man-in-the-middle SSL-capable HTTP proxy. It is an interactive console program written in Python that allows HTTP traffic flows to be inspected and edited on the fly. With its next release, mitmproxy is going to gain a web interface that was originally developed as a GSoC project in 2012 and 2013. Our long-term goal is to achieve feature parity between the web interface and the console application in most areas. The goal of this project is to accelerate this process by adding new features to the web interface and improving the existing functionality.

Name: Project 5 - Conpot: ICS/SCADA honeypot
Mentor: Lukas Rist (DE)
Backup mentor: Johnny Vestergaard (DK)
Skills required: Python, TCP (knowledge of HTTP, FTP, Modbus, SNMP, DNP3 and IEC 60870 is an advantage)
Project type: Improve existing tool
Project goal: Conpot is an ICS honeypot with the goal to collect intelligence about the motives and methods of adversaries targeting industrial control systems. In this project we want to add additional protocols, improve the existing protocols, data logging, system and vulnerability emulation and overall infrastructure virtualization.
Description: Until now, setting up an Industrial Control System (ICS) honeypot has required substantial manual work, real physical systems (which are usually either inaccessible or expensive) and the study of rather tedious protocol specifications. By implementing a master server for a larger set of common industrial communication protocols, plus virtual slaves that are easy to configure, we provide an easy entry point into the analysis of security threats against industrial infrastructures and control systems.
A student applying for this project has to be open to learning new network protocols and to adopting and modifying existing implementations. The work also includes automated testing and continuous integration, management of sensor deployments and data analysis. As this field is quite young and unexplored, it will provide a large variety of challenges to solve.

The Conpot project can be found here: glastopf/conpot
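To illustrate what a virtual slave has to do at the protocol level, here is a minimal sketch of a Modbus/TCP handler that answers only function 0x03 (Read Holding Registers) from an in-memory register table. It is not Conpot's implementation; the function name and the flat register list are assumptions, and a real slave would also handle other function codes, multiple unit ids and partial TCP reads.

```python
import struct

READ_HOLDING_REGISTERS = 0x03
ILLEGAL_FUNCTION = 0x01
ILLEGAL_DATA_ADDRESS = 0x02
ILLEGAL_DATA_VALUE = 0x03


def handle_modbus_tcp(frame: bytes, holding_registers: list) -> bytes:
    """Answer one Modbus/TCP request from a virtual slave's register table."""
    # MBAP header: transaction id, protocol id (0), length (unit id + PDU), unit id.
    tid, pid, length, unit = struct.unpack(">HHHB", frame[:7])
    pdu = frame[7:6 + length]
    function = pdu[0]

    def reply(body: bytes) -> bytes:
        # Echo transaction/protocol/unit ids; length covers unit id + PDU.
        return struct.pack(">HHHB", tid, pid, len(body) + 1, unit) + body

    def exception(code: int) -> bytes:
        # Exception responses set the high bit of the function code.
        return reply(bytes([function | 0x80, code]))

    if function != READ_HOLDING_REGISTERS:
        return exception(ILLEGAL_FUNCTION)
    start, count = struct.unpack(">HH", pdu[1:5])
    if not 1 <= count <= 125:
        return exception(ILLEGAL_DATA_VALUE)
    if start + count > len(holding_registers):
        return exception(ILLEGAL_DATA_ADDRESS)
    values = holding_registers[start:start + count]
    return reply(bytes([READ_HOLDING_REGISTERS, 2 * count]) + struct.pack(f">{count}H", *values))
```

Keeping the register table as plain data is what makes such slaves "easy to configure": a device template only has to supply register values, while the protocol handling stays generic.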

Name: Project 6 - YAPDNS (Yet Another Passive DNS)
Mentor: Pietro Delsante (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, Django, HTML/JavaScript, PostgreSQL/MySQL
Project type: New tool
Project goal: Collect Passive DNS data from various sources; display, correlate and analyze them.
Description: There are a number of existing tools to collect Passive DNS data (e.g. passivedns by gamelinux and pdnsd), but these tools generally only work by sniffing authoritative DNS answers from network traffic and storing them. There is a huge number of additional sources that could be used to collect Passive DNS data: for example, almost every organization has a web proxy, and its logs almost always contain a domain name, an IP address and a timestamp. The same data can be extracted from other textual logs from DNS servers (BIND, Microsoft DNS, etc.), web servers and IDS/IPS, and even from sandboxes (Cuckoo), honeypots (Thug) or other Passive DNS databases (VirusTotal, DNSDB, etc.).

YAPDNS should provide an interface (e.g. a Syslog-NG local destination) to collect basic associations between an IP address and a domain name, along with the first and last time each association was seen. Other data can be added for specific log sources (e.g. DNS logs also contain TTL, record type, etc.), or gathered from external repositories (e.g. association with malware in VirusTotal’s database, etc.).
YAPDNS should also provide an interface with a search engine, a set of dashboards and some correlation rules (e.g. track by ASN, geolocation, fast-flux behaviour, etc). The tool should also provide some REST-like APIs to facilitate integration with other tools.
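The core data model described above can be sketched in a few lines: normalise (domain, IP) pairs from any source and keep first-seen, last-seen and a hit count. The proxy log format parsed here ("ISO timestamp, method, URL, server IP") is hypothetical, as are the function names; the per-domain distinct-IP count is only a crude first input for fast-flux rules, not a detector.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit


@dataclass
class Association:
    first_seen: datetime
    last_seen: datetime
    count: int = 1


def parse_proxy_line(line: str):
    """Parse a line in a hypothetical 'ISO-timestamp method URL server-IP' format."""
    ts, _method, url, ip = line.split()
    return datetime.fromisoformat(ts), urlsplit(url).hostname, ip


def aggregate(records):
    """records: iterable of (timestamp, domain, ip) from any log source."""
    table = {}
    for ts, domain, ip in records:
        key = (domain.lower().rstrip("."), ip)  # normalise case and trailing dot
        assoc = table.get(key)
        if assoc is None:
            table[key] = Association(ts, ts)
        else:
            assoc.first_seen = min(assoc.first_seen, ts)
            assoc.last_seen = max(assoc.last_seen, ts)
            assoc.count += 1
    return table


def distinct_ips(table):
    """Count distinct IPs per domain: a crude first input for fast-flux rules."""
    ips = {}
    for domain, ip in table:
        ips.setdefault(domain, set()).add(ip)
    return {domain: len(s) for domain, s in ips.items()}
```

Because `aggregate` only needs (timestamp, domain, ip) tuples, each new log source would just need its own small parser in front of it.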

YAPDNS should also use the Honeynet Project's existing HPFeeds and HPFriends systems to facilitate easy data sharing between various trusted entities.

Name: Project 7 - Exploit Kit Forensics Framework
Mentor: Pietro Delsante (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, Django, HTML/JavaScript, PostgreSQL/MySQL
Project type: New tool
Project goal: Build a framework to facilitate forensic analysis of infections caused by exploit kits.
Description: Imagine you have a web proxy (or something able to reconstruct and log HTTP requests from live network traffic, such as Bro IDS) generating logs. You want to be able to understand when one of your internal client systems gets compromised by an exploit kit.

Simply looking for PE executables and trying to download them again from their original URL won’t always work, for example because the exploit kit requires the user to go through a complete chain of redirections before it serves its payload.

The idea behind the Exploit Kit Forensics Framework is to create an automated process that monitors proxy logs and detects when a dangerous file has been downloaded (e.g. by looking at the content type, or by correlating information from an IDS such as Snort). The process would then analyze the HTTP Referer header recorded in the logs to “rewind the tape” back to the exploit kit's entry point, which could then be passed to a client honeypot such as Thug. Thug would then "replay the tape" and attempt to download the suspicious file, so that it can be sent to a sandbox such as Cuckoo, to VirusTotal or to any other tool for analysis.
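The "rewind the tape" step can be sketched as a walk backwards over the Referer values logged for one client. In this minimal sketch the log entry layout, the list of suspicious content types and the function names are assumptions; a real implementation would also use IDS alerts, since exploit kits often serve payloads with misleading content types.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Content types treated as "dangerous downloads" (an illustrative, incomplete list).
SUSPICIOUS_TYPES = {
    "application/x-msdownload",
    "application/java-archive",
    "application/x-shockwave-flash",
    "application/pdf",
}


@dataclass
class ProxyEntry:
    timestamp: float
    client: str
    url: str
    referer: Optional[str]
    content_type: str


def rewind(entries: List[ProxyEntry], hit: ProxyEntry) -> List[ProxyEntry]:
    """Follow Referer headers backwards from a download to the first logged page."""
    chain, seen, current = [hit], {hit.url}, hit
    while current.referer and current.referer not in seen:
        # Latest request by the same client for the referring URL, made before `current`.
        parents = [e for e in entries
                   if e.client == current.client and e.url == current.referer
                   and e.timestamp <= current.timestamp]
        if not parents:
            break
        current = max(parents, key=lambda e: e.timestamp)
        chain.append(current)
        seen.add(current.url)
    return chain[::-1]  # entry point first


def entry_points(entries: List[ProxyEntry]) -> Iterable[List[str]]:
    """Yield the URL chain for every suspicious download; chain[0] would go to Thug."""
    for e in entries:
        if e.content_type.split(";")[0].strip().lower() in SUSPICIOUS_TYPES:
            yield [x.url for x in rewind(entries, e)]
```

The `seen` set stops the walk if a chain of Referer values loops back on itself, which would otherwise hang the process on crafted traffic.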

The Framework should also include a Web GUI to display the analyzed events and the output of the analysis process, along with some dashboards. To help integration with SIEMs and other systems, the Framework should also provide a remote logging mechanism (e.g. syslog) and some REST-like APIs.

Name: Project 8 - Beeswarm
Mentor: Johnny Vestergaard (DK)
Backup mentor: Aniket Panse (IN)
Skills required: Python (good), HTML/CSS/Javascript (average)
Project type: Improve existing tool
Project goal:

  • Algorithm for automatic distribution and configuration of Beeswarm honeypots and clients.
  • Feed the system with realistic emails and develop an algorithm to distribute the mails between clients and honeypots in a way that looks legitimate. (Possibly integrate with the GSoC 2013 Shiva project.)
  • Create a profile engine for honeypot/honeyclient creation (Windows, Linux, etc.).
  • Implement the authentication part of the RDP protocol (client and honeypot side).

Description: Beeswarm is an active intrusion detection system (IDS) with a focus on ease of use. Following its development during GSoC 2013, the system currently consists of three parts: a management interface, Honeypots and Clients. The active part of the system is the Clients, which generate semi-realistic bait traffic on the network designed to tempt attackers into capturing credentials and reusing them on the Honeypots.

This year during GSoC we would like to develop an algorithm that automatically generates configurations and deployment plans for Beeswarm honeypots/clients. Also currently missing from the system are the emails that are supposed to be transmitted between Beeswarm clients and honeypots. It could be interesting to extract spam mails from one of our mail honeypots (such as GSoC 2013's Shiva spampot) or to develop an algorithm that embeds bait (honeytokens) in the generated mails.
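One way to embed honeytokens in generated mails is to give every message its own unique bait credential and remember where it was planted, so that a later login attempt on a honeypot points back to the exact message that leaked. The sketch below is hypothetical throughout (registry class, credential format, mail wording) and is not part of Beeswarm's code base.

```python
import secrets
from email.message import EmailMessage


class HoneytokenRegistry:
    """Hypothetical registry: maps each bait credential to where it was planted."""

    def __init__(self):
        self._tokens = {}

    def new_credential(self, planted_in: str):
        username = "svc_" + secrets.token_hex(3)
        password = secrets.token_urlsafe(9)
        self._tokens[(username, password)] = planted_in
        return username, password

    def lookup(self, username: str, password: str):
        # Returns the message id that leaked this credential, or None.
        return self._tokens.get((username, password))


def bait_email(registry, sender, recipient, host):
    """Build a mail containing a fresh, unique bait credential for `host`."""
    msg_id = secrets.token_hex(8)
    username, password = registry.new_credential(planted_in=msg_id)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "New account for the file server"
    msg.set_content(
        f"Hi,\n\nYour account on {host} is ready.\n"
        f"Username: {username}\nPassword: {password}\n\nRegards,\nIT\n"
    )
    return msg_id, msg
```

A honeypot that receives a login would call `lookup` with the submitted credentials; any hit is both an alert and an indication of which mail exchange the attacker observed.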

We already have pretty good coverage of common network protocols (ssh, vnc, smtp, pop3, pop3s, http, https), but we would also like to support the RDP protocol. This does not need to be a complete implementation: just the authentication part, plus a dummy RDP traffic generator to make the interaction traffic look semi-legitimate.

References:
https://github.com/honeynet/beeswarm/
http://gsoc2013.honeynet.org/2013/10/01/wrapping-up-beeswarm/

Name: Project 9 - Malcom - Malware Communication Analyzer
Mentor: Thomas Chopitea
Backup mentor: Hugo Gascon
Skills required: Python, MongoDB, Web tech (HTML, JavaScript), D3js, TCP/IP
Project type: Improve existing tool
Project goal: Improve Malcom, suggest and add new network analysis features
Description:
Malcom (https://github.com/tomchop/malcom) is a tool that leverages network forensics analysis and threat intelligence to identify and counter malware-related threats. Its objectives are twofold:

1) collect network artifacts from active sniffing sessions when running malware in a sandbox

2) collect and analyze data from public and private data feeds, to be able to correlate observed network artifacts with a large databank (individual or collective) of “known-bad” artifacts.

The goal of this project is to extend Malcom’s capabilities to break the encryption used in malware communications, by taking known encryption keys from malicious campaigns and trying them out on captured network traffic (major feature). Another aspect of the project would be to work on existing Malcom features to increase stability and performance (minor features).
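A minimal sketch of the key-trial idea: decrypt a captured payload with each known campaign key and keep the first result that looks like plaintext. RC4 is used here only as an example of a cipher commonly seen in malware traffic, and the "mostly printable" check is a stand-in validity test; which ciphers, key sources and validation rules Malcom would actually use is an open design question, and the function names are hypothetical.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (key scheduling + keystream XOR); encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 or b in (9, 10, 13) for b in data) / len(data)


def try_campaign_keys(payload: bytes, known_keys, threshold: float = 0.9):
    """Return (key, plaintext) for the first known key whose output looks like text."""
    for key in known_keys:
        plaintext = rc4(key, payload)
        if printable_ratio(plaintext) >= threshold:
            return key, plaintext
    return None
```

For binary C&C protocols the printable-ratio check would need to be replaced by protocol-specific markers (magic bytes, length fields), which is one of the things the project would have to explore.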

Name: Project 10 - String deobfuscator for Android
Mentor: TBC (TBC)
Backup mentor: TBC (TBC)
Skills required: Android RE, python, java
Project type: Improve existing tool | New tool
Project goal: Extract the strings that are obfuscated or encrypted from APKs
Description: This could be an extension for Androguard, or a new tool. In some cases the obfuscated strings inside the DEX code of an APK cannot easily be extracted, because they are generated by dynamic code, for example when URLs or other instructions are created on the fly.
Possible approaches include dynamic taint analysis or static (source code) analysis; the latter is likely to be less expensive.
Extracting these strings will help to improve the analysis of Android malware.
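As a toy example of the static approach, the sketch below assumes a simple, hypothetical obfuscation scheme (URL, then single-byte XOR, then base64) and brute-forces it over string constants that have already been extracted from the DEX file, for instance with Androguard. Real obfuscators often use custom decryption routines, which is exactly why emulating or tainting that code is part of the project; the function names here are illustrative only.

```python
import base64
import binascii
import re

# Hypothetical scheme: URL -> single-byte XOR -> base64, stored as a DEX string constant.
URL_RE = re.compile(rb"https?://[\x21-\x7e]+")


def try_decode(candidate: str):
    """Brute-force single-byte XOR over a base64 string; keep URL-like results."""
    try:
        raw = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return []  # not valid base64, so not this scheme
    hits = []
    for key in range(256):
        decoded = bytes(b ^ key for b in raw)
        if URL_RE.fullmatch(decoded):
            hits.append((key, decoded.decode("ascii")))
    return hits


def deobfuscate(strings):
    """strings: constants already extracted from the DEX file (e.g. with Androguard)."""
    results = {}
    for s in strings:
        hits = try_decode(s)
        if hits:
            results[s] = hits
    return results
```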

Name: Project 11 - Cuckoo Sandbox
Mentors: Mark Schloesser, Jurriaan Bremer, Claudio Guarnieri
Skills required: Python, C, OS X/Linux internals
Project type: Improve existing tool
Project goal: Extend Cuckoo Sandbox to support Mac OS X and/or Linux
Description:
From the beginning, we designed Cuckoo Sandbox with the intent of eventually supporting multiple platforms. Since it started in GSoC 2010, Cuckoo has grown into a mature project with thousands of users and an active development community that is bringing remarkable improvements to the sandbox. Our Windows analyzer is improving fast and will improve even more in the coming months.

The goal of this project is to start experimenting with Mac OS X (preferably) or Linux, as threats for these platforms are on the rise. The student will have to research the most suitable process tracking techniques for the chosen operating system, implement a functional analyzer and integrate it into the overall execution flow of Cuckoo Sandbox. We explored this idea last year but unfortunately did not find a student able to take on the challenge; hopefully this year we will.
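On Linux, one ptrace-based starting point is to run the sample under `strace -f -tt -o FILE` (follow child processes, microsecond timestamps, write to a file) and parse the log into events for Cuckoo's reporting. The sketch below assumes the common "PID TIME syscall(args) = ret" line shape and simply skips other variants (unfinished/resumed calls, signals); it is one possible technique rather than the chosen design, and on Mac OS X a DTrace-based tool such as dtruss would play a similar role.

```python
import re
import subprocess

# Assumed line shape for `strace -f -tt -o FILE`: "PID HH:MM:SS.micro syscall(args) = ret"
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)"
)


def trace(cmd, log_path):
    """Run cmd under strace, following forks, with microsecond timestamps."""
    subprocess.run(["strace", "-f", "-tt", "-o", log_path] + list(cmd), check=False)


def parse(lines):
    """Turn strace log lines into dicts with pid, time, call, args and ret."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # "<unfinished ...>" and "<... resumed>" fragments do not match
            events.append(m.groupdict())
    return events
```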

Name: Project 12 - Thug: Phishing sites identification
Mentor: Angelo Dell'Aera (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, HTML/JavaScript
Project type: Improve existing tool
Project goal: Build a new feature in Thug in order to allow phishing sites identification.
Description: The aim of this project is to extend Thug so that it can fingerprint phishing pages. URLs fed into Thug often lead to phishing sites rather than to drive-by download exploit pages.
The idea behind this project is to build some heuristics (which could include looking for form submissions, the number of domains used and misspelled words, as well as URL blacklist checks such as PhishTank) and integrate them into Thug.
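A few of these heuristics can be sketched with the standard library's HTML parser: password fields, forms that submit to another domain, many third-party resource domains, and a blacklist lookup standing in for PhishTank. This is not Thug's API (inside Thug the checks would run on the DOM it builds), the misspelled-words heuristic is not covered, and the equal weights and the "three or more domains" threshold are arbitrary placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageScanner(HTMLParser):
    """Collects a few phishing-relevant features from an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.form_actions = []
        self.password_fields = 0
        self.resource_hosts = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A missing action submits back to the page itself.
            self.form_actions.append(urljoin(self.base_url, attrs.get("action") or ""))
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_fields += 1
        for name in ("src", "href"):
            if attrs.get(name):
                host = urlsplit(urljoin(self.base_url, attrs[name])).hostname
                if host:
                    self.resource_hosts.add(host)


def phishing_score(url: str, html: str, blacklist=frozenset()):
    """Return (score, reasons); weights and thresholds are arbitrary placeholders."""
    page_host = urlsplit(url).hostname
    scanner = PageScanner(url)
    scanner.feed(html)
    reasons = []
    if scanner.password_fields:
        reasons.append("password field")
    if any(urlsplit(a).hostname != page_host for a in scanner.form_actions):
        reasons.append("form posts to another domain")
    if len(scanner.resource_hosts - {page_host}) >= 3:
        reasons.append("many third-party domains")
    if page_host in blacklist:
        reasons.append("blacklisted host")
    return len(reasons), reasons
```

Returning the reasons alongside the score keeps the verdict explainable in Thug's logs, which matters when tuning heuristics against false positives.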