Google Summer Of Code 2012 Project Ideas

Great news: The Honeynet Project was accepted as a mentoring org for GSoC 2012. Thanks Google! :-)

This page contains a list of potential project ideas that we are keen to develop during GSoC 2012 (we also have additional project ideas currently undergoing internal review, which will be added here too once project deliverables and available mentors have been confirmed). You can also find our previous GSoC 2009 project ideas here, our previous GSoC 2010 project ideas here and previous GSoC 2011 project ideas here too, if you are looking for inspiration or would like to work on one of our existing tools, rather than working on something new.

We are always also interested in hearing any ideas for additional relevant honeynet-related R&D projects (although remember that to qualify for receiving GSoC funding from Google your project deliverables need to fit in to GSoC's 3-month project timescales!). If you have a suitable and interesting project, we'll always try and find the right resources to mentor it and support you. Please note - even if you aren't an eligible GSoC student, we are also always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.

Each sponsored GSoC 2012 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We'll also provide supporting hosted svn/trac/git/redmine/mailman/IRC/etc project infrastructure, if required.

For all questions about the Honeynet Project, the GSoC program or our projects, please contact us on #gsoc2012-honeynet on irc.freenode.net, subscribe to our public mailing list for people interested in GSoC at https://public.honeynet.org/mailman/listinfo/gsoc or email us directly at project@honeynet.org.

To learn more about the Google Summer of Code event, see the GSoC 2012 Website.

R&D Focus Areas for 2011-2012

This year our internal honeynet R&D focus is primarily directed into a number of priority areas, which are:

  • Mobile device honeypots
  • Virtualization honeypots / monitoring / attacks
  • Topical malware (e.g. stuxnet SCADA, attacks against mobile platforms such as Android and iPhone, etc)
  • Active defense research (e.g. botnet take down in an ethical manner)
  • IPv6 honeynets
  • Distributed data collection, analysis and visualisation

So unsurprisingly a number of our suggested potential project ideas fall into these research areas.

However, we are also interested in receiving project proposals and tool updates/new tool developments outside these research focus areas, so hopefully this provides potential students with a wide variety of exciting topics to contribute to and be engaged with this summer.

GSoC 2012 Project Ideas

1. Further extend Capture-HPC with possibility of detecting malicious behavior on Linux Machines
Primary mentor: Adam Kozakiewicz (PL)
Backup mentor: Paweł Jacewicz (PL), Thanh Nguyen (VN)
Type: Extension of existing tool
Skill set required: Python/Java/Groovy, Linux development
Project Goal: Extension of tool to monitor malicious behavior on Linux operating systems.

Project Description:
Capture-HPC is a high-interaction client honeypot developed to detect client-side attacks. It consists of two parts: server and client. The server part manages multiple client instances running on virtualized Windows systems.
Recently a basic Capture-HPC client for Linux machines was developed by Mr Maciej Szawłowski as a part of his BSc thesis at Warsaw University of Technology. The main goal of the project is to further extend the functionality of this client software and to better integrate it with the Linux operating system architecture.

As Linux operating systems gain popularity, it is highly probable that soon a new line of threats targeting Linux users will arise. Extending Capture-HPC with functionality proposed below will greatly contribute to the knowledge of attacks against Linux client software, especially the web browsers.

Goals of the implementation:

  • Research possible solutions for better monitoring events in Linux operating systems, which may lead to development of a Linux kernel module supporting Capture-HPC client software or an alternative solution providing similar functionality
  • Propose changes to Capture-HPC server software and protocol specification to better support newly developed Linux client software
  • Improve collection of information about how processes interact with the monitored operating system
  • Implement a method of collecting dumps of Internet traffic
  • Implement a method of collecting deleted, created and modified files.

The main problem is to implement an efficient method for collecting system events and modified files from the monitored Linux system in near real time. This can't be done by simple modifications to the existing code, as it uses a "half-realtime" system state monitoring technique. Developing a kernel module that supports the client monitoring software is the natural way to solve this problem. However, if the goal of efficient and thorough data collection can be achieved by some other means, the results will be just as acceptable.
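To illustrate the limitation described above, here is a minimal Python sketch of state monitoring by comparing snapshots. It assumes that "half-realtime" monitoring means periodically comparing the state of the file system, which the project description does not spell out. It is not taken from the Capture-HPC client, and the function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os


def snapshot(root):
    """Map each file path under root to its (size, mtime_ns) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(before, after):
    """Return (created, deleted, modified) path lists between two snapshots."""
    created = sorted(p for p in after if p not in before)
    deleted = sorted(p for p in before if p not in after)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return created, deleted, modified
```

A file that is created and removed between two snapshots never appears in either of them, so this approach cannot report it or keep a copy of it. That is the kind of gap an event-driven mechanism, such as a kernel module, would be expected to close.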

Additional information:
http://code.google.com/p/capture-client-ubuntu/

2. Expand Cuckoo Sandbox
Primary mentor: Claudio Guarnieri (IT)
Backup mentor: Tiffany Youzhi Bao (CN), Dario Fernandes (IT)
Type: Extension of existing tool
Skill set required: Python, C/C++, Windows internals
Project Goal: Expand Cuckoo's stability, capabilities and support for
multiple platforms

Project Description:
Cuckoo Sandbox provides a complete, extensible and customizable
solution for performing analysis of malicious artifacts of various
nature. The goal for this project would be extending its current
capabilities.
According to the selected student's skills, the work to be done could
be relevant to:

  • the backend core.
  • the analysis core.
  • integration/consumption of the results.

That could include:

  • Introduce support for analysis of malware under other operating
    systems (GNU/Linux, Mac OS X, Android).
  • Improve and extend Cuckoo's capability to analyze exploits of
    various nature.
  • Emulate user interaction inside the analysis environment.
  • Improve Cuckoo's hooking and monitoring libraries and port them to
    64bit architectures.
  • Make Cuckoo able to run on native machines.
  • Improve results processing, especially dissection and analysis of
    network dumps.
  • Create data visualization models out of the results generated by
    Cuckoo Sandbox.

Additional information:
http://cuckoobox.org
http://github.com/cuckoobox
http://blog.cuckoobox.org
http://malwr.com

3. Improve our Android application sandbox (DroidBox)
Primary mentor: Patrik Lantz (SE)
Backup mentor: Felix Leder (DE) and Anthony Desnos (FR)
Type: Improve existing tool
Skill set required: Android framework, C, Dalvik opcodes, Assembling/Disassembling, Python, Web development
Project Goal: Improve our Android application sandbox (DroidBox)

Project Description:

  • Extend the sample execution monitoring performed in the Android framework to produce a more detailed API trace and provide more characteristics of a sample. These characteristics can be included in the visualization generated by DroidBox.
  • Port the project to support newer Android versions that utilize the JIT compiler in Dalvik. The current version targets Android 2.1. The foundation of DroidBox is TaintDroid, which has been ported to Android 2.3 by the research team behind it, and also last year by a student as part of his thesis [1].
  • Implement automated analysis of APKs without the need for user interaction with the emulator, for example by starting an analysis in a clean emulator state. Additionally, provide deployment of DroidBox in the cloud so that samples can be submitted for analysis, for example via a web interface.
  • Discover emulator evasion techniques and prevent them. This includes finding any hardcoded values in the emulator environment, or behavior differences between running apps on a real device and in the emulator.
  • Extend DroidBox so that it can identify the addresses of Android botnet servers. The server's URL usually cannot be found directly in the disassembled code, as it may be encrypted with DES/AES or obfuscated by other methods. DroidBox can be used to trace the network connection API, and logic can then be created to identify potential botnet servers.

Maintaining and porting DroidBox is rather cumbersome. So another possible direction for DroidBox is to drop the dependency on TaintDroid and the modified API framework, and instead patch an APK before analyzing it. This patching would inject the necessary code, producing an instrumented APK. With the injected code we would call a proxy class before the actual API calls we are interested in monitoring. Some of the API calls might also be used for emulator evasion techniques, and these calls would have to be replaced with our own, returning values as a real device would.

Students can also propose their own ideas around this project.

Additional information:
DroidBox
TaintDroid
TaintDroid OSDI '10 paper
Dalvik opcodes

[1] https://sites.google.com/site/taintdroid23/

4. Improve Androguard tool
Primary mentor: Anthony Desnos (FR)
Backup mentor: Cong Zheng (CN)
Type: Extend existing tool
Skill set required: Python, Android, Java, Qt
Project Goal: Improve Androguard tool http://code.google.com/p/androguard/

Project Description:
Androguard has a set of tools which can be used to detect plagiarism. To detect something, Androguard needs a specific database that has been filled in by someone. It would be very interesting to have a list of Android adware and to build a database of each adware in order to detect them out of the box with Androguard.

There is also a set of tools which can be used to perform diffing between two Android applications. For now the results are displayed in the console, which can be difficult to read, so the idea is to build a generic GUI framework, usable by any project, for displaying differences. Android will be a good example use of this framework.

5. Network Analyzer
Primary mentor: Nicolas Collery (SG)
Backup mentor: Adam Pridgen (US)
Type: Develop new tool
Skill set required:
Project Goal:

Project Description:
An open source, web-based and command-line network analyser inspired by netwitness, chaosreader, tcpflow, honeysnap, pcapr, splunk... The purpose is to display known traffic in a clear manner (including known protocols on non-standard ports), identify unknown traffic and allow plugins to interact with other systems (like cuckoo). For example, if the traffic is recognised as belonging to some malware family (yara for pcap), the analyser passes some info to cuckoo (name=spyeye) and then waits for feedback (decryption key=xxxx) to decode the traffic. The information can be passed to other systems for a more complete analysis.
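As a rough illustration of spotting a known protocol on a non-standard port, the Python sketch below classifies a flow by the first bytes of its payload rather than by port number. The signature table, port table and the `classify_flow` name are assumptions made for this example, and a real analyser would need far richer signatures.

```python
# Hypothetical payload prefixes, for illustration only.
SIGNATURES = [
    ("http", (b"GET ", b"POST ", b"HEAD ", b"HTTP/1.")),
    ("ssh", (b"SSH-",)),
    ("smtp", (b"EHLO ", b"HELO ")),
]

# Ports on which each protocol is conventionally expected.
STANDARD_PORTS = {"http": {80, 8080}, "ssh": {22}, "smtp": {25, 587}}


def classify_flow(dst_port, first_payload):
    """Return (protocol, on_standard_port) for the first bytes of a flow.

    protocol is None when no signature matches (unknown traffic).
    """
    for proto, prefixes in SIGNATURES:
        if first_payload.startswith(prefixes):
            return proto, dst_port in STANDARD_PORTS[proto]
    return None, False
```

For example, `classify_flow(4444, b"SSH-2.0-OpenSSH_5.9\r\n")` reports SSH running away from its usual port, while a payload that matches nothing is left as unknown traffic for a plugin to examine.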

6. IPv6 attack detector
Primary mentor: Hugo Gonzalez (MX)
Backup mentor:
Type: New tool
Skill set required: protocols, python programming
Project Goal: Write software to detect specific IPv6 attacks (especially thc-ipv6 attacks)

Project Description:
There are attacks specific to IPv6 networks, and especially to IPv4 networks with IPv6-enabled hosts. A basic example is an attack using router advertisements, which can lead to anything from DoS to MitM, so the proposal is to write software that detects these specific attacks against IPv6 hosts.
As a first step, I'm thinking of something similar to arpwatch, but for IPv6 router advertisements, which would also detect uncommon neighbor requests.
This should be the "shield" against the thc-ipv6 attacks.
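The arpwatch-like idea can be sketched in Python as a small table of routers seen so far, keyed by the source address of each router advertisement (RA) message. This is only a sketch under assumptions: packet capture and parsing are left out, and the `RouterWatch` class and its event names are hypothetical.

```python
class RouterWatch:
    """Track router advertisements the way arpwatch tracks ARP replies."""

    def __init__(self):
        self.known = {}  # router IPv6 source address -> (mac, prefix)

    def observe(self, src_ip, src_mac, prefix):
        """Record one advertisement and return an event name."""
        seen = self.known.get(src_ip)
        self.known[src_ip] = (src_mac, prefix)
        if seen is None:
            return "new-router"
        if seen[0] != src_mac:
            return "mac-changed"
        if seen[1] != prefix:
            return "prefix-changed"
        return "ok"
```

On a network with a stable set of routers, any event other than "ok" after the initial learning period would be worth an alert, since a rogue advertisement typically introduces a new router, a new MAC address or a new prefix.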

7. Network malware simulator
Primary mentor: Hugo Gonzalez (MX)
Backup mentor:
Type: New tool
Skill set required: protocols, network programming, python programming
Project Goal: Write software to simulate malware network behavior.

Project Description:
There are some papers about simulating malware network behavior, but so far there are no tools available. This tool should emulate/simulate the network behavior of different kinds of malware, such as worms, bots, downloaders, etc. It should have a client/server architecture. This tool will help to evaluate security systems or network detectors. It should be modular so that new behaviors can be added.

Related information:
http://gull.sourceforge.net/
http://code.google.com/p/ostinato/

8. Improving APKInspector
Primary mentor: Cong Zheng (CN)
Backup mentor: Anthony Desnos (FR), Kara Nance (US)
Type: Extend existing tool
Skill set required: Python, Qt, Dalvik opcodes
Project Goal: Improve APKInspector

Project Description:
1. Improve the installation process. Qt’s environment is a little complex to install, so this should be automated.

2. Improve the capabilities for configuration. The goal is to interact with every instruction. Right now interaction is only possible with blocks, and this should be more fine-grained.

3. Improve some graph features. The calls in and out should be shown in a call graph.

4. Repackaging ability. You can modify the smali code and repack it into a new app with Apktool, for example to log a register’s value by adding a log API call to the smali code.

Currently, the CFG code uses the Dalvik code in Androguard. We should change that to smali code, which is sometimes much more powerful and readable.

5. Data flow analysis. The goal is to be able to trace the source of each register in each instruction. Some program analysis techniques are needed, such as backward slicing and simulated execution.
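To give a feel for backward slicing, here is a deliberately simplified Python sketch. It assumes straight-line code with no branches, and it represents each instruction as a hypothetical `(dest, sources)` tuple rather than real smali or Dalvik bytecode, so it is an illustration of the idea and not APKInspector code.

```python
def backward_slice(instructions, index, register):
    """Return indices of the instructions that feed `register` as read at `index`.

    instructions is a list of (dest, sources) tuples for straight-line code,
    where dest is the register written and sources are the registers read.
    """
    wanted = {register}
    result = []
    for i in range(index - 1, -1, -1):
        dest, sources = instructions[i]
        if dest in wanted:
            result.append(i)
            wanted.discard(dest)    # this write fully defines dest
            wanted.update(sources)  # now trace what it was computed from
    return sorted(result)


code = [
    ("v0", []),      # 0: load a constant into v0
    ("v1", []),      # 1: load a constant into v1 (overwritten later)
    ("v2", ["v0"]),  # 2: copy v0 into v2
    ("v1", ["v2"]),  # 3: compute v1 from v2
    ("v3", ["v1"]),  # 4: use v1
]
print(backward_slice(code, 4, "v1"))
```

Instruction 1 is left out of the slice because its value is overwritten at instruction 3 before it is read. Handling real applications would additionally require following control flow between basic blocks.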

Related information:
APKinspector : http://code.google.com/p/apkinspector/
Androguard: http://code.google.com/p/androguard/
Apktool: http://code.google.com/p/android-apktool/
Dalvik opcodes: http://pallergabor.uw.hu/androidblog/dalvik_opcodes.html
pyQT: http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/classes.html

9. Wireshark Forensics Extensions
Primary mentor: Guillaume Arcas (FR)
Backup mentor: Jianwei Zhuge (CN)
Type: Improve existing plugins and develop a new one
Skill set required: Network programming, C, wireshark/pcap
Project Goal: Adding more network investigation (forensics) plugins to Wireshark

Project Description:
Wireshnork and WireViz are two Wireshark plugins developed during GSoC 2011. Both can be improved. For example, Wireshnork currently works on all packets of a pcap file, and it would be easier to use if it could be run only on selected packets or streams. Another improvement to Wireshnork could be to take advantage of the pcap-ng format. WireViz can also be improved by adding, for example, colorization of links between graph elements to show the frequency of an event's occurrence.

Besides improving these two plugins, development of a File Extractor could be very useful. The goal is to extend the current Wireshark file extraction features for HTTP (see the current Export/Objects/HTTP menu) to other commonly used protocols like SMTP, FTP, SMB, etc. This plugin would act like tcpxtract, foremost or chaosreader do.

There's also a new plugin that could be very useful: WireMap. WireMap will geolocate IP addresses and plot them on Google Maps. The goal is to draw Google Maps showing the IP addresses of a PCAP file with the help of the Maxmind GeoIP databases. If lines can be drawn between IPs, even better.

Note that the student can choose to improve Wireshnork and WireViz, or to develop the File Extractor or WireMap plugin.

Additional information/Useful links:
Wireshark
GSoC 2011 Wireshark Plugins

WireShnork Plugin

10. HoneyProxy
Primary mentor: Guillaume Arcas (FR)
Backup mentor: Jamie Riden (UK)
Type: Develop new tool
Skill set required: HTTP/HTTPS, Python
Project Goal: Develop a light HTTP/HTTPS Proxy for Web Traffic Investigation

Project Description:
Web proxies can be useful for inspecting and investigating HTTP and HTTPS flows. Some existing proxies built for pentesting, like BURP or OWASP WebScarab, can also be used to investigate web traffic, but they are quite "heavy", as they do many things that are unneeded in a forensic context.

HoneyProxy will provide the same features as BURP Proxy does:

  1. Run as a traditional Web proxy or as transparent proxy
  2. Capture and log HTTP objects
  3. Allow requests & answers interception and manipulation
  4. Capture HTTPS traffic (doing man-in-the-middle decryption)
  5. Log requests, answers
  6. Provide HTML pages rendering (i.e.: show intercepted HTML and other web objects in a browser)

For HTTP object capture, requests and answers may be logged "as is" in plain text, or in their native format for non-text files (flash, jar, exe, etc). HTTP headers must also be logged (as text).
Logging traffic to a PCAP file should also be considered.

The choice of programming language is open, although Python would be a good choice for making HoneyProxy portable.

Additional information/Useful links:
BURP Proxy
WebScarab
Decoding malware SSL using Burp proxy

11. Automated Attack Community Graph Construction
Primary mentor: Julia Cheng (TW)
Backup mentor: Claudio Guarnieri (IT), Thanh Nguyen (VN)
Type: Data analysis, integrated into an existing Splunk system
Skill set required: Python programming, Shell scripts, Splunk (optional), Honeypot technology, graph theory
Project Goal: Construct an attack community graph from multi-source honeypot logs to recognize attack approaches and relationships via social network analysis

Project Description:
Large volumes of honeypot logs make data analysis and interpretation difficult. This GSoC idea is to construct an attack community graph from multiple sources in order to recognize attack approaches within a large-scale attack community.

Goal of implementation:

  • Automatically extract features from multi-source honeypot logs to build an attack community graph via relationships and parameters
  • Identify attack approaches and scale via social network analysis
  • Describe attack behavior and its intention at a high, functional level
  • Develop apps integrated into the Splunk system.
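One possible starting point for such a graph is sketched below in Python: attacker IPs are linked whenever they share an indicator, and connected components are treated as candidate communities. The choice of shared indicators as the relationship, the `(source_ip, indicator)` event shape and both function names are assumptions made for this example, since the description leaves the features and relationships open.

```python
from collections import defaultdict


def build_graph(events):
    """Link attacker IPs that share an indicator (for example a malware hash).

    events is an iterable of (source_ip, indicator) pairs.
    """
    by_indicator = defaultdict(set)
    for source_ip, indicator in events:
        by_indicator[indicator].add(source_ip)
    graph = defaultdict(set)
    for ips in by_indicator.values():
        for ip in ips:
            graph[ip].update(ips - {ip})
    return graph


def communities(graph):
    """Return connected components as a list of sorted IP lists."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result
```

Connected components are the simplest grouping available. Social network analysis measures such as centrality could later be computed on the same adjacency structure.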

12. Malicious URL pre-filtering
Primary mentor: Julia Cheng (TW)
Backup mentor:
Type: Develop new tool
Skill set required: Python programming
Project Goal: URL pre-filtering

Project Description:
Pre-filter malicious URLs to improve the performance of a honeyclient.

13. Printer Honeypot
Primary mentor: Franck Guenichot (FR)
Backup mentor: Sebastien Tricaud (FR)
Type: New tool
Skill set required:
Project Goal: Extend dionaea with a new module or write a new tool.

Project Description:
Given the recent findings in the printer exploitation field, the goal of this project is to develop a dionaea module that is able to catch potentially malicious documents sent to TCP port 9100.
The module will have to:

  • Emulate (at a basic level) a printer to catch the documents sent
  • Answer some specific PJL commands like @PJL INFO ID with real values to avoid detection
  • Detect potentially harmful PJL commands (UPGRADE, FSUPLOAD, FSDOWNLOAD, etc.) that can be embedded in the document sent
  • Perform basic parsing of the document to detect its personality (PS / PCL)
  • Use the incident handling functions of dionaea to report all the incidents in log files (source IP, detailed information on the document sent)
  • Store the potentially malicious documents (in a file) on the honeypot for further analysis
  • Capture the network conversation (bistreams) between the “attacker” and the honeypot

This module or new tool must be able to use HPfeeds to log all the incidents or to send the captured documents to specific communication channels.
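The detection steps above might begin with something like the following Python sketch, which scans a raw print job for the PJL commands named in the description and makes a basic guess at the personality. It does not use the dionaea API. The `scan_job` name, the regular expression and the byte markers chosen for PS ("%!PS") and PCL (the ESC E reset sequence) are assumptions for illustration.

```python
import re

# Commands named in the project description as potentially harmful.
RISKY_COMMANDS = {"UPGRADE", "FSUPLOAD", "FSDOWNLOAD"}

# Matches the command word that follows "@PJL" in a raw job.
PJL_LINE = re.compile(rb"@PJL[ \t]+([A-Za-z]+)")


def scan_job(data):
    """Return (risky_commands, personality) for one raw print job."""
    commands = {m.group(1).decode("ascii").upper() for m in PJL_LINE.finditer(data)}
    risky = sorted(commands & RISKY_COMMANDS)
    if b"%!PS" in data:
        personality = "PS"
    elif b"\x1bE" in data:  # PCL printer reset sequence (ESC E)
        personality = "PCL"
    else:
        personality = "unknown"
    return risky, personality
```

A module built on this idea would pass the result to the honeypot's incident reporting and store the raw job alongside it for later analysis.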

Literature:
Andrei Costin’s research papers
Ang Cui’s research papers
PrintFS
HP Printer Job Language (PJL) Technical Reference
Adobe PostScript (PS) reference manual

14. Glastopf improvements
Primary mentor: Jamie Riden (UK)
Backup mentor: Lukas Rist (DE)
Type: Extend existing tool
Skill set required: Python and a bit of PHP
Project Goal: Extend Glastopf with new features

Project Description:
Machine learning based request classification improvements. So far classification is based on patterns. The goal is to improve the patterns by using collected requests and machine learning to extract the unique features.
Source IP profiling will give us a deeper understanding of an IP’s history and may eventually show us whether two IP addresses belong to the same botnet, based on their request patterns.
POST handling will allow us to properly reply to attacks using the POST method. We will store and analyze the payload of the request and try to reply according to the adversary's expectations.
The internal sandbox needs a lot of tweaking to make it more secure and more accurate but also less “fingerprintable”.

Suggestions from students are very welcome and a high priority selection criterion.

Additional information:
Glastopf project page
Project repository

15. KVM management
Primary mentor: Tillmann Werner (DE)
Backup mentor: Claudio Guarnieri (IT)
Type: New Tools (HP) or Extension of existing tool (non-HP)
Skill set required: C/C++, Python
Project Goal: The goal is to have a control and data collection library for KVM

Project Description:
VMware and VirtualBox have really nice management tools that enable the collection of information that is interesting for forensics and malware analysis.
KVM is, for multiple reasons, a very interesting platform for these types of analysis. However, ease of use for analysts and tools for collecting important information are somewhat lacking.
This project is about extending existing libraries or writing a new library that enables features like easily managing snapshots, collecting pcaps from the VM network device, creating memory dumps, ...

16. Python in the Kernel
Primary mentor: Felix Leder (DE)
Backup mentor:
Type: New Tool
Skill set required: C, Windows kernel development, Python low-level
Project Goal: Port Python to the Windows kernel for forensics tools

Project Description:
Python has proven to be a very powerful language for forensics (e.g.
[1]). This is based on the rapid prototyping and data parsing features
of this language.
PyBox [2] is a project that already makes it possible to create monitoring and forensics utilities for user-space programs. It takes only a few lines to write powerful monitoring applications.
It would be awesome to be able to quickly and easily write new forensics and analysis tools for the Windows kernel with scripting. In order to do so, it is necessary to port Python to the Windows kernel and to create an interface for loading scripts. This will likely require stripping down parts of the Python core.

[1] https://www.volatilesystems.com/default/volatility
[2] http://code.google.com/p/pyboxed/

17. AfterGlow Cloud
Primary mentor: Raffael Marty (US)
Backup mentor: Ralph Logan (US)
Type: Building user interface and cloud-enabling an existing tool
Skill set required: Python / Perl / HTML5 / (D3.js)
Project Goal: AfterGlow is an existing open source visualization tool. This project is about cloud-enabling the tool. Currently the tool is command-line focused and it needs a Web-based interface to run it as a service.

Project Description:
The project will require the development of a Web service. The Web service will allow people to upload CSV files and visualize them as a network graph. AfterGlow is an open source visualization tool that already exists. It allows users to convert CSV files into representations of a network graph (DOT or GDF files). The tool is completely command line based right now and needs a user interface for configuration and rendering:
We need two separate components to be developed:

  1. Developing a configuration interface to
    - upload data (CSV)
    - (optional: connect to existing Web service via oAuth, REST to get the data)
    - configure the tool output (color, node sizes, graph layout algorithm, etc)
  2. Rendering visual output based on DOT or GDF data format.
    - In a first version, the output can be generated with the help of GraphViz
    - (Optionally, the output can be rendered on the fly through D3.js or similar.)

The input from step one is fed into the existing AfterGlow tool to compute the DOT or GDF files that can then be rendered.
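For readers unfamiliar with the data flow, the Python sketch below shows the general idea of turning uploaded CSV rows into a DOT graph. It is not AfterGlow's own code and ignores AfterGlow's configuration options. It assumes a two-column CSV of source and target nodes, and the `csv_to_dot` name and the graph name are hypothetical.

```python
import csv
import io


def csv_to_dot(csv_text):
    """Turn two-column CSV rows (source, target) into a DOT digraph string."""
    edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # Escape double quotes so node names stay valid quoted DOT identifiers.
        source, target = (field.strip().replace('"', '\\"') for field in row[:2])
        edges.append(f'  "{source}" -> "{target}";')
    return "digraph afterglow_sketch {\n" + "\n".join(edges) + "\n}\n"
```

In a Web service, the uploaded file would instead be handed to the existing AfterGlow tool, and the resulting DOT or GDF output passed to GraphViz or a browser-side renderer.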

18. SSL Trust
Primary mentor: Jeff Nathan (US)
Backup mentor: TBC
Type: New tool
Skill set required: Network programming, browser plugins
Project Goal: Improving web browser SSL trust

Project Description:
Develop a system to enumerate SSL certificate data, such as fingerprints and validity dates, from sites on the Internet, and store the results so they can be queried later.

Then, develop a system that allows this data to be queried securely, without relying on SSL, by users on remote networks.

Finally develop a browser plugin to securely validate the SSL certificate presented by any site against the repository of SSL certificate information.
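The core check behind these three steps can be sketched in Python as follows. This is a minimal illustration under assumptions: SHA-256 is chosen as the fingerprint algorithm, the repository is modelled as an in-memory dictionary, and the function names are hypothetical. The secure query channel and the browser plugin itself are out of scope here.

```python
import hashlib
import ssl


def fingerprint(der_bytes):
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def fetch_fingerprint(host, port=443):
    """Fetch a server's certificate and fingerprint it (needs network access)."""
    pem = ssl.get_server_certificate((host, port))
    return fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def matches_repository(host, observed, repository):
    """True when the observed fingerprint is one the repository recorded for host.

    repository maps host names to sets of known fingerprints.
    """
    return observed in repository.get(host, set())
```

A mismatch does not prove interception, since sites rotate certificates and may serve several at once, but it gives the user a signal that the certificate presented locally differs from what was observed elsewhere.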

The purpose of this project is to expose legitimately signed but potentially unscrupulous SSL certificates that allow network operators to see the content of SSL encrypted data. The risk of legitimately signed but potentially unscrupulous SSL certificates exists in two forms:

A) Subordinate root Certificate Authority signing keys

B) Countries where the Government also operates a root Certificate Authority

Appropriate technologies and tools to be chosen by the student.

19. Droidbox Heuristic Detection of Malicious Applications
Primary mentor: Jeff Nathan (US)
Backup mentor: TBC
Type: Extension of existing Droidbox GSoC tool
Skill set required: Android development
Project Goal: Extend Droidbox by adding heuristic detection to quickly identify potentially malicious applications

Project Description:
The purpose of this project is to reduce the continually growing set of Android applications that need to be analyzed using manual and sophisticated mechanisms. This reduction would be aided by adding the following simple classification methods to our Droidbox project from GSoC 2011:

A) Identify applications that download additional application packages when downloaded and treat them as

B) Define sets of application permissions as being potentially malicious and identify applications that utilize these sets of
permissions

C) Define obvious sandbox evasion behaviors and identify applications attempting to evade being executed within a sandbox
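Method B could look roughly like the Python sketch below, which flags an application when it requests every permission in a predefined set. The two sets shown are hypothetical examples chosen for illustration, not a vetted list, and `flag_permissions` is not part of Droidbox.

```python
# Hypothetical permission combinations, for illustration only.
SUSPICIOUS_SETS = {
    "sms-exfiltration": {
        "android.permission.READ_SMS",
        "android.permission.INTERNET",
    },
    "premium-sms": {
        "android.permission.SEND_SMS",
        "android.permission.RECEIVE_BOOT_COMPLETED",
    },
}


def flag_permissions(requested):
    """Return names of suspicious sets fully contained in the requested permissions."""
    requested = set(requested)
    return sorted(
        name for name, needed in SUSPICIOUS_SETS.items() if needed <= requested
    )
```

Because many legitimate applications request the same combinations, a match is best treated as a reason to prioritise a sample for deeper analysis rather than as a verdict.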

Experience in Android development needed, ideally with some exposure to Droidbox.

20. HonEeeBox User Interface
Primary mentor: David Watson (UK)
Backup mentor: Arthur Clune (UK), Ben Reardon (AU), Raffael Marty (US)
Type: Development of new Django/JavaScript UI for HonEeeBox
Skill set required: Web development, Django, Python, JavaScript, graphing/charting, PostgreSQL database
Project Goal: Build a new user interface for our HonEeeBox distributed low interaction honeypot sensor network, using data from our HPFeeds system

Project Description:
In GSoC 2011 we ran a project to build a new backend and user interface for our HonEeeBox distributed low interaction honeypot sensor network. You can find more information about this project at:

https://www.honeynet.org/gsoc2011/ideas#project2

https://www.honeynet.org/gsoc2011/slot2

This project successfully built a new submit_http submissions backend and data enrichment system for our first generation HonEeeBox system, which used Nepenthes as the low interaction honeypot. It was less successful at building a working web based user interface, running out of time before it could be completed.

Since last year, we have upgraded our HonEeeBox system to use the next generation Dionaea low interaction honeypot developed during previous GSoCs (including VoIP honeypot capabilities). We have also switched from the older submit_http data transport system to a new internally developed HPFeeds data transport system that we are currently piloting.

You can find the internal slides about HonEeeBox from our recent March 2012 annual workshop held at Facebook in Palo Alto here.

This year we would like to focus on building a new web based user interface for HonEeeBox - ideally using Django plus Javascript against a PostgreSQL database, although we are open to alternative technology suggestions too. The goals will be initially to display information (both public and private - where public does not reveal the source IP addresses of sensors) about the number of HonEeeBox sensors actively submitting data, attack rates, types and sources of attacks over time, plus AV and sandbox data about the malware samples downloaded. Data will be provided to develop against from an enriched HPFeeds data set, adding IP Geolocation, ASN lookup and AV/Sandbox analysis results. The deliverable will be a working v1 user interface, followed by the addition of extra geolocated and time series based data analysis as project time allows.
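Two of the building blocks mentioned above, attack rates over time and a public view that hides sensor addresses, are sketched below in Python. The event field names (`timestamp`, `sensor_ip`) are assumptions made for this example, as the actual schema of the enriched HPFeeds data set is not described here.

```python
from collections import Counter
from datetime import datetime, timezone


def attacks_per_hour(events):
    """Count attack events per UTC hour.

    events are dicts with a 'timestamp' key holding epoch seconds.
    """
    counts = Counter()
    for event in events:
        hour = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        counts[hour.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)


def public_view(event):
    """Copy an event without the sensor's address, for public display."""
    return {key: value for key, value in event.items() if key != "sensor_ip"}
```

In a Django application the same aggregation would more likely be done in the database, but the shape of the result, a count per time bucket, is what a charting library would consume.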

Longer term we will be looking to integrate existing and new data visualizations using tools like Processing, Google maps/Google Earth, previous GSoC work, etc so experience or familiarity with that type of web visualization would also be helpful.

Previous GSoC Projects

If this list of potential project ideas doesn't interest you, or you want to work on a previous project or tool, you can find more details at:

* GSoC 2009 Project Ideas
* GSoC 2009 Accepted Projects
* GSoC 2010 Project Ideas
* GSoC 2010 Accepted Projects
* GSoC 2011 Project Ideas
* GSoC 2011 Accepted Projects
* Honeynet Project Tools

Finally, please remember that you are also free to suggest your own project ideas and we'll try our best to find you a suitable mentor for GSoC 2012 too.

Good luck with your student applications! :-)