This page contains a list of potential project ideas that we are keen to develop during GSoC 2018. If you would like to apply as a GSoC student, please follow these three steps to get started:
If you have any questions, please don’t hesitate to get in touch! 🙂
During the previous years of GSoC, the Honeynet Project's students have created a wide range of very successful open source security projects, many of which have gone on to become the industry standard open source tools in their respective fields. Examples include:
If you can't find something to immediately interest you, please take a look at GSoC 2009, GSoC 2010, GSoC 2011, GSoC 2012, GSoC 2013, GSoC 2014, GSoC 2015, GSoC 2016 and GSoC 2017 project ideas pages for other inspiration.
We are also always interested in hearing any ideas for additional relevant computer security and honeynet-related R&D projects (although remember that to qualify for receiving GSoC funding from Google your project deliverables need to fit into GSoC's 3-month project timescales!). If you have a suitable and interesting project, we will always try to find the right resources to mentor it and support you.
Please note - even if you aren't an eligible GSoC student, we are also always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.
Each sponsored GSoC 2018 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We'll also provide hosting for project infrastructure, if required.
For all questions about the Honeynet Project, the GSoC program or our projects, please contact us on gsoc-slack.honeynet.org (preferred) or email us at [email protected].
New Projects
Mitmproxy - HTTPS interception proxy
Honeytrap - Advanced Honeypot framework
DRAKVUF - Black-Box Binary Analysis System
Holmes Processing - Cyber threat intelligence at scale
Android-Related Projects
Capstone/Unicorn/Keystone Projects
Independent Projects
Hardware-virtualization-based monitoring, as provided by LibVMI and DRAKVUF, is a good approach for out-of-the-box dynamic code analysis and offers stealthy guest instrumentation. However, the overhead of this type of monitoring is extremely high, often a slowdown of several times. Alternatively, nearly all modern ARM and Intel architectures provide a trusted execution environment (TEE) that offers isolated storage and execution, splitting the system into a “normal world” and a “secure world”. The purpose of this project is to construct a monitor (similar to eBPF [0] in recent Linux kernels) in the “secure world” that can collect sensitive data from the rich operating system (located in the “normal world”). Because programs in the “normal world” cannot access the “secure world” directly, such a monitor is stealthy. For this project we choose TrustZone on ARM (OP-TEE [1]) as our TEE base.
[0] http://www.brendangregg.com/perf.html#eBPF
[1] https://github.com/OP-TEE
Mitmproxy is an interactive TLS-capable man-in-the-middle proxy. It can be used to intercept, inspect, modify and replay HTTP, HTTP/2, HTTPS, WebSockets, and raw TCP traffic. Think of it as a mix of WireShark and the Chrome developer tools - you can hook up any device or program and see how it communicates on the network. Mitmproxy is used by software developers, penetration testers, privacy advocates and researchers to fix bugs, find vulnerabilities, uncover privacy violations, conduct empirical research, and more.
Getting started for GSoC: https://github.com/mitmproxy/mitmproxy/issues/2812
Project page: https://mitmproxy.org/
Code repository: https://github.com/mitmproxy/mitmproxy
A major feature on mitmproxy’s roadmap is the replacement of our proxy core with an implementation that separates I/O and protocol logic (“sans I/O”). As you can guess, this is a major undertaking, but we’re determined to tackle it for a whole bunch of reasons. There are a couple of places where we would be happy to have help:
Further Resources:
(python3 -m mitmproxy.proxy2.server)
mitmproxy.proxy on master, mitmproxy.proxy2 on the sans-io branch.
See here for details on how to get started.
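To illustrate what separating I/O from protocol logic means, here is a minimal sketch. It is not taken from mitmproxy's proxy core: the class `LineProtocol` and the `LineReceived` and `SendData` types are hypothetical names chosen for this example. The protocol object only consumes bytes and returns events and commands, so whoever drives it (a socket loop, an asyncio transport, or a unit test) owns all of the I/O.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class LineReceived:
    """Event: one complete line arrived from the peer."""
    line: bytes


@dataclass
class SendData:
    """Command: the caller should write these bytes to the peer."""
    data: bytes


class LineProtocol:
    """Hypothetical sans-I/O echo protocol: it never touches a socket."""

    def __init__(self) -> None:
        self._buffer = b""

    def receive_data(self, data: bytes) -> List[Union[LineReceived, SendData]]:
        """Feed raw bytes in; get events and commands out."""
        self._buffer += data
        out: List[Union[LineReceived, SendData]] = []
        # Emit one event and one reply command per complete line.
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            out.append(LineReceived(line))
            out.append(SendData(b"echo: " + line + b"\n"))
        return out


# The protocol is driven with plain bytes, so no network is needed.
proto = LineProtocol()
first = proto.receive_data(b"hel")        # incomplete line: nothing to report yet
second = proto.receive_data(b"lo\nwor")   # completes "hello", buffers "wor"
```

Because the protocol logic has no side effects, partial reads and other edge cases can be exercised deterministically, which is one common motivation for this style of design.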
Mitmproxy is a large project with a huge number of interesting areas to explore. If you are motivated and know what you're interested in, why not get in touch with us and map out a custom GSoC project? Below are some ideas with rough project size estimates; an enterprising student should be able to complete one large project or three or more small projects during the GSoC period.
See here for details on how to get started. We encourage you to also think beyond what is listed above - what would *you* do to improve mitmproxy?
Honeytrap is an extensible, open source system for running, monitoring and managing honeypots.
Getting started for GSoC: https://github.com/honeytrap/honeytrap/issues/
Project page: http://docs.honeytrap.io/docs/home/
Code repository: https://github.com/honeytrap/honeytrap/
Honeytrap is an advanced honeypot framework where listeners, directors, services and events are extensible. Existing honeypots can be used, but new simulated services can also be implemented.
Implement a new director that allows running QEMU images as targets. Each attacking IP will get its own QEMU process, preserving all information and data. Eventually, existing services can be extended to support low to high interaction, where we pick another QEMU image depending on the first commands.
Honeytrap is built to be modular, so you’ll only need to work on the director itself; the services and listeners already support forwarding connections to other targets.
Currently we support proxying traffic to remote hosts (e.g. a real host or another honeypot), experimental firejail containers, or individual per-attacker LXC containers. Protocols will have a -proxy variant with protocol knowledge that takes care of sending events. Honeytrap has an advanced event mechanism that filters events and sends them to Slack, Elasticsearch, Kafka, File and Console.
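The "one QEMU process per attacking IP" idea can be sketched as a small registry. This is an illustration only, not Honeytrap code (Honeytrap's own director interface is not shown here): the class `QemuDirectorSketch`, the disk directory layout and the port range are assumptions made for this example. The sketch only builds the QEMU command line; creating the per-attacker disk image and launching the process are left out.

```python
from typing import Dict, List


class QemuDirectorSketch:
    """Hypothetical registry: one QEMU command line per attacking IP."""

    def __init__(self, disk_dir: str, first_port: int = 10022) -> None:
        self.disk_dir = disk_dir
        self._next_port = first_port
        self._ports: Dict[str, int] = {}

    def port_for(self, attacker_ip: str) -> int:
        """Return the host port reserved for this attacker's guest SSH."""
        if attacker_ip not in self._ports:
            self._ports[attacker_ip] = self._next_port
            self._next_port += 1
        return self._ports[attacker_ip]

    def command_for(self, attacker_ip: str) -> List[str]:
        """Build the QEMU invocation for this attacker's private guest."""
        port = self.port_for(attacker_ip)
        # Assumed per-attacker disk so that each attacker's changes are kept.
        # Creating this image from a base image is outside the sketch.
        disk = f"{self.disk_dir}/{attacker_ip.replace(':', '_')}.qcow2"
        return [
            "qemu-system-x86_64",
            "-nographic",
            "-drive", f"file={disk},format=qcow2",
            # User-mode networking: forward a local port to the guest's SSH.
            "-netdev", f"user,id=net0,hostfwd=tcp:127.0.0.1:{port}-:22",
            "-device", "e1000,netdev=net0",
        ]


director = QemuDirectorSketch(disk_dir="/var/lib/honeypot-disks")
cmd_a = director.command_for("198.51.100.7")
cmd_b = director.command_for("203.0.113.9")
cmd_a_again = director.command_for("198.51.100.7")
```

A returning attacker is mapped to the same disk and port, which is what lets the guest state persist between connections.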
See here for details on how to get started.
Honeytrap is an advanced honeypot framework where listeners, directors, services and events are extensible. Existing honeypots can be used, but new simulated services can also be implemented.
Implement new services (like an Elasticsearch simulator service), fix issues, improve the docs, or pursue other great ideas of your own.
Honeytrap is built to be modular, so you’ll only need to work on the director, listener, service or channel itself.
See here for details on how to get started.
DRAKVUF is a virtualization based agentless black-box binary analysis system. DRAKVUF allows for in-depth execution tracing of arbitrary binaries (including operating systems), all without having to install any special software within the virtual machine used for analysis.
Currently DRAKVUF uses a memory cloaking technique in which shadow memory pages are mapped directly to a dummy zero-filled page at the end of the guest physical memory. While this technique protects against discovery by simply reading memory, a knowledgeable attacker can probe these pages by writing a canary to one of them and observing all other pages where the canary appears. To avoid this information leak, each shadow page to which a canary gets written needs to get its own cloaking zero page (similar to deduplicating a page when performing copy-on-write).
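The canary probe and the proposed fix can be demonstrated with a toy model. The code below is a simulation in plain Python, not DRAKVUF or Xen code: `ToyGuestMemory`, the tiny page size and the frame numbering are all assumptions made for illustration. With a shared zero frame, a write to one cloaked page becomes visible in every other cloaked page; giving a page its own zero frame at write time removes that signal.

```python
from typing import Dict, List

PAGE_SIZE = 16  # tiny pages keep the toy model readable


class ToyGuestMemory:
    """Toy model of guest frame numbers (gfn) mapped to backing frames."""

    def __init__(self, cow_on_write: bool) -> None:
        self.cow_on_write = cow_on_write
        # Frame 0 plays the role of the shared dummy zero-filled page.
        self.frames: Dict[int, bytearray] = {0: bytearray(PAGE_SIZE)}
        self.mapping: Dict[int, int] = {}

    def cloak(self, gfn: int) -> None:
        """Map a shadow page onto the shared zero-filled frame."""
        self.mapping[gfn] = 0

    def write(self, gfn: int, data: bytes) -> None:
        frame = self.mapping[gfn]
        if self.cow_on_write and frame == 0:
            # Give this page its own zero frame before the write lands.
            frame = max(self.frames) + 1
            self.frames[frame] = bytearray(PAGE_SIZE)
            self.mapping[gfn] = frame
        self.frames[frame][: len(data)] = data

    def read(self, gfn: int) -> bytes:
        return bytes(self.frames[self.mapping[gfn]])


def probe(mem: ToyGuestMemory, target: int, canary: bytes = b"CANARY") -> List[int]:
    """Write a canary to one page and list other pages where it shows up."""
    mem.write(target, canary)
    return [g for g in sorted(mem.mapping)
            if g != target and mem.read(g).startswith(canary)]


leaky = ToyGuestMemory(cow_on_write=False)
fixed = ToyGuestMemory(cow_on_write=True)
for mem in (leaky, fixed):
    for gfn in (100, 101, 102):
        mem.cloak(gfn)

leaked = probe(leaky, 100)  # the canary appears in the other cloaked pages
hidden = probe(fixed, 100)  # the canary stays confined to the written page
```

In the real system the equivalent step would have to happen in the hypervisor-level memory mappings; the toy model only shows why per-page zero frames close the leak.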
The current process injection mechanism used by DRAKVUF sets up the stack for a call to kernel32.dll!CreateProcessA. Thus, the process created by this mechanism will reflect the executable name as found on disk, which may leak information that malware could use to detect DRAKVUF. There are process cloaking techniques (usually used by malware) that hide a process in the shell of another: process hollowing and process doppelgänging. Implementing these techniques using VMI will improve the resiliency of DRAKVUF against malware trying to detect the environment.
Recent efforts to build a foundation for the dynamic malware analysis framework DRAKVUF on the ARM architecture have extended the Xen hypervisor to provide the means for stealthy guest instrumentation. More precisely, one of the Google Summer of Code 2016 projects produced the Xen altp2m subsystem for ARM [0,1], which maintains multiple second-level address translation tables, each representing a specific view of the guest's physical memory. The GSoC 2018 student is expected to tackle the problem from the opposite side and extend DRAKVUF itself to support ARM. This project will close the circle in providing the foundation for DRAKVUF on ARM and thus establish solid ground for dynamic malware analysis on mobile devices.
[0] https://www.holmesprocessing.com/gsoc/#portfolioModal1
[1] https://summerofcode.withgoogle.com/archive/2016/projects/6408159388237824/
Holmes Processing was born out of the need to rapidly process and analyze large volumes of data in the computer security community. At its core, Holmes Processing is a catalyst for extracting useful information and generating meaningful intelligence. Furthermore, the robust distributed architecture allows the system to scale while also providing the flexibility needed to evolve.
The Holmes Project has recently acquired a large dataset of labeled malware artifacts, which can be used for deep-learning-based malware relationship mining. This labeled dataset of over 20k samples should be a big help for students working on malware relationship detection. In addition, as a result of GSoC 2017, we also have an efficient data model for malware relationships, so new GSoC students can start immediately with the machine learning part without worrying about optimal data modeling and distributed storage. As a follow-up project, students are expected to come up with a sound learning model to detect malware relationships and to create a better visualisation frontend. To visualize the relationships properly, the model needs to learn to aggregate relationships from different malware analysis services.
Ticket: https://github.com/HolmesProcessing/gsoc_relationship/issues/22
Improve Holmes-Storage v2 by adding more storage solutions and finishing a general API that can be used via HTTP and AMQP.
At the end of the project Holmes-Storage should be an Akka-based, scalable application that supports data storage via MongoDB, Cassandra, and Elastic as well as object storage via S3, Minio, and the local file system (testing).
The second part of the project is the implementation of a general API that can be used to interact with both selected storage solutions. A user should be able to connect to this API via a simple RESTful HTTP web server based on Akka HTTP as well as a dedicated AMQP queue, with an optional callback queue for replies if necessary.
More Details: https://github.com/HolmesProcessing
DroidBot [1] and DroidBox [2] are dynamic analysis tools for Android apps. However, installing and configuring such tools is difficult for analysts, so we have already set up a Docker container for DroidBox [3]. We now want to make these tools easier to use by providing a web service as a frontend and a small backend that runs analyses in DroidBox and saves the results. Some wanted features include:
[1] https://github.com/honeynet/droidbot
[2] https://github.com/pjlantz/droidbox
[3] https://hub.docker.com/r/honeynet/droidbox/
[4] https://www.hybrid-analysis.com/
[5] http://sanddroid.xjtu.edu.cn/
[6] https://github.com/oNaiPs/droidVncServer
[7] https://github.com/honeynet/droidbot/tree/master/script_samples
DroidBot [1] is a dynamic analysis tool that tries to trigger sensitive behaviors of Android apps by sending random test input. As with many other test input generation tools, the key challenge for DroidBot is to improve test coverage (i.e. letting the generated test input execute more code and thus trigger more sensitive behaviors). However, the random test strategy has only limited performance. We want to make use of technical advances in AI to help DroidBot generate more reasonable and meaningful test input. For example, AI could be used to detect similar UI views in order to avoid redundant input, and to understand the dependencies between UI views in order to generate targeted input. We have a lot of training data (DroidBot results for thousands of Android apps), or you can use an open dataset like Rico.
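As a point of reference for "detecting similar UI views", the sketch below shows a simple non-learned baseline: Jaccard similarity over sets of view signatures. It is not DroidBot code, and it is not the AI model this project asks for. The representation of a UI state as a set of strings such as `"Button/login"`, the function names and the threshold are all assumptions for illustration; a learned model would aim to replace the similarity function.

```python
from typing import FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_novel_states(states: List[FrozenSet[str]],
                        threshold: float = 0.7) -> List[int]:
    """Return indices of states that are not near-duplicates of a kept state."""
    kept: List[int] = []
    for i, state in enumerate(states):
        # Keep a state only if it differs enough from every kept state.
        if all(jaccard(state, states[k]) < threshold for k in kept):
            kept.append(i)
    return kept


login = frozenset({"Button/login", "EditText/user", "EditText/pass"})
login_with_error = login | {"TextView/error"}          # similarity 3/4 = 0.75
item_list = frozenset({"ListView/items", "Button/add"})

novel = select_novel_states([login, login_with_error, item_list])
```

Here the login screen with an extra error label is treated as a duplicate of the plain login screen, so a test generator could skip sending the same input to it again.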
[1] https://github.com/honeynet/droidbot
[2] http://interactionmining.org/rico
[3] https://arxiv.org/pdf/1709.00928.pdf
[4] http://drops.dagstuhl.de/opus/volltexte/2016/6695/pdf/dagrep_v006_i004_p161_s16172.pdf
There are great tools like DroidBot and CuckooDroid for performing Android malware analysis. We want to offer a new tool that performs these analyses on real devices. We need to automate, as far as possible, resetting a real device to a clean state: restore a backup of the device so that it looks like a normal device, including instrumentation software such as Xposed and/or Frida; install the malware and run it, along with other apps; capture the network traffic; finally, recover all data from the device and process it to produce a comprehensive report about the malware. Then start again with a new sample.
Because of the variety of devices, the tool is not intended to cover all of them, only common ones, so that the whole process keeps working.
References:
https://github.com/idanr1986/cuckoo-droid
https://github.com/honeynet/droidbot
https://www.frida.re/docs/android/
http://repo.xposed.info/module/de.robv.android.xposed.installer
LibVMI is a C library with Python bindings that makes it easy to monitor the low-level details of a running virtual machine by viewing its memory, trapping on hardware events, and accessing the vCPU registers. The Bareflank Hypervisor is an open source, lightweight hypervisor that provides the scaffolding needed to rapidly prototype new hypervisors. Adding support to LibVMI to interact with Bareflank will give malware researchers the ability to rapidly prototype and test virtualization-based ideas against even the most elusive malware.
Further Resources
The last GSoC for Conpot turned out great: we fixed crucial flaws in our BACnet and MODBUS implementations and added ENIP (EtherNet/Industrial Protocol). This time, we want to extend the industrial honeypot with common protocols (FTP, telnet, ...), a virtual (“journaled/versioned”) file system and a new internal interface that lets protocols interact more deeply with each other.
Wave 2 of Conpot Protocols aims to make the industrial honeypot more versatile and, by leveraging new core enhancements like a virtual file system, to collect new data such as uploaded malware in order to enable further research on automated attacks against critical infrastructure.
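One way to picture a “journaled/versioned” virtual file system is an in-memory store that never overwrites data and logs every operation. The sketch below is only an illustration of that idea; `VersionedFS` and its methods are hypothetical and do not describe Conpot's actual design. Keeping every version means a file uploaded by an attacker (for example over FTP) stays available for analysis even if it is later replaced.

```python
from typing import Dict, List, Tuple


class VersionedFS:
    """Hypothetical in-memory file system that journals every write."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}
        # Journal entries are (operation, path, version number).
        self.journal: List[Tuple[str, str, int]] = []

    def write(self, path: str, data: bytes) -> int:
        """Store a new version of the file and return its version number."""
        history = self._versions.setdefault(path, [])
        history.append(data)
        version = len(history) - 1
        self.journal.append(("write", path, version))
        return version

    def read(self, path: str, version: int = -1) -> bytes:
        """Read a given version; the default (-1) is the latest one."""
        return self._versions[path][version]


fs = VersionedFS()
v0 = fs.write("/firmware.bin", b"original")
v1 = fs.write("/firmware.bin", b"uploaded-by-attacker")
latest = fs.read("/firmware.bin")
original = fs.read("/firmware.bin", version=v0)
```

Several protocol implementations could share one such store, which is one way the protocols might "interact more deeply" through a common internal interface.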
[0] https://github.com/mushorg/conpot
[1] http://conpot.org/
Cuckoo Sandbox Longterm Analysis (Cuckoo LTA) was ported from legacy to the latest version of Cuckoo Sandbox a few months ago.
The aim of longterm support is to be able to monitor specific malware samples/families over a longer period of time. For example, a sample could be run for five days from 09:00 to 17:00 to simulate an office environment.
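The office-hours example can be expressed as a list of analysis windows. The helper below is a sketch using only the Python standard library; the function name `office_hours_windows` and the start date are made up for this example, and nothing here reflects how Cuckoo LTA actually schedules its runs.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Tuple


def office_hours_windows(start: date, days: int,
                         opens: time = time(9, 0),
                         closes: time = time(17, 0)
                         ) -> List[Tuple[datetime, datetime]]:
    """Return one (begin, end) analysis window per consecutive day."""
    windows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        windows.append((datetime.combine(day, opens),
                        datetime.combine(day, closes)))
    return windows


# Five consecutive days, 09:00 to 17:00 each day.
windows = office_hours_windows(date(2018, 5, 14), days=5)
total_runtime = sum((end - begin for begin, end in windows), timedelta())
```

Between windows the analysis machine would be paused or shut down, so its state has to be kept from one window to the next.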
To increase the usefulness of LTA, additional features should be added and challenges need to be solved. During this project, one or more new features will be worked on. Of course, we would also love suggestions on new features using the data collected (network, API calls, etc) by Cuckoo LTA.
During a normal Cuckoo analysis, a large amount of data can be collected in a matter of minutes. Because longterm analyses can run for multiple days, this can result in impractical amounts of data. To solve this, less data should be collected during longterm analyses. The modules that process and report the collected data should still be able to use this data to generate realtime and useful reports.
At the time of writing, Cuckoo LTA is still being worked on, so it is likely that more challenges and features to build will arise.
Tasks for this project include, among other things: designing and writing new (OO) classes, optimizing database schemas, thinking about how specific types of data can best be stored, and testing the implementation (unit testing).
More Details: https://cuckoosandbox.org, https://github.com/cuckoosandbox/cuckoo
The number of client-side attacks has grown significantly in the past few years, shifting focus to poorly protected, vulnerable clients. Just as the best-known honeypot technologies enable research into server-side attacks, honeyclients allow the study of client-side attacks.
A complement to honeypots, a honeyclient is a tool designed to mimic the behavior of a user-driven network client application, such as a web browser, and be exploited by an attacker's content.
Thug is a Python low-interaction honeyclient aimed at mimicking the behavior of a web browser in order to detect and emulate malicious content.
The project aims at
References:
https://github.com/buffer/thug
https://github.com/flier/pyv8
https://github.com/tbodt/v8py
The project contains work both on SNARE [1] and TANNER [2] sides.
Since a lot of work has been done on the TANNER side during the last two GSoCs, this year we suggest focusing on the SNARE side.
References:
https://github.com/mushorg/snare
https://github.com/mushorg/tanner
A Department of Homeland Security sponsored project at the University of Washington [1] produced a set of open source Ansible playbooks and software [2] usable for constructing a small-scale distributed system composed of multiple Linux virtual machines (including development of a software system “Tupelo” that re-implements a previous Honeynet Project tool “Manuka”).
This proposed project will extend a fork known as “D2” [3] that supports deploying systems on Digital Ocean with the following features: semi-automated provisioning and deployment of Digital Ocean droplets and DNS records using Hashicorp “terraform”; support for SSH host key management allowing StrictHostKeyChecking to be left enabled, while avoiding manual host key validation or insertion/deletion; a Trident trust group management and communication portal behind an NGINX reverse proxy secured by TLS; a Jenkins build server behind an NGINX reverse proxy secured by TLS, with Jenkins CLI secured with SSH; support for Letsencrypt SSL/TLS certificate generation, backup & restoration, renewal-hooks for deploying certificates to non-privileged services, and scheduled certificate renewal maintenance; support for SPF, DKIM, and DMARC in Postfix SMTP email; centralized rsyslog logging secured by TLS; an AMQP (RabbitMQ) message bus for remote procedure call, log distribution, and simple text chat, all secured by TLS.
Potential new features include supporting other cloud providers (Google Compute Engine, Amazon Web Services, Azure, etc.), playbooks for installing threat intelligence tools (Collective Intelligence Framework, MISP), playbooks for installing open source log monitoring tools (Mozilla Defense Platform), and playbooks for installing host forensic tools (Google Rapid Response, Tupelo).
References:
[1] https://staff.washington.edu/dittrich/home/dims.html
[2] https://github.com/uw-dims?type=source
[3] https://davedittrich.readthedocs.io/projects/ansible-dims-playbooks/en/latest/
The Infection Monkey is an open source Breach and Attack Simulation (BAS) tool that assesses the resiliency of networks to post-breach attacks and lateral movement. Using the Infection Monkey, organizations can consistently check their network security by simulating a repeatable attacker.
To see more, visit https://www.github.com/guardicore/monkey.
The Infection Monkey relies on a mix of password brute force attacks and some potentially wormable vulnerabilities. We’d like to extend this capability to cover recent, long-lived logical vulnerabilities such as the Oracle WebLogic and Struts2 Java framework vulnerabilities, so the Monkey can test more systems. A key requirement is the stability of the network services, so no attack may risk disabling the target service.
Adding an exploiter consists of adding a scanning module and an exploitation module, to recognise and attack the target.
Tickets:
https://github.com/guardicore/monkey/issues/105
https://github.com/guardicore/monkey/issues/106