Google Summer of Code 2024 Project Ideas
29 Jan 2024
Getting Started
This page contains a list of potential project ideas that we are keen to develop during GSoC 2024. If you would like to apply as a GSoC student, please follow these two steps to get started:
- Read through this page and identify the project ideas you find interesting. Play around with our tools!
- Join us on Discord and talk to your potential mentors there.
If you have any questions, please don’t hesitate to get in touch! 🙂
GSoC and The Honeynet Project
During the previous years of GSoC, the Honeynet Project’s students have created a wide range of very successful open source security projects, many of which have gone on to become the industry standard open source tools in their respective fields.
We are also always interested in hearing any ideas for additional relevant computer security and honeynet-related R&D projects (although remember that to qualify for GSoC funding from Google, your project deliverables need to fit into GSoC’s project timescales!). If you have a suitable and interesting project, we will always try to find the right resources to mentor it and support you.
Please note - even if you aren’t an eligible GSoC participant, we are also always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.
Each sponsored GSoC 2024 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We’ll also provide hosting for project infrastructure, if required.
For all questions about the Honeynet Project, the GSoC program or our projects, please contact us on Discord (preferred) or email us at [email protected].
Application template
If you are considering applying to participate with us in GSoC 2024, please find our application template here. Use it when you are preparing your application on the official GSoC site, and don’t hesitate to ask your mentors for feedback before submitting!
GSoC 2024 Project Ideas Overview
- #1 - ML-based Web-attack Classification for TANNER
- #2 - Extending the Artemis scanner
- #3 - Extending the DRAKVUF Sandbox analytic pipeline
- #4 - Improving the functionality of Honeyscanner: a honeypot vulnerability analyzer
- #5 - Improving the RIoTPot hybrid interaction honeypot
- #6 - DRAKVUF Rust & Python bindings
- #7 - Hack on Mitmproxy!
- #8 - IntelChat: Enhancing Threat Analysis with an LLM-Based Chatbot in IntelOwl
- #9 - New Analyzers for IntelOwl
- #10 - New Documentation Site for IntelOwl and friends
- #11 - Scanners: a new plugin type for IntelOwl
#1 - ML-based Web-attack Classification for TANNER
Mentor: Evgeniia Tokarchuk
Project type: Improving an existing tool
URL: https://github.com/mushorg/tanner
Expected Project hours: 175 or 350 hours
The project aims to enhance the efficiency and accuracy of web attack detection in TANNER by replacing detection based on regular expressions with machine learning methods. The project will be divided into two main parts:
- Research of existing solutions and/or data collection
- Integration of the ML classifier into TANNER
Over the past few years, we have collected data from various SNARE sensors. This data is annotated using regular expressions and can be used for building a data-driven classification model of web-based attacks. However, since this data is noisy and imbalanced, it requires careful pre-processing and filtering. Moreover, curating the test set is essential to build a robust, high-quality model. External datasets can be used along with historical data from TANNER to enlarge the dataset and mitigate the noise. The resulting ML model must have accuracy above the regexp baseline and low latency to enable real-time analysis of the TANNER events.
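The acceptance criteria above (beat the regexp baseline, keep latency low) can be sketched as a small evaluation harness. This is only an illustration: the regex rules, labels, and test requests below are made up for the example and are not TANNER's actual patterns or data. Any candidate ML model would be passed to `evaluate` in place of `regex_baseline`.

```python
import re
import time

# Hypothetical regex rules, for illustration only; TANNER's real patterns differ.
BASELINE_RULES = {
    "sqli": re.compile(r"(?i)\bunion\b.+\bselect\b|'\s*or\s+1\s*=\s*1"),
    "xss": re.compile(r"(?i)<script\b|onerror\s*="),
    "lfi": re.compile(r"\.\./|/etc/passwd"),
}


def regex_baseline(path):
    """Label a request path with the first matching rule, else 'benign'."""
    for label, pattern in BASELINE_RULES.items():
        if pattern.search(path):
            return label
    return "benign"


def evaluate(classify, test_set):
    """Return (accuracy, mean latency in seconds per request) for a classifier."""
    samples = list(test_set)
    start = time.perf_counter()
    predictions = [classify(path) for path, _ in samples]
    elapsed = time.perf_counter() - start
    correct = sum(pred == label for pred, (_, label) in zip(predictions, samples))
    return correct / len(samples), elapsed / len(samples)


# Tiny hand-labelled test set (made-up requests, not real sensor data).
CURATED_TEST_SET = [
    ("/index.php?id=1 UNION SELECT user,pass FROM users", "sqli"),
    ("/search?q=<script>alert(1)</script>", "xss"),
    ("/view?file=../../etc/passwd", "lfi"),
    ("/products?page=2", "benign"),
    # URL-encoded payload: the raw-string regex misses it.
    ("/login?user=admin%27%20OR%201%3D1", "sqli"),
]

if __name__ == "__main__":
    accuracy, latency = evaluate(regex_baseline, CURATED_TEST_SET)
    print(f"baseline accuracy={accuracy:.2f}, mean latency={latency * 1e6:.1f} us")
```

The last sample shows why a curated test set matters: labels produced by the regexes themselves would never reveal the cases the regexes miss.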
Requirements: Python 3, machine learning
#2 - Extending the Artemis scanner
Mentor: Krzysztof Zając
Project type: Improving an existing tool
URL: https://github.com/CERT-Polska/Artemis
Expected Project hours: 175 or 350 hours
Artemis is a modular vulnerability scanner that checks various aspects of website security and builds easy-to-read messages to send to organizations to get the vulnerabilities fixed. Multiple national-level CSIRTs use it to improve the security of their constituencies.
The goal of this project is to:
- research what existing tools to add to the Artemis scanning pipeline,
- extend Artemis with modules detecting different types of vulnerabilities,
- improve Artemis in other aspects: performance, UI, etc.
The primary required skills are Python programming and familiarity with the Linux environment. Familiarity with web security topics is also desirable.
#3 - Extending the DRAKVUF Sandbox analytic pipeline
Mentor: Jarosław Jedynak
Project type: Improving an existing tool
URL: https://github.com/CERT-Polska/drakvuf-sandbox/
Expected Project hours: 175 or 350 hours
DRAKVUF Sandbox is an open source automated black-box malware analysis system that uses virtual machine introspection (VMI), with the DRAKVUF engine (https://drakvuf.com) under the hood.
As DRAKVUF Sandbox monitors the behavior of malware samples, it collects a lot of detailed data, such as API calls, syscalls and network traffic. Despite this vast amount of information, most of it is not exposed directly to the first-line operators and analysts using the sandbox.
The goal of this project is to:
- extend DRAKVUF Sandbox with useful heuristics for detecting the most common malware types and behaviours
- detect typical malicious patterns like code injection, and create a behaviour graph that can be easily grokked by the analysts (currently a subset of this feature is provided by a third-party project, proc2dot)
- improve the integration of DRAKVUF Sandbox with the rest of the analytic pipeline. This will make it possible to display the analysis results more directly in other tools
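As an illustration of the kind of heuristic and behaviour graph described above, here is a minimal sketch that looks for the classic remote-thread injection sequence in a list of API call events and renders the result as Graphviz DOT. The event format (dicts with `pid`, `api` and `target_pid` keys) is an assumption made for this example and is not DRAKVUF Sandbox's actual log schema.

```python
from collections import defaultdict

# Classic remote-thread injection sequence (Windows API names).
INJECTION_SEQUENCE = (
    "OpenProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
)
PROCESS_CREATION_APIS = {"CreateProcessA", "CreateProcessW"}


def find_injections(events):
    """Return (source_pid, target_pid) pairs that completed the sequence in order."""
    progress = defaultdict(int)  # (source, target) -> steps matched so far
    found = []
    for event in events:
        key = (event["pid"], event.get("target_pid"))
        step = progress[key]
        if step < len(INJECTION_SEQUENCE) and event["api"] == INJECTION_SEQUENCE[step]:
            progress[key] = step + 1
            if progress[key] == len(INJECTION_SEQUENCE):
                found.append(key)
    return found


def behaviour_graph(events, injections):
    """Build sorted (source, target, label) edges for spawns and injections."""
    edges = set()
    for event in events:
        if event["api"] in PROCESS_CREATION_APIS:
            edges.add((event["pid"], event["target_pid"], "spawned"))
    for source, target in injections:
        edges.add((source, target, "injected into"))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph behaviour {"]
    for source, target, label in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical, simplified events; real sandbox logs carry far more detail.
EVENTS = [
    {"pid": 100, "api": "CreateProcessW", "target_pid": 200},
    {"pid": 100, "api": "OpenProcess", "target_pid": 300},
    {"pid": 100, "api": "VirtualAllocEx", "target_pid": 300},
    {"pid": 200, "api": "OpenProcess", "target_pid": 300},
    {"pid": 100, "api": "WriteProcessMemory", "target_pid": 300},
    {"pid": 100, "api": "CreateRemoteThread", "target_pid": 300},
]

if __name__ == "__main__":
    injections = find_injections(EVENTS)
    print(to_dot(behaviour_graph(EVENTS, injections)))
```

Progress is tracked per (source, target) pair, so unrelated calls interleaved by other processes do not break the match. A production heuristic would also need to handle other injection techniques and noisy logs.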
The primary required skills are Python programming and familiarity with the Linux environment. Knowledge of how operating systems work under the hood, and of other low-level topics, is also highly desirable. Experience with malware analysis or IT security topics is nice to have but absolutely not necessary: we will help with any malware-specific design issues.
#4 - Improving the functionality of Honeyscanner: a honeypot vulnerability analyzer
Mentor: Emmanouil Vasilomanolakis, Ricardo Yaben and Shreyas Srinivasa
Project type: Improving an existing tool
URL: https://github.com/honeynet/honeyscanner
Expected Project hours: 90, 175 or 350 hours
Honeyscanner is a vulnerability analyzer for honeypots designed to automatically attack a given honeypot, in order to determine if the honeypot is vulnerable to specific types of cyber attacks.
Honeyscanner uses a variety of attacks, ranging from exploiting vulnerable software libraries to DoS and fuzzing attacks. The analyzer then provides an evaluation report to the honeypot administrator, offering advice on how to enhance the security of the honeypot. Targeted toward security enthusiasts, open-source communities, and companies, Honeyscanner provides a much-needed safety check for various honeypots.
This project aims to improve the code base, add new attacks to the Honeyscanner arsenal, and add support for more honeypots.
#5 - Improving the RIoTPot hybrid interaction honeypot
Mentor: Emmanouil Vasilomanolakis, Ricardo Yaben and Shreyas Srinivasa
Project type: Improving an existing tool
URL: https://github.com/honeynet/riotpot
Expected Project hours: 90, 175 or 350 hours
RIoTPot is a hybrid interaction honeypot, primarily focused on the emulation of IoT and OT protocols, although it is also capable of emulating other services. In essence, RIoTPot acts as a proxy service for other honeypots included in the system, so you can run any honeypot and other services alongside RIoTPot. In addition, there is a web UI that you can use to manage your routing. RIoTPot also comes with multiple low-interaction services ready to use. Since these services are written as plugins, they are only supported on Linux; however, you can start RIoTPot without them. The following table contains the list of services included in RIoTPot by default, their internal port, and proxy port.
The project aims to give RIoTPot a light mode that requires minimal user interaction and to reduce its reliance on external libraries. Furthermore, we will improve the support for existing profiles and protocols.
#6 - DRAKVUF Rust & Python bindings
Mentor: Tamas Lengyel
Project type: Improving an existing tool
URL: https://github.com/tklengyel/drakvuf
Expected Project hours: 90, 175 or 350 hours
DRAKVUF is a hypervisor-based malware analysis system written mostly in C and C++. It is designed to be high-performance and stealthy, so malware won’t be able to detect the analysis tools.
This project will focus on creating automatic Rust and Python binding generators for the core DRAKVUF libraries (libdrakvuf and libinjector). The goal is to automate the binding generation process, so that future changes to the core library APIs are automatically reflected in the respective language bindings. Test cases will need to be created and added to the CI to ensure the bindings remain operational.
The ideal candidate for this project should be at least at an intermediate level in one of C, C++, Python or Rust, and willing to learn the others on the go.
#7 - Hack on Mitmproxy!
Mentor: Maximilian Hils
Project type: Improving an existing tool
URL: https://mitmproxy.org
Expected Project hours: 90, 175 or 350 hours
mitmproxy is your Swiss Army knife for debugging, testing, privacy measurements, and penetration testing. It can be used to intercept, inspect, modify and replay web traffic such as HTTP/1, HTTP/2, HTTP/3, WebSockets, DNS, UDP, or any other SSL/TLS-protected protocol. You can prettify and decode a variety of message types ranging from HTML to Protobuf, intercept specific messages on-the-fly, modify them before they reach their destination, and replay them to a client or server later on.
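To give a feel for the inspect-and-modify workflow described above, here is a minimal addon sketch. It relies on mitmproxy's `request` and `response` event hooks and the module-level `addons` list; the class name and the `x-inspected-by` header are made up for this example.

```python
"""Minimal mitmproxy addon sketch: count requests per host and tag responses."""
from collections import Counter


class HostCounter:
    def __init__(self):
        self.hosts = Counter()

    def request(self, flow):
        # Called by mitmproxy once a full client request has been read.
        self.hosts[flow.request.pretty_host] += 1

    def response(self, flow):
        # Called once a full server response has been read; modify it
        # before it is forwarded to the client. Header name is illustrative.
        flow.response.headers["x-inspected-by"] = "mitmproxy-sketch"


# mitmproxy loads the addons listed in this module-level variable.
addons = [HostCounter()]
```

Saved as, say, `host_counter.py`, the script could be loaded with `mitmdump -s host_counter.py`. Addons are plain Python objects, so the hooks can also be exercised in unit tests with stand-in flow objects.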
mitmproxy is a large project with a huge number of interesting areas to explore, from low-level protocol work up to UX improvements. If you are motivated and know what you’re interested in, why not get in touch with us and map out a custom GSoC project? Below are some ideas. An enterprising student should be able to complete one large task or three or more small tasks in a large-size GSoC project.
Potential Tasks: https://github.com/mitmproxy/mitmproxy/issues/6589
#8 - IntelChat: Enhancing Threat Analysis with an LLM-Based Chatbot in IntelOwl
Mentor: Hugo Gascón, Matteo Lodi
Project type: Improving an existing tool
URL: https://github.com/intelowlproject
Expected Project hours: 350
- The proposed Google Summer of Code project aims to integrate a cutting-edge, self-deployed LLM-based chatbot into IntelOwl, enhancing user interaction with collected threat intelligence.
- Leveraging Python libraries like LangChain and ChainLit, the project envisions building an intuitive interface that empowers analysts to pose natural language queries about threat data, fostering a more user-friendly and efficient investigative process (e.g. “In what campaigns have you seen this IOC?”).
- The chatbot’s capabilities will extend beyond basic queries, seamlessly interfacing with IntelOwl’s enrichment modules when deeper investigation is required, providing a comprehensive and interactive experience for analysts.
- By harnessing the power of LLM technology, the chatbot will not only streamline communication between analysts and the IntelOwl platform but also adapt to evolving user needs, contributing to a more dynamic and responsive threat intelligence environment.
- This project aligns with the overarching goal of making threat analysis more accessible and efficient, offering analysts a powerful tool that combines the strengths of natural language understanding, self-deployment, and seamless integration with IntelOwl’s existing modules.
#9 - New Analyzers for IntelOwl
Mentor: Matteo Lodi, Daniele Rosetti, Simone Berni
Project type: Improving an existing tool
URL: https://github.com/intelowlproject
Expected Project hours: 175
Right now we have a lot of Analyzers implemented in IntelOwl.
But they are not enough! They are the core part of the application, so we want to add even more of them! :)
This project aims to increase the number of available Analyzers. Community members have requested about 50 different Analyzers on GitHub that are still not implemented. We obviously do not expect all of them to be implemented, but rather a reasonable number, based on the available time and the effort each one requires.
Adding a new Analyzer to the framework is one of the easiest things that can be done in this project. Once you get used to it, adding more of them is even easier!
The ideal candidate for this project is someone who understands how IntelOwl’s framework works and has already tried to implement an Analyzer.
#10 - New Documentation Site for IntelOwl and friends
Mentor: Matteo Lodi, Daniele Rosetti
Project type: Improving an existing tool
URL: https://github.com/intelowlproject
Expected Project hours: 175
Right now we are not satisfied with how we manage our documentation and how we make it available.
The project aims to create a new repository dedicated to the documentation, move the documentation of all our projects there, and build a new documentation site using GitHub Pages and MkDocs.
More information in this GitHub issue.
The candidate would have the chance to try some popular tools and to solve a big, common problem that a lot of other open source projects have. The ideal candidate is proactive in reading the documentation of new tools and excited to try them to solve our problem.
#11 - Scanners: a new plugin type for IntelOwl
Mentor: Matteo Lodi, Daniele Rosetti, Simone Berni
Project type: Improving an existing tool
URL: https://github.com/intelowlproject
Expected Project hours: 175
Right now there are many possible types of plugins in IntelOwl.
This project aims to add a new plugin type to the already existing ones in IntelOwl:
- The “Scanner” type would be a subtype of the “Analyzer” type with special configuration. That way, IntelOwl could be used not only for classic data enrichment with external services but also as a vulnerability scanner or a scraper. Refer to the GitHub issue for more details.
As with past GSoC projects that added new plugin types, we expect the contributor to add the most important new scanners (like this) to IntelOwl once they finish building the framework, so that users have a base set of tools to work with.
The candidate would have the chance to work across the whole application stack (backend and frontend). The ideal candidate for this project is someone who is familiar with how IntelOwl works and its core concepts.