Project Slot 14 - Improving SHIVA Spampot

Student: Rahul Binjve (IN)
Primary mentor: Sumit Sharma (IN)
Backup mentor: Muslim Koser (IN)

Google Melange: https://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/rahulbinjve/1

About SHIVA
SHIVA (Spam Honeypot with Intelligent Virtual Analyzer) is an open but controlled relay spam honeypot (SpamPot), built on top of the Lamson Python framework, capable of collecting and analysing all spam thrown at it. Analysis of the captured data can yield information on phishing attacks, scamming campaigns, malware campaigns, spam botnets, etc. SHIVA is written in Python and currently uses MySQL as its back-end.

Project Overview:
The aim of this project is to improve SHIVA on several fronts (better spam distinction, hpfeeds/hpfriends integration, a better database implementation, documentation and, if time permits, a web-based UI) and to make it easier to deploy and configure. Detailed improvements:

  • User-friendly installation.
  • Global configuration options.
  • SHIVA currently uses MD5 to distinguish new spam from spam it has already seen, which only catches byte-identical messages. The improvement aims to implement a different detection algorithm (probably fuzzy hashing) so that redundant spam is neither processed nor stored.
  • MySQL is currently used as the back-end. If a NoSQL database suits the project's requirements and the need arises, the back-end would be switched to NoSQL. If MySQL is kept, the current DB structure might be optimized and restructured. The aim is to improve performance, even on a very large DB.
  • Since a feature-rich UI provides easy access to information, a web-based UI would be developed, if time permits, for configuration controls, IP blocking options, and searching and viewing spam.
  • Integration with HPfeeds/HPfriends and sharing of collected data over various channels.
  • Detailed documentation.
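
The difference between exact MD5 matching and a similarity-based check can be illustrated with a minimal sketch. It is not SHIVA's code: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for a fuzzy-hash comparison, and the function names, the sample messages and the 0.8 threshold are all assumptions chosen for illustration.

```python
import hashlib
from difflib import SequenceMatcher


def md5_digest(body):
    """Exact fingerprint: a one-character change gives a different digest."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def similarity(a, b):
    """Ratio in [0, 1]; a stdlib stand-in for a fuzzy-hash comparison score."""
    return SequenceMatcher(None, a, b).ratio()


def is_redundant(body, seen_bodies, threshold=0.8):
    """Return True if body is near-identical to any spam already stored."""
    # threshold is a hypothetical value; it would need tuning on real spam.
    return any(similarity(body, old) >= threshold for old in seen_bodies)


# Two spams from the same campaign that differ only in an ID and a URL suffix.
first = "Dear customer, your account 48213 is locked. Visit http://example.test/a"
second = "Dear customer, your account 90177 is locked. Visit http://example.test/b"
unrelated = "Cheap watches, limited offer, reply now"

print(md5_digest(first) == md5_digest(second))  # False: MD5 sees two new spams
print(is_redundant(second, [first]))            # True: similarity sees a repeat
print(is_redundant(unrelated, [first]))         # False: genuinely new spam
```

Comparing each new message against every stored one is linear in the number of stored spams, which is one reason a compact fuzzy hash may be preferable to raw string comparison on a large corpus.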

Project Plan:

  • Mid Term
    • Integration of a fuzzy hashing algorithm into the code - for better distinction of spams, to avoid processing and storing identical spams.
    • Benchmarking spam analysis speed with and without fuzzy hashing. If fuzzy hashing is found to degrade speed, then try fuzzy string comparison algorithms.
    • Redesigning the DB schema, if continuing with MySQL, for better data handling and less data redundancy. Otherwise, a NoSQL implementation.
    • HPfeeds integration - Creation of various channels to share information:
      • attachments
      • URLs
      • raw spam
      • spamming IPs, etc
  • Final Term
    • Start working on installation scripts, automating as much as possible and so making SHIVA easy to deploy.
    • If time permits, start designing a web-based UI for SHIVA. The UI should have options to view spams, configure various options of SHIVA, block IPs using iptables, search for specific spam, etc.
    • Documentation
      • Installation and configuration
      • Project design (overview)
      • Project code flow-chart (overview)
      • Project code flow - detailed
      • Precautions while running SHIVA (or any honeypot, for that matter)
        • System security concerns
        • Bandwidth issues
        • Blacklisting of the relay's IP by ISPs
    • Refactoring code (removing scaffolds, adding more comments)
    • Public release of SHIVA.
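
The HPfeeds channels listed in the mid-term plan (attachments, URLs, raw spam, spamming IPs) could be fed as sketched below. This is only an illustration under stated assumptions: the channel names, the record fields (`raw`, `source_ip`, `urls`, `attachments`) and the JSON payload layout are hypothetical, and `publish` is any callable taking a channel and a payload, so that an HPfeeds client could be plugged in later.

```python
import hashlib
import json

# Hypothetical channel names, one per kind of data listed in the plan.
CHANNELS = {
    "attachments": "shiva.attachments",
    "urls": "shiva.urls",
    "raw": "shiva.raw",
    "ips": "shiva.ips",
}


def build_messages(spam):
    """Split one analysed spam record into (channel, JSON payload) pairs."""
    # The MD5 of the raw message ties the per-channel messages together.
    spam_id = hashlib.md5(spam["raw"].encode("utf-8")).hexdigest()
    messages = [
        (CHANNELS["raw"], json.dumps({"id": spam_id, "raw": spam["raw"]})),
        (CHANNELS["ips"], json.dumps({"id": spam_id, "ip": spam["source_ip"]})),
    ]
    for url in spam.get("urls", []):
        messages.append((CHANNELS["urls"], json.dumps({"id": spam_id, "url": url})))
    for name in spam.get("attachments", []):
        messages.append(
            (CHANNELS["attachments"], json.dumps({"id": spam_id, "filename": name}))
        )
    return messages


def share(spam, publish):
    """Hand every message to publish(channel, payload)."""
    for channel, payload in build_messages(spam):
        publish(channel, payload)


# Example record with made-up values; a list stands in for the broker.
sample = {
    "raw": "Subject: Invoice\n\nSee http://example.test/pay",
    "source_ip": "192.0.2.10",
    "urls": ["http://example.test/pay"],
    "attachments": ["invoice.zip"],
}

sent = []
share(sample, lambda channel, payload: sent.append((channel, payload)))
for channel, payload in sent:
    print(channel, payload)
```

Keeping message construction separate from publishing lets the payloads be checked without a running broker.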

    Project Activities:
    Week 1 development tasks

  • June 24th - July 1st:

    Project Activities:
    Week 2 development tasks

    Project Source Code Repository:
    shiva-spampot/shiva

    Student Weekly Blog: Project 14 – Improving SHIVA Spampot | Honeynet Project GSoC 2013 Status Updates
