Project Slot 3 - Thug Honeyclient Distributed Task Queuing

Student: Akshit Agarwal (IN)
Primary mentor: Angelo Dell'Aera (IT)
Backup mentor: Sebastian Poeplau (DE)

Google Melange: Project Page

Project Overview:
Currently Thug works as a stand-alone tool and provides no way to distribute URL analysis tasks among different workers. For the same reason, it cannot analyze how attacks differ according to the geolocation of the targeted users (unless it is given a set of proxies in different locations). This project addresses both problems by creating a centralized server that is connected to all the Thug instances running across the globe and distributes URLs to them (potentially according to geolocation analysis requirements). The clients then consume the tasks distributed by the centralized server, process them, and store the results in a database.

Deliverables:

  • Main Server handling all Thug instances
    A dedicated Main Server will be created and fed with URLs coming from different sources (e.g. spamtraps). It will distribute the URLs among the Thug instances, using Celery as the Distributed Task Queuing Tool and RabbitMQ as the Message Broker.

  • Thug instances processing allocated tasks
    Thug instances (Thug Honeyclient) will be extended so that they can connect to the Main Server and consume the tasks it allocates. The instances will process these URLs and store the results in a database for later analysis.
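
The split between a generic queue and geolocation-based queues can be illustrated with a small sketch. This is not the project's code: the queue names (`generic`, `geo.<country>`), the helper functions, and the example URLs are all assumptions made for illustration, and the sketch only models the routing decision in plain Python without talking to Celery or RabbitMQ.

```python
from collections import defaultdict

# Hypothetical queue name for URLs without a geolocation requirement.
GENERIC_QUEUE = "generic"


def queue_for(country_code=None):
    """Return the (hypothetical) queue name for a URL analysis task."""
    if country_code:
        # One queue per country, e.g. "geo.it" for Italy.
        return "geo." + country_code.lower()
    return GENERIC_QUEUE


def distribute(tasks):
    """Group (url, country_code) pairs by the queue they would be sent to."""
    queues = defaultdict(list)
    for url, country_code in tasks:
        queues[queue_for(country_code)].append(url)
    return dict(queues)


# Example input: two URLs that must be analyzed from Italy,
# one from India, and one with no geolocation requirement.
tasks = [
    ("http://example.com/a", None),
    ("http://example.com/b", "IT"),
    ("http://example.com/c", "in"),
    ("http://example.com/d", "IT"),
]
routed = distribute(tasks)
```

In a Celery setup, the name returned by `queue_for` could be passed as the `queue` argument of a task's `apply_async` call, and a Thug instance in a given country could run a worker started with the `-Q` option so that it consumes only its own queue.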

Project Plan:

  • May 27 - June 17: Community Bonding Period
    Explored the available tools for Distributed Task Queuing and found Celery a better fit than Pyro. Decided to use Celery with RabbitMQ instead of Pyro, and started learning both.
  • June 17: GSoC 2013 coding officially starts
  • June 17 - June 24:
    Implemented simple prototypes using RabbitMQ and Celery
  • June 25 - July 2:
    Redesigned the Architecture of the whole Project around Celery and RabbitMQ. Implemented a prototype with Generic and Geolocation-based Queues using Celery.
  • July 3 - July 10:
    Find the system configuration and design an Algorithm for computing the Performance Value.
  • July 11 - July 18:
    Integrate the Performance Value with the Queues so that URLs are distributed according to the performance of the Thug instances.
  • July 19 - July 29:
    Finalize the code for the Midterm Evaluation and have it reviewed by the mentor. If time permits, integrate Thug Honeyclient for processing URLs.
  • July 29: Midterm Evaluation
  • July 30 - August 12:
    Integrate Thug Honeyclient with the Queues if this was not done before the Midterm Evaluation. Test performance by connecting different systems to the Main Server for Geolocation-based Queues.
  • August 13 - August 27:
    Test the whole Network and implement optimizations where possible.
  • August 28 - September 15:
    Write tests and documentation for the final Project.
  • September 16: Firm Pencils Down
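
The performance-based distribution planned for July 3 - July 18 could work along the following lines. The plan does not describe how the Performance Value is computed, so this sketch simply assumes each Thug instance already has a positive numeric value. The function name, the instance names, and the rule used here (send each URL to the instance with the lowest load relative to its value) are assumptions for illustration, not the project's algorithm.

```python
def assign(urls, performance):
    """Assign URLs to instances in proportion to their performance values.

    `performance` maps a (hypothetical) instance name to a positive number.
    Each URL goes to the instance whose load, after taking the URL, would be
    lowest relative to its performance value. Ties are broken by name.
    """
    if not performance:
        raise ValueError("at least one instance is required")
    if any(value <= 0 for value in performance.values()):
        raise ValueError("performance values must be positive")

    assigned = {name: [] for name in performance}
    for url in urls:
        target = min(
            assigned,
            key=lambda name: ((len(assigned[name]) + 1) / performance[name], name),
        )
        assigned[target].append(url)
    return assigned


# Example: an instance rated 3.0 and an instance rated 1.0 share eight URLs.
urls = ["http://example.com/%d" % i for i in range(8)]
result = assign(urls, {"fast": 3.0, "slow": 1.0})
```

With these example values the instance rated three times higher receives three times as many URLs. A real deployment would also need to account for tasks that are still running, which this sketch ignores.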

Project Source Code Repository:
Thug Distributed Task Queuing

Student Weekly Blog:
Thug Distributed Weekly Blog

Project Useful Links:
Thug Honeyclient