Project 13 - Network Analyzer

Student: Oğuz Yarimtepe
Primary mentor: Nicolas Collery
Backup mentor: Adam Pridgen

Google Melange: http://www.google-melange.com/gsoc/project/google/gsoc2012/oguzy/69002

Project Overview:

A web-based packet analyzer that automatically analyzes uploaded pcap files. The aim is to be an open alternative to http://netwitness.com/products-services/investigator. The first milestones include visualization of the analyzed traffic, display of application-level information, and plugin support for malware and anomaly detection.

Project Plan:

  • April 23rd - May 20th: Community Bonding Period
  • May 21st: GSoC 2012 coding officially starts
  • May 21st - May 28th: Analyze FTP, HTTP, SMTP and DNS traffic and save the results to MongoDB
  • May 29th - June 5th: Continue handling protocol issues. UDP streams and TCP flows should be analyzed. Attached files, header and body information (for SMTP), request and response headers, returned HTML, js files and images (for HTTP), and commands and files sent (for FTP) should be kept in the db and/or in directories.
  • June 5th - June 12th: Create the first visualization in the Django interface. It will be a timeline visualization of the uploaded pcap file that displays the application-level protocols
  • June 13th - June 20th: Create the detail visualizations that open when a protocol on the timeline is clicked. Treemap and Parallel Coordinates will be used.
  • June 20th - June 27th: Continue the detail visualization pages, which also display the protocol details.
  • June 28th - July 4th: Create the scatter plot for an overall protocol distribution.
  • July 4th - July 8th: Add login for registered users, with search capabilities, to the web interface
  • July 9th - July 13th: Mid Term Assessments
  • July 14th - July 20th: Add the malware analyzer. Gathered binaries and js files can be sent to external services for analysis
  • July 21st - July 27th: Continue the malware section. YARA and an external js analyzer should be handled
  • July 27th - August 12th: Deal with the remaining parts of the project. Fix template design issues and documentation. If I have time, write a CLI for the project.
  • August 13th: Suggested "pencils down" date, coding close to done
  • August 20th: Firm "pencils down" date, coding must be done
  • August 24th - August 27th: Final Assessments
  • August 31st - Public code uploaded and available to Google

Project Deliverables:

A standalone web-based network analyzer will be developed. It will run online and will be published for the rest of the world to test. Development will continue as long as feedback about the site keeps coming in.
The result will be a web-based pcap analyzer. The analyzer will extract application-level information from pcap files, visualize it and display it in the web interface. After files are uploaded, packet protocols will be detected and analyzed, and the information will be saved. A timeline visualization[1] will display the protocols in the pcap file. When a protocol is clicked, a new page will display more information about it. For an application-level protocol like HTTP, this information will include attached files, whether they contain malicious code and, if the response is an HTML file, the response headers and body. On this page, treemaps[2] and parallel coordinates[3] will be used to give detailed information about packet bytes and port numbers.
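As an illustration of the treemap idea, here is a minimal sketch of how per-flow packet sizes and counts could be grouped into a nested name/children/size structure of the kind a treemap layout typically consumes. The record keys (`protocol`, `flow`, `size`) and the function name `build_treemap` are hypothetical and are not taken from the project code.

```python
from collections import defaultdict


def build_treemap(packets, root_name="pcap"):
    """Group packets by protocol and flow into a nested dict for a treemap.

    `packets` is an iterable of dicts with hypothetical keys:
    'protocol', 'flow' and 'size' (bytes).
    """
    # protocol -> flow -> running totals
    totals = defaultdict(lambda: defaultdict(lambda: {"size": 0, "count": 0}))
    for pkt in packets:
        leaf = totals[pkt["protocol"]][pkt["flow"]]
        leaf["size"] += pkt["size"]
        leaf["count"] += 1

    children = []
    for protocol in sorted(totals):
        flows = [
            {"name": flow, "size": stats["size"], "count": stats["count"]}
            for flow, stats in sorted(totals[protocol].items())
        ]
        children.append({"name": protocol, "children": flows})
    return {"name": root_name, "children": children}


sample = [
    {"protocol": "HTTP", "flow": "10.0.0.1:1025-10.0.0.2:80", "size": 400},
    {"protocol": "HTTP", "flow": "10.0.0.1:1025-10.0.0.2:80", "size": 600},
    {"protocol": "DNS", "flow": "10.0.0.1:5353-10.0.0.3:53", "size": 80},
]
tree = build_treemap(sample)
```

Keeping both `size` and `count` on each leaf lets the same structure drive a treemap sized by bytes or by packet count.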

A general scatter plot visualization[4] will be used to display the distribution of the protocols uploaded up to that time.
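A rough sketch of the data behind such a scatter plot: flows are counted per day and per protocol, giving one point for each (date, protocol) pair. The flow record keys (`timestamp`, `protocol`) and the function name `protocol_points` are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime


def protocol_points(flows):
    """Return one (date, protocol, count) point per day and protocol."""
    counts = Counter(
        (flow["timestamp"].date().isoformat(), flow["protocol"]) for flow in flows
    )
    # Sort so that the points come out ordered by date, then protocol name.
    return [(day, proto, n) for (day, proto), n in sorted(counts.items())]


flows = [
    {"timestamp": datetime(2012, 6, 25, 10, 0), "protocol": "HTTP"},
    {"timestamp": datetime(2012, 6, 25, 11, 30), "protocol": "HTTP"},
    {"timestamp": datetime(2012, 6, 25, 12, 0), "protocol": "DNS"},
    {"timestamp": datetime(2012, 6, 26, 9, 15), "protocol": "SMTP"},
]
points = protocol_points(flows)
```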

YARA will be used to analyze binary files. Some external js analyzers[5] will be used to analyze the JavaScript found in the HTML files.

The site will take a modular approach so that other analyzers can be plugged in later. The same applies to protocol detection. Initially, Bro is planned to be used for protocol detection. For unknown or undetected protocols, detectors will be written for Bro, or users may write their own protocol detection.
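One possible way to realize such a pluggable design is to name each handler in a settings entry and import it at run time. The sketch below assumes a hypothetical `HANDLERS` mapping of the form "module:ClassName" and a hypothetical `handle` method; neither is taken from the project code. The demo module is registered in memory only so that the example runs on its own.

```python
import importlib
import sys
import types

# Hypothetical settings entry mapping a protocol name to "module:ClassName".
HANDLERS = {"http": "demo_handlers:HTTPHandler"}


def load_handler(protocol, registry=HANDLERS):
    """Import the handler class configured for `protocol` and instantiate it."""
    module_name, class_name = registry[protocol].split(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()


class HTTPHandler:
    """Stand-in for a user-supplied handler."""

    protocol = "http"

    def handle(self, flow):
        return {"protocol": self.protocol, "flow": flow}


# Register an in-memory module so import_module can find it; in practice the
# handler would live in a regular file on the import path.
demo = types.ModuleType("demo_handlers")
demo.HTTPHandler = HTTPHandler
sys.modules["demo_handlers"] = demo

handler = load_handler("http")
result = handler.handle("10.0.0.1:1025-10.0.0.2:80")
```

With this layout, swapping in a different handler only requires changing the settings entry, not the calling code.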

  1. http://timeglider.com/jquery/, http://www.simile-widgets.org/timeline/
  2. http://mbostock.github.com/d3/ex/treemap.html/
  3. http://exposedata.com/parallel/veggie/, http://bl.ocks.org/1341281/
  4. http://mbostock.github.com/protovis/ex/dot.html/
  5. http://jsunpack.jeek.org/, http://wepawet.iseclab.org/

Project Source Code Repository:
https://github.com/oguzy/openwitness

Student Weekly Blog: https://www.honeynet.org/blog/296

Project Useful Links:
I only have a Git repo for now. I will use it for issue tracking (https://github.com/oguzy/openwitness/issues) and for the wiki (https://github.com/oguzy/openwitness/wiki)

Project Updates:

  • 23.04.2012 - 30.04.2012
    • Buildout environment is set up for the Django project in the GitHub repo
    • Twitter Bootstrap is set up
    • A basic upload page is set up
    • MongoDB for Django is set up
  • 01.05.2012 - 7.05.2012
    • Timeline is set
    • Uploading the pcap is handled
    • Bro is set up and used for network-layer and application-level protocol detection after the upload finishes
    • TCP flows are created for application-level protocol detection
    • TCP/IP information (source IP, source port, destination IP, destination port, protocol type) is saved in the db together with the related pcap and its flow file information
    • A blog entry giving information about what has been done is written
  • 08.05.2012 - 14.05.2012
    • Parser for HTTP is written: saves the request and response information to database
    • HTTP parser also saves the response body, including any js files
    • Some bugs were seen while testing with pcap files that include multiple flows, and they are fixed
    • For keeping files, a directory for each upload is created on disk. The name of each directory is a hash of the timestamp
    • It is seen that Bro does not detect the HTTP protocol for some pcaps downloaded from pcapr.net. The possible reason is that they are missing the three-way handshake. It is planned to implement a tshark-based alternative for testing this incomplete HTTP traffic
  • 15.05.2012 - 21.05.2012
    • For application-level protocol handling, the way Bro is used is changed. Bro IDS turns out to handle TCP reassembly as well, so the Bro settings are changed to write the reassembled contents to disk
    • HTTP handler module is changed so that it reads information from the generated contents file. Two new classes are added to the model. IP numbers in a flow are read from the contents file and saved
    • HTTP handler is changed slightly so that users can plug in their own HTTP handler. A settings entry names the directory, and the Python module is loaded dynamically from it
  • 22.05.2012 - 28.05.2012
    • HTTP handler and file handler are almost finished, so HTTP requests and responses can be handled.
    • Binary files returned in responses are saved to disk using the hachoir Python library
    • Saving returned gzipped data, css and js files to disk is implemented but not tested yet
    • In the current state, an uploaded pcap is split into per-flow pcaps, and request and response header information is saved along with the IP and port numbers
  • 04.06.2012
    Done last week
    • DNS handler is written. Saving the request and response information to the db is also implemented
    • SMTP handler is written. Basic information for an email, such as the sender, the recipients, the mail body and attachment info, is saved to the db. Attachments are saved to disk and their paths are kept in the db.
    • Some minor fixes are made, such as URL settings and parameter changes when saving to the database (force_insert was being used).
    Planned next week
    • Time line visualization for uploaded samples including HTTP, DNS and SMTP
    • The module structure needs to change. UDP-related code should be moved under the udp directory and TCP-related code under TCP
    • I should fix the general functions that are called to get IP flows and to save requests and responses while handling application-level data. These are the functions one has to implement when writing a custom handler. Currently they take different numbers of parameters; I should give them the same signature.
    Issues
    • There may be missing relations and time information in the flows I keep in the database, which would require reworking the handlers
    • Changing the file structure will cause some errors, but they should be easy to solve
    • I haven't tested the handlers in detail, so testing them with mixed pcaps may reveal errors
  • 11.06.2012
    Done last week
    • Timeline visualization is implemented:
      First view of the timeline visualization
    • A REST api is written to handle database operations. By using the API, JSON data is gathered from database
    • jQuery Timeglider is used to create the timeline visualization
    • Login page is defined. Passwords are stored as salted hashes. Username, user id and email are required to log in to the system. User creation is a manual process for now. A Python script is written to automate this process from the terminal.
    Planned next week
    • Treemap and Scatter plot visualization should be started.
    • Should add more info to the timeline visualization page
    Issues
    • For treemaps I need to save the byte counts or sizes of the packets as well, which is not implemented yet
    • Working with the REST API takes some time; finding the right approach is making things take longer
  • 18.06.2012
    Done last week
    • Treemap visualization is implemented per application level protocol. Packet sizes and counts are considered per flow while creating visualization:
      Treemap Visualization by considering packet counts per HTTP flow
      Treemap Visualization by considering packet sizes per HTTP flow
      Treemap Visualization by considering packet sizes per SMTP flow
      Treemap Visualization by considering packet counts per SMTP flow
    • Details are added to the treemap visualization
    • Timeline visualization is changed. Per flow information is visualized.
      Timeline Visualization per application level protocol
    • Summary detail information is also added to the timeline visualization
      Timeline details
    Planned next week
    • Scatter plot visualization for application level protocol
    • Links per flow visualized at the timeline should be added. Clicking them should open details page.
    Issues
    • Creating meaningful visualization is taking time
    • Working with JavaScript visualizations is sometimes error prone, and problems are hard to detect.
  • 25.06.2012
    Done last week
    • Detail page links per protocol are implemented. For HTTP it is possible to display the request and response information as well as the returned HTML pages.
      Links per protocol
      HTTP information display for a clicked flow
      HTTP returned files
    • For SMTP it is possible to download the attachments and body info and also to see the header info
      SMTP details
    • A simple DNS details view is implemented
      DNS details
    Planned next week
    • Scatter plot visualization for overall protocols should be implemented
    Issues
    • I still need to implement parallel coordinates and to improve the details page
  • 01.07.2012
    Done last week
    • Scatter plot visualization is implemented. After logging in to the system, it is possible to see the overall protocols, including those uploaded by others.
      Scatter plot visualization displaying the overall protocols uploaded
    • Detail information legends are also added for the scatter plot page, displaying date, count and protocol names
    Planned next week
    • Dealing with login problems
    • Add a search option or make the scatter plots clickable
    • I should also check the details page for parts missing from the previous work
    Issues
  • 09.07.2012
    Done last week
    • Fixed the missing part of the flow detail page. Files belonging to the flow are saved in the db during the upload process
    • Scatter plot visualization is changed a bit. Used triangle and circular symbols for visualizations. Also added links to the protocol names. Clicking on them opens a page that displays the source IP, source port, destination IP and destination port with a timestamp
    • Login behaviour is changed so that the session is kept until logout is clicked
    • Devel branch is merged with the master branch
    Planned next week
    • Check YARA and external js services for malware analysis
    • Midterm evaluation starts
    Issues
  • 16.07.2012
    Done last week
    • Handler for unknown traffic is written
    • Tried to deploy openwitness on a publicly accessible server; still working on it. Had some version and server problems.
    Planned next week
    • External JS services and Virustotal handler implementations for analyzing the js files and binaries
    Issues
  • 23.07.2012
    Done last week
    • After talking with the backup mentor, some missing parts of the web interface are fixed for a beta release:
      * Pagination support is added
      * Flow pcap list is simplified and collected under a select box
      * In the treemap view, links are defined for flow details and packet details
      * The application is deployed with Apache2 + mod_wsgi: ow.comu.edu.tr
    Planned next week
    • External JS services and Virustotal handler implementations for analyzing the js files and binaries
    Issues
  • 30.07.2012
    Done last week
    • Handler for VirusTotal is written that uses its API
    • Email attachment files are scanned with the VirusTotal API and the result is displayed on the page
    • Display of file types is added
    Planned next week
    • Malwr.com-like site layout. The default page will be a malwr.com-like main page. Clicking on the MD5 sum will open a more detailed page
    • The detailed page will give more detailed information, like the one here: http://malwr.com/analysis/e681bd119bca257c40243a0aa2ac284b/
    • Tabs on the detailed page will be: a summary page that displays the timeline, an overview page that displays the scatter plot, and a visualization page that displays the parallel coordinates
    • An About page is required that gives information about the project; contact information and an FAQ can be added
    Issues
  • 06.08.2012
    Done last week
    • Network Analyzer is online with the new design. The default page is changed to a malwr.com-like site. The pcaps are listed in descending order.
    • Clicking on the hashes opens the pcap details page. The page includes the tabs. Visualizations are moved to those tabs.
    • Parallel coordinates are added to the visualizations.
    • Preliminary About page is added
    Planned next week
    • Scatter plot requires a more readable view
    • About page requires more information
    • Upload works, but the pages shown after login should filter their queries by user_id.
    • Maybe a blog post
    Issues
  • 13.08.2012
    Done last week
    • VM image is created and put on a public FTP site for testing.
    • Some minor bugs are fixed. The VM image and the ow.comu.edu.tr site are updated
    • Project name is changed to ovizart. Paths and the GitHub repo are changed.
    Planned next week
    • Detailed installation guide
    • Blog post
    • Documentation
    Issues