GSoC Project #3 - Qebek: QEMU Based Sebek

Advanced new data capture technique for virtualized environments, looking to extend our existing work on Sebek for high interaction honeypot I/O capture to the hypervisor layer, for increased stealth, tamper resistance and data correlation.
Primary Mentor: Brian Hay
Student: Chengyu Song
Deliverables: GPL-licensed source code and a working demonstration system running in my lab at Peking University.
Goals:
During GSoC 2009 I intend to implement a VMI-based honeynet monitoring tool for data capture on high interaction honeypots. This tool will have all the capabilities Sebek has now:

  • covertly monitors high interaction honeypots,
  • captures network activities,
  • captures keystrokes from network,
  • captures system activities.

But it will be

  • much more difficult for adversaries to detect that they are being monitored;
  • much more difficult for adversaries to compromise or bypass the monitoring;
  • able to provide better correlation between network activities, keystrokes and system activities.

 
Milestones:
May 23rd: Get familiar with modifying the translation code of Argos.
June 20th: Finish the basic system call interception framework, including unit test.
Basic Features:

  • precall interception and precall data collection; Done
  • postcall interception and postcall data collection; Done
  • automatically locate syscall address; Done
  • accurately pass collected information between precall and postcall. Done

Advanced Features:

  • supports reentrant calls;
  • supports context switch.
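One way to meet the "pass collected information between precall and postcall", reentrance and context-switch requirements is to record each in-flight call under a key of (address space, thread) and to match postcalls last-in-first-out per thread, since calls on a single thread are strictly nested. This is a minimal sketch, assuming the emulator can supply the guest CR3 and some per-thread identity at both hook points; the table size, argument limit and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 64   /* hypothetical capacity */
#define MAX_ARGS    8    /* hypothetical per-call argument limit */

/* One in-flight system call, recorded at precall time. */
typedef struct {
    int      used;
    uint32_t cr3;      /* guest address space (process) */
    uint32_t thread;   /* guest thread identity supplied by the caller */
    uint32_t depth;    /* nesting level on this thread */
    uint32_t service;  /* which hooked service was entered */
    uint32_t args[MAX_ARGS];
} pending_call;

static pending_call pending[MAX_PENDING];

/* Number of calls currently open on (cr3, thread). */
static uint32_t open_calls(uint32_t cr3, uint32_t thread)
{
    uint32_t n = 0;
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (pending[i].used && pending[i].cr3 == cr3 && pending[i].thread == thread)
            n++;
    return n;
}

/* Precall: save the arguments. Returns 0, or -1 if they cannot be stored. */
int precall_save(uint32_t cr3, uint32_t thread, uint32_t service,
                 const uint32_t *args, size_t nargs)
{
    if (nargs > MAX_ARGS)
        return -1;
    uint32_t depth = open_calls(cr3, thread);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].used) {
            pending_call *p = &pending[i];
            memset(p, 0, sizeof *p);
            p->used = 1;
            p->cr3 = cr3;
            p->thread = thread;
            p->depth = depth;
            p->service = service;
            if (nargs > 0)
                memcpy(p->args, args, nargs * sizeof args[0]);
            return 0;
        }
    }
    return -1;
}

/* Postcall: remove and return the innermost open call on (cr3, thread). */
int postcall_take(uint32_t cr3, uint32_t thread, pending_call *out)
{
    uint32_t n = open_calls(cr3, thread);
    if (n == 0)
        return -1;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        pending_call *p = &pending[i];
        if (p->used && p->cr3 == cr3 && p->thread == thread && p->depth == n - 1) {
            *out = *p;
            p->used = 0;
            return 0;
        }
    }
    return -1;
}
```

Because the postcall takes the innermost entry of its own thread, a call interrupted by a context switch to another process, or a nested call on the same thread, does not disturb the match. A real implementation would also need to discard entries of threads that exit without returning; the sketch omits that.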

June 27th: Finish the SVR support routine, including unit test.
Process Information:

  • ProcessID
  • ParentProcessID
  • ProcessName
  • UserName
  • WindowTitle
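The process information fields above could be gathered into a single record attached to each logged event. The sketch below uses a hypothetical process_info structure and formatter; field sizes and the log format are assumptions, not Qebek's. It bounds the image name with a printf precision because a name copied raw out of guest memory need not be NUL-terminated.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for the process information fields listed above. */
typedef struct {
    uint32_t pid;
    uint32_t ppid;
    char     name[16];          /* may be unterminated if copied raw */
    char     user[64];          /* assumed NUL-terminated */
    char     window_title[128]; /* assumed NUL-terminated */
} process_info;

/* Format one record; returns snprintf's result (length excluding the NUL). */
int format_process_info(const process_info *p, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "pid=%" PRIu32 " ppid=%" PRIu32 " name=%.*s user=%s title=%s",
                    p->pid, p->ppid, (int)sizeof p->name, p->name,
                    p->user, p->window_title);
}
```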

July 6th: Finish ConsoleSpy component.
July 13th: Finish mid-term evaluation.
July 27th: Finish porting the functions currently hooked by Sebek, i.e. the port handler functions.
August 3rd: Finish adding support for correlating keystroke data with sockets.
August 10th: Finish integration test and optimization.
August 25th: Finish final evaluation.
 
Detailed Plans
First Week
(May 11th - May 22nd)

  1. Learn and document the QEMU code translation mechanism;
  2. Identify all the instructions that need to be handled, according to the Intel Instruction Set Reference.
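As a starting point for such a list, here is a minimal byte-level classifier for a few instructions a system call monitor is likely to care about: the system call entries SYSENTER (0F 34) and INT 2Eh (CD 2E), and near returns, C3 and C2 iw, where the latter also releases an immediate number of argument bytes, as stdcall functions do. It ignores prefixes and is only an illustration; the authoritative encodings are in the Intel reference.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    INSN_OTHER,
    INSN_SYSENTER,   /* 0F 34 */
    INSN_INT2E,      /* CD 2E */
    INSN_RET,        /* C3 */
    INSN_RET_IMM16   /* C2 iw */
} insn_kind;

/* Classify the instruction starting at code[0]; prefixes are not handled.
   For INSN_RET_IMM16, *pop receives the number of argument bytes released. */
insn_kind classify_insn(const uint8_t *code, size_t len, uint16_t *pop)
{
    if (len >= 2 && code[0] == 0x0F && code[1] == 0x34)
        return INSN_SYSENTER;
    if (len >= 2 && code[0] == 0xCD && code[1] == 0x2E)
        return INSN_INT2E;
    if (len >= 1 && code[0] == 0xC3)
        return INSN_RET;
    if (len >= 3 && code[0] == 0xC2) {
        if (pop)
            *pop = (uint16_t)(code[1] | (code[2] << 8));  /* little-endian imm16 */
        return INSN_RET_IMM16;
    }
    return INSN_OTHER;
}
```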

Second to Seventh Week
(May 22nd - June 21st)
Finish the basic features of the hook framework, using the console spy function for unit testing:

  1. hook framework basic features;
  2. NtReadFile and NtWriteFile hook;
  3. std handle identification and buffer content extraction.

Eighth Week
(June 22nd - June 28th)
Finish the SVR support routine, using the console spy function for unit testing, i.e. port the GetProcessInfo function.
Ninth Week
(June 29th - July 5th)
Finish the basic features of the hook framework, using the console spy function for unit testing:

  1. NtSecureConnectPort, NtRequestWaitReplyPort and NtClose hook;
  2. csrss port list table;
  3. basic log mechanism.
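The csrss port list table can be as simple as a set of (process ID, port handle) pairs: a postcall hook on NtSecureConnectPort adds an entry when a process successfully connects to the csrss port, NtRequestWaitReplyPort traffic is only inspected for handles in the set, and NtClose removes the entry. Keying by process ID matters because handle values are only unique within one process. This sketch assumes the caller has already decided that the connection targets csrss; the names and the fixed capacity are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 128   /* hypothetical capacity */

typedef struct {
    int      used;
    uint32_t pid;      /* owning process */
    uint32_t handle;   /* port handle, valid only within that process */
} csrss_port;

static csrss_port ports[MAX_PORTS];

static csrss_port *find_port(uint32_t pid, uint32_t handle)
{
    for (size_t i = 0; i < MAX_PORTS; i++)
        if (ports[i].used && ports[i].pid == pid && ports[i].handle == handle)
            return &ports[i];
    return NULL;
}

/* After a successful NtSecureConnectPort to csrss. Returns 0, or -1 if full. */
int csrss_port_add(uint32_t pid, uint32_t handle)
{
    if (find_port(pid, handle))
        return 0;
    for (size_t i = 0; i < MAX_PORTS; i++) {
        if (!ports[i].used) {
            ports[i].used = 1;
            ports[i].pid = pid;
            ports[i].handle = handle;
            return 0;
        }
    }
    return -1;
}

/* In the NtRequestWaitReplyPort hook: is this a tracked csrss port? */
int csrss_port_known(uint32_t pid, uint32_t handle)
{
    return find_port(pid, handle) != NULL;
}

/* In the NtClose hook: forget the handle if it was tracked. */
void csrss_port_remove(uint32_t pid, uint32_t handle)
{
    csrss_port *p = find_port(pid, handle);
    if (p)
        p->used = 0;
}
```

Removing entries on NtClose keeps a recycled handle value from being mistaken for the old csrss port; entries of processes that exit without closing the handle would still need separate cleanup.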

 
Test and Measurement

NtDeviceIoControlFile

As the console spy is almost finished, the next stage is mainly about network activities. The Sebek Win32 version uses a TDI hook for this. However, since getting a driver object from the virtualization layer is hard and TDI is on the path to deprecation, I need to find another way. The best solution seems to be hooking NtDeviceIoControlFile, the API Windows uses for network-related I/O, which has been widely mentioned in malware behavior analysis papers. After several days of searching, today I came across a very useful resource, a master's thesis from the TTAnalyze team:
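A hook on NtDeviceIoControlFile has to decide from its IoControlCode argument which requests are socket operations (this sketch is not taken from the thesis). Winsock requests on Windows go to the AFD driver, whose control codes are undocumented; the constants below assume the layout found in ReactOS's AFD headers, (0x12 << 12) | (operation << 2) | method, which yields the commonly cited 0x12017 for receive and 0x1201F for send. Treat them as assumptions to verify against the target Windows version.

```c
#include <stdint.h>

/* Assumed AFD layout (from ReactOS headers): (0x12 << 12) | (op << 2) | method. */
#define AFD_BASE        0x12u
#define AFD_OP_BIND     0u
#define AFD_OP_CONNECT  1u
#define AFD_OP_RECV     5u
#define AFD_OP_SEND     7u

typedef enum { NET_OTHER, NET_BIND, NET_CONNECT, NET_RECV, NET_SEND } net_op;

/* Map an NtDeviceIoControlFile IoControlCode to a socket operation. */
net_op classify_afd_ioctl(uint32_t code)
{
    if ((code >> 12) != AFD_BASE)
        return NET_OTHER;              /* not an AFD request */
    switch ((code >> 2) & 0x3FFu) {    /* operation field; method bits ignored */
    case AFD_OP_BIND:    return NET_BIND;
    case AFD_OP_CONNECT: return NET_CONNECT;
    case AFD_OP_RECV:    return NET_RECV;
    case AFD_OP_SEND:    return NET_SEND;
    default:             return NET_OTHER;
    }
}
```

Classifying the request is only the first step; extracting the transferred data also requires parsing the request's input buffer, whose layout is likewise undocumented and is not shown here.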
 

stack crash?

I first observed this phenomenon when I ran the NtReadFile test last week: sometimes, when postNtReadFile is called, the handle value, buffer address and buffer size read from the stack are quite different from the values read in preNtReadFile. I didn't pay much attention to the problem at the time, but when I tried to debug the NtSecureConnectPort API with WinDbg today, the phenomenon appeared again, so I studied it further.
 
First, I set a breakpoint at nt!NtSecureConnectPort:
 

QEMU dyngen

This was supposed to be the first Qebek blog post, but unfortunately it cannot pass the mod_security check (even today), so I posted it here.

Precall and Postcall

When using hooking technology to intercept system calls, there are two different places to collect information: before the original function is called (precall) and after the original function returns (postcall). For example, in the Sebek Win32 client, the callback function OnZwReadFile first calls the original function s_fnZwReadFile; after the original function returns, it checks whether the call succeeded and, if so, calls the data collection function LogIfStdHandle:
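Sketched in simplified C (not Sebek's actual source), the postcall pattern is: call the saved original, and only if it returned a success status, hand the now-filled buffer to the collector. The NTSTATUS type and NT_SUCCESS test mirror the Windows headers; the four-parameter read signature, the function pointers and the demo stand-ins are hypothetical (the real NtReadFile takes nine parameters).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t NTSTATUS;                      /* as in the Windows headers */
#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)   /* same test as the DDK macro */

/* Hypothetical, simplified read signature and collector type. */
typedef NTSTATUS (*read_fn)(void *handle, void *buf, uint32_t len, uint32_t *got);
typedef void (*log_fn)(void *handle, const void *buf, uint32_t len);

static read_fn original_read;   /* saved original function */
static log_fn  collect;         /* data collection callback */

/* The hook: postcall collection happens only after a successful call. */
NTSTATUS hooked_read(void *handle, void *buf, uint32_t len, uint32_t *got)
{
    NTSTATUS st = original_read(handle, buf, len, got);
    if (NT_SUCCESS(st) && collect)
        collect(handle, buf, *got);   /* the buffer holds data only now */
    return st;                        /* the caller sees the original status */
}

/* Demo stand-ins for exercising the hook. */
static uint32_t logged_bytes;
static void count_bytes(void *h, const void *b, uint32_t n)
{
    (void)h; (void)b;
    logged_bytes += n;
}
static NTSTATUS read_ok(void *h, void *b, uint32_t len, uint32_t *got)
{
    (void)h;
    uint32_t n = len < 5 ? len : 5;
    memcpy(b, "dir\r\n", n);
    *got = n;
    return 0;                          /* STATUS_SUCCESS */
}
static NTSTATUS read_fail(void *h, void *b, uint32_t len, uint32_t *got)
{
    (void)h; (void)b; (void)len;
    *got = 0;
    return (NTSTATUS)0xC0000008;       /* STATUS_INVALID_HANDLE */
}
```

Collecting in the postcall is what makes the read data available at all; a precall hook on a read would only see an empty buffer.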

Is Handle Std

The Sebek Windows client has two keystroke sources: reads from and writes to the standard streams, and the csrss port. In the callback functions for NtReadFile and NtWriteFile, Sebek checks whether the given file handle matches one of the three standard stream handles; if it does, it logs the keystroke data:
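A minimal sketch of the standard-handle test, assuming the hook has already read the process's three standard handles from guest memory (on Windows they are kept in the process parameters block as StandardInput, StandardOutput and StandardError). The std_handles type and the read/write policy in should_log are hypothetical illustrations, not Sebek's logic.

```c
#include <stdint.h>

/* Hypothetical snapshot of the three standard handles of one process. */
typedef struct {
    uint32_t input, output, error;
} std_handles;

typedef enum { STD_NONE, STD_INPUT, STD_OUTPUT, STD_ERROR } std_kind;

/* Which standard stream, if any, does this file handle refer to? */
std_kind which_std_handle(const std_handles *h, uint32_t handle)
{
    if (handle == 0)
        return STD_NONE;         /* a NULL handle never matches */
    if (handle == h->input)
        return STD_INPUT;
    if (handle == h->output)
        return STD_OUTPUT;
    if (handle == h->error)
        return STD_ERROR;
    return STD_NONE;
}

/* Example policy: log reads from stdin and writes to stdout/stderr. */
int should_log(const std_handles *h, uint32_t handle, int is_write)
{
    std_kind k = which_std_handle(h, handle);
    if (is_write)
        return k == STD_OUTPUT || k == STD_ERROR;
    return k == STD_INPUT;
}
```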

Get system call address from SSDT

One difference between Qebek and other existing virtualization-based honeypot monitoring tools is that I want to 'hook' the system service function itself instead of the dispatcher (more precisely, the 'sysenter' or 'int 2e' instruction). This is similar to the difference between an SSDT (System Service Descriptor Table) hook and a kernel inline hook. However, this approach raises a problem: how do we get the function's address? One way is to read it directly from the SSDT.
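Reading the address out of the SSDT can be sketched as follows, assuming 32-bit Windows XP, where KeServiceDescriptorTable begins with ServiceTableBase, ServiceCounterTableBase, NumberOfServices and ParamTableBase (4 bytes each), and the service number in EAX carries the table selector in bits 12-13 and the entry index in its low 12 bits. Locating KeServiceDescriptorTable itself is not covered; guest_mem and read_u32 are hypothetical stand-ins for the emulator's guest memory access, and only the first (ntoskrnl) table is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of a slice of guest virtual memory. */
typedef struct {
    uint32_t       base;   /* guest address of bytes[0] */
    const uint8_t *bytes;
    size_t         size;
} guest_mem;

/* Read a little-endian 32-bit value at a guest address. Returns 0 or -1. */
static int read_u32(const guest_mem *m, uint32_t addr, uint32_t *out)
{
    if (addr < m->base)
        return -1;
    size_t off = (size_t)(addr - m->base);
    if (off > m->size || m->size - off < 4)
        return -1;
    const uint8_t *b = m->bytes + off;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

/* Resolve a service number (EAX at sysenter/int 2e) to its function address. */
int ssdt_lookup(const guest_mem *m, uint32_t descriptor, uint32_t service,
                uint32_t *func)
{
    uint32_t table_base, count;
    if ((service >> 12) & 0x3u)
        return -1;                                   /* shadow (win32k) table: not handled */
    uint32_t index = service & 0xFFFu;
    if (read_u32(m, descriptor, &table_base) != 0 ||  /* ServiceTableBase */
        read_u32(m, descriptor + 8, &count) != 0)     /* NumberOfServices */
        return -1;
    if (index >= count)
        return -1;
    return read_u32(m, table_base + 4u * index, func);
}
```

Resolving the address once per service and then watching for execution at that address is what lets the hook sit on the service function rather than on the dispatcher instruction.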
