The Discoverer module (see zhongjie’s blog entry) has been completed.
It consists of two programs: Format Discovery and Pre-Replay Processing.
Format Discovery is pretty much what I’ve blogged about in my earlier post.
Since that entry, I’ve completed the to-do tasks:
Add a function that summarises all of the program’s output.
Fix a memory leak in the program.
Match each replay packet to a format; if the length of a segment changes (e.g. because the shellcode changes), the corresponding length field must be updated as well.
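To illustrate the last task, here is a minimal sketch of how a length field could be fixed up when a segment is swapped out. It is not the module's actual code: the names lenField and replaceSegment are hypothetical, and it assumes a 2-byte big-endian length field that sits before the segment it describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one length-prefixed segment in a format. */
struct lenField {
    size_t lenOffset;  /* offset of the 2-byte big-endian length field (assumed layout) */
    size_t dataOffset; /* offset of the segment that the length field describes */
};

/*
 * Copy pkt into out, replacing the described segment with newData and
 * rewriting the length field to match. Returns the new packet size,
 * or 0 if the packet is malformed or out is too small.
 */
size_t replaceSegment(const uint8_t *pkt, size_t pktLen,
                      const struct lenField *lf,
                      const uint8_t *newData, size_t newLen,
                      uint8_t *out, size_t outCap)
{
    assert(pkt && lf && newData && out);

    /* The length field must fit in the packet and precede the segment. */
    if (lf->lenOffset + 2 > lf->dataOffset || lf->dataOffset > pktLen)
        return 0;
    if (newLen > 0xFFFF) /* would not fit in a 2-byte field */
        return 0;

    /* Segment length as recorded in the sample packet. */
    size_t oldLen = ((size_t)pkt[lf->lenOffset] << 8) | pkt[lf->lenOffset + 1];
    if (lf->dataOffset + oldLen > pktLen)
        return 0;

    size_t tailLen = pktLen - lf->dataOffset - oldLen;
    size_t total = lf->dataOffset + newLen + tailLen;
    if (total > outCap)
        return 0;

    memcpy(out, pkt, lf->dataOffset);               /* bytes before the segment */
    memcpy(out + lf->dataOffset, newData, newLen);  /* replacement segment */
    memcpy(out + lf->dataOffset + newLen,
           pkt + lf->dataOffset + oldLen, tailLen); /* bytes after the segment */

    /* Patch the length field so it matches the new segment. */
    out[lf->lenOffset]     = (uint8_t)(newLen >> 8);
    out[lf->lenOffset + 1] = (uint8_t)(newLen & 0xFF);
    return total;
}
```

With a 7-byte sample packet whose 3-byte segment is replaced by a 5-byte one, the function returns 9 and the length field reads 5. A real format may use a different field width or byte order, so treat those as parameters of the discovered format.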
The first part of Format Discovery is 90% complete.
The program can now tokenize the sample packets and sort them into clusters according to their token patterns.
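As a rough sketch of the idea (not the actual Discoverer code), the snippet below reduces each packet to a token pattern and groups packets with identical patterns. The tokenization rule is an assumption made for illustration: each run of printable bytes becomes a 'T' token and each run of other bytes becomes a 'B' token. The names tokenPattern, clusterTable and addToCluster are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATTERN 64
#define MAX_CLUSTERS 16

/*
 * Write the token pattern of pkt into pattern: one 'T' per run of printable
 * bytes, one 'B' per run of non-printable bytes. Patterns longer than
 * cap - 1 tokens are truncated. Returns the number of tokens written.
 */
size_t tokenPattern(const unsigned char *pkt, size_t len,
                    char *pattern, size_t cap)
{
    assert(pkt && pattern && cap > 0);
    size_t n = 0;
    char prev = 0;
    for (size_t i = 0; i < len && n + 1 < cap; i++) {
        char kind = isprint(pkt[i]) ? 'T' : 'B';
        if (kind != prev) { /* a new run starts here */
            pattern[n++] = kind;
            prev = kind;
        }
    }
    pattern[n] = '\0';
    return n;
}

/* Fixed-size table with one entry per distinct token pattern. */
struct clusterTable {
    char patterns[MAX_CLUSTERS][MAX_PATTERN];
    size_t counts[MAX_CLUSTERS];
    size_t used;
};

/*
 * Add a packet with the given pattern to its cluster, creating the cluster
 * if needed. Returns the cluster index, or -1 if the table is full.
 */
int addToCluster(struct clusterTable *t, const char *pattern)
{
    assert(t && pattern);
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->patterns[i], pattern) == 0) {
            t->counts[i]++;
            return (int)i;
        }
    }
    if (t->used == MAX_CLUSTERS)
        return -1;
    strncpy(t->patterns[t->used], pattern, MAX_PATTERN - 1);
    t->patterns[t->used][MAX_PATTERN - 1] = '\0';
    t->counts[t->used] = 1;
    return (int)t->used++;
}
```

Under this rule, two packets made of text, then binary, then text both yield the pattern "TBT" and land in the same cluster, while a packet that starts with binary data gets its own. The table must be zero-initialised before first use.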
The structure for a token looks like this:
// definition of a node for initial tokenization
struct sToken {
    struct inferProperty* sProperty;
    struct inferSemantic* sSemantic;
    struct formatDistinguisher* sFD;
    struct sToken* next;
};
struct inferProperty {
    char szType[4]; // "s-c/c-s" / "bin" / "txt"