Wednesday, January 4, 2012

Malware Analysis Tutorial 10: Tricks for Confusing Static Analysis Tools

Learning Goals:
  1. Explore Use of Stack for Supporting Function Calls
  2. Practice Reverse Engineering
Applicable to:
  1. Operating Systems.
  2. Computer Security.
  3. Programming Language Principles.

1. Introduction
This tutorial explores several tricks that Max++ employs to confuse static analysis tools. These tricks can effectively defeat static program analysis tools that plot the call graph of the malware and extract the system calls it invokes. By "static" we mean that the tool does not actually execute the malware. Most "smart" virus scanners are static analysis tools. Many of them use heuristics to decide whether a binary executable is malicious by examining the set of system functions it calls. For example, if a binary invokes too many registry-related operations, an alert should be raised. If such analysis can be blocked, the malware can significantly improve its survival rate. Note, however, that these tricks cannot block "dynamic" tools, which actually run the malware (typical examples include CWSandBox and Anubis).
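To make the heuristic concrete, here is a toy sketch in Python of a registry-counting check. The prefix list, the threshold of 3, and the names `count_registry_calls` and `is_suspicious` are hypothetical and are not taken from any real scanner. The sketch also shows why hiding API names pays off: the same program scores differently depending on which calls the scanner can see.

```python
from typing import Iterable

# Hypothetical heuristic: flag a binary whose visible imports contain many
# registry APIs. Prefixes and threshold are illustrative assumptions only.
REGISTRY_API_PREFIXES = ("Reg", "ZwSetValueKey", "ZwCreateKey")
THRESHOLD = 3


def count_registry_calls(imported_names: Iterable[str]) -> int:
    """Count imported function names that look registry-related."""
    return sum(1 for name in imported_names if name.startswith(REGISTRY_API_PREFIXES))


def is_suspicious(imported_names: Iterable[str]) -> bool:
    """Raise an alert when the registry-related count reaches the threshold."""
    return count_registry_calls(imported_names) >= THRESHOLD


# The same program, once with its registry calls visible to the scanner and
# once with them resolved at run time (so they never show up as imports).
visible = ["RegOpenKeyExA", "RegSetValueExA", "RegCreateKeyExA", "CreateFileA"]
hidden = ["CreateFileA"]
print(is_suspicious(visible))  # True
print(is_suspicious(hidden))   # False
```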

2. Lab Configuration
You can either continue from Tutorial 9, or follow the instructions below to set up the lab. Refer to Tutorial 1 and Tutorial 4 for setting up VBOX instances and WinDbg. We will analyze the function starting at 0x4014F9. 

Figure 1. Function 0x4014F9 to Analyze

(1) In the code pane, right-click and go to expression "0x40105c".
(2) Right-click and then select "breakpoints -> hardware, on execution".
(3) Press F9 to run to 0x40105c.
(4) If you see a lot of DB instructions, select them, right-click, and choose "During next analysis treat them as Command".
(5) Exit IMM, restart it, and run to 0x40105c again. Select the instructions below 0x40105c (about one screen), then right-click -> Analysis -> Analyze Code. IMM should now identify all the loops.
(6) Now go to 0x401147; you will notice that it is "CALL 0x4014F9". Press F4 to run to that point, and then press F7 to step into function 0x4014F9.

3. Two-Layer Function Return
We now analyze the first trick, a two-layer function return that disrupts call graph generation. In the following, we analyze function 0x00401838. Observe the instructions from 0x401502 to 0x401505 (in Figure 2). Our first impression would be that function 0x401838 takes three parameters: a pointer to the string "ntdll.dll", 0x7C903400, and 0x7C905D40. However, as you will see later, this is not the case: function 0x00401838 is used only to confuse static analysis tools.

Figure 2. Two Layer Function Call at 0x401505
Figure 3 displays the function body of 0x401838. It starts with a call to 0x00413650, followed by a number of other instructions (as you will see, these instructions are never executed).

Figure 3. Function body of 0x401838
Notice that at 0x0040183B, it calls function 0x00413650, whose function body is displayed in Figure 4. There are only two instructions: POP EAX and RETN.
Figure 4. Function Body of 0x413650
Now comes the interesting part. Look at the stack contents in Figure 5. Starting from the third word from the top, we have the three words pushed by the earlier code (the pointer to "ntdll.dll", 0x7c903400, and 0x7c905d40). The top two words are the RETURN ADDRESSES pushed by the two CALL instructions.

Each CALL instruction essentially performs two steps. First, it pushes the address of the next instruction onto the stack, so that execution can return there when the called function completes. Then it jumps to the entry address of the called function. Thus, it is not difficult to infer that 0x00401840 is pushed by the CALL 0x00413650 at 0x0040183B (see Figure 3), and 0x0040150A is pushed by the CALL 0x00401838 at 0x00401505 (see Figure 2). So the POP EAX pops 0x00401840 off the stack into EAX. When the RETN instruction executes, it returns directly to 0x40150A (i.e., it jumps two layers back)! Clearly, the instructions starting from 0x00401840 are never executed. This two-layer return can confuse quite a number of static analysis tools when they try to plot call graphs, because a tool that assumes every CALL returns to the instruction after it will treat the never-executed bytes at 0x00401840 as live code of function 0x00401838.

Figure 5. Stack Contents at 0x00413650
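The two-layer return can be reproduced with a minimal Python model of the stack. The model assumes only the CALL, POP, and RETN behavior described above. The addresses come from Figures 2 to 5, and the `Cpu` class is purely illustrative. `PTR_NTDLL_STR` is a hypothetical placeholder, because the tutorial does not give the address of the "ntdll.dll" string.

```python
NEAR_CALL_LEN = 5  # a near CALL (opcode E8 + rel32) is 5 bytes long

# Hypothetical placeholder for the address of the "ntdll.dll" string.
PTR_NTDLL_STR = 0x00400000


class Cpu:
    """Toy x86 model: just a stack, EIP and EAX."""

    def __init__(self) -> None:
        self.stack: list[int] = []  # last element is the top of stack ([ESP])
        self.eip = 0
        self.eax = 0

    def push(self, value: int) -> None:
        self.stack.append(value)

    def call(self, at: int, target: int) -> None:
        # CALL = push the address of the next instruction, then jump to target.
        self.push(at + NEAR_CALL_LEN)
        self.eip = target

    def pop_eax(self) -> None:
        self.eax = self.stack.pop()

    def retn(self) -> None:
        # RETN blindly pops whatever is on top of the stack into EIP.
        self.eip = self.stack.pop()


cpu = Cpu()
for word in (0x7C905D40, 0x7C903400, PTR_NTDLL_STR):  # pushed before 0x401505
    cpu.push(word)
cpu.call(at=0x401505, target=0x401838)  # pushes 0x40150A
cpu.call(at=0x40183B, target=0x413650)  # pushes 0x401840
print([hex(w) for w in reversed(cpu.stack[-2:])])  # ['0x401840', '0x40150a']
cpu.pop_eax()  # POP EAX discards the inner return address
cpu.retn()     # so RETN lands two layers up
print(hex(cpu.eax), hex(cpu.eip))  # 0x401840 0x40150a
```

After the RETN, the three "parameters" are still on the stack, and control is back at 0x40150A. The code at 0x00401840 was never reached.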

4. Invoking NTDLL System Calls using Encoded Table
Next we show an interesting technique for invoking ntdll.dll functions without using the export table. We will analyze the instruction at 0x401557 (shown in Figure 6), which calls function 0x4136BF. As you will see, function 0x4136BF invokes ZwAllocateVirtualMemory without explicitly exposing its entry address and without using the export table.

We leave the analysis of the logic between 0x40150A and 0x401557 (shown in Figure 6) to readers. Basically, the code builds an encoded translation table on the stack, from 0x0012D538 to 0x0012D638.

Figure 6. Code Between 0x40150A and 0x401557

Now observe the function body of 0x004136BF in Figure 7. The first instruction is CALL 0x004136DC. You might notice that between 0x004136BF and 0x004136DC there is some gibberish code. If you read it more carefully, you will find that these bytes are actually the string "ZwAllocateVirtualMemory", whose first character is the byte at 0x004136C4.

Think about the CALL 0x004136DC at 0x004136BF again. It essentially performs two steps:
    PUSH 0x004136C4   ; note 0x004136C4 is the beginning of "ZwAllocateVirtualMemory" (see Figure 8)
    JMP  0x004136DC

Figure 7. Function Body of 0x4136BF.
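The CALL-over-data layout of Figure 7 can be checked with a few lines of Python. This sketch assumes the name is stored NUL-terminated immediately after the 5-byte CALL and is spelled like the ntdll export. It rebuilds those bytes and decodes them with a hypothetical helper, `decode_call_over_data`. Note that the 23-character name plus its NUL terminator is exactly 0x18 bytes, which is precisely the gap between 0x004136C4 and 0x004136DC.

```python
import struct

BASE = 0x004136BF  # address of the CALL instruction (Figure 7)
NAME = b"ZwAllocateVirtualMemory"

# Rebuild the bytes: E8 <rel32> followed by the NUL-terminated name (assumed layout).
rel32 = 0x004136DC - (BASE + 5)  # CALL displacement is relative to the next instruction
code = b"\xE8" + struct.pack("<i", rel32) + NAME + b"\x00"


def decode_call_over_data(buf: bytes, base: int) -> tuple[int, int, bytes]:
    """Return (pushed return address, call target, NUL-terminated data after the CALL)."""
    if buf[0] != 0xE8:
        raise ValueError("expected a near CALL rel32 (opcode E8)")
    (disp,) = struct.unpack_from("<i", buf, 1)
    ret_addr = base + 5        # the value the CALL pushes
    target = ret_addr + disp   # where execution actually continues
    start = ret_addr - base
    end = buf.index(b"\x00", start)
    return ret_addr, target, buf[start:end]


ret_addr, target, data = decode_call_over_data(code, BASE)
print(hex(ret_addr), hex(target), data.decode())  # 0x4136c4 0x4136dc ZwAllocateVirtualMemory
print(ret_addr + len(data) + 1 == target)         # True: string + NUL fills the gap
```

The "return address" pushed by the CALL is really a pointer to data, so the callee at 0x004136DC receives the function name as if it were a stack parameter.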

Now you can pretty much guess the point of the code: it is trying to invoke ZwAllocateVirtualMemory! But how is this accomplished? Let's dig deeper. At 0x004136DC it calls function 0x00401172, and when that call returns, the instruction at 0x004136E1 executes JMP EAX. Can you guess what 0x00401172 does?

Figure 8. Stack Contents at 0x4136DC before the Call of 0x00401172
We list some hints below (see Figure 9):
(1) The loop between 0x0040118E and 0x004011A0 computes the checksum of the function name.
(2) The loop between 0x004011A2 and 0x004011BA is a binary search. The search is performed on the encoded export table (as discussed in Tutorial 9). Each entry has two elements: (a) the checksum of the function name, and (b) the entry address of the function.
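Here is a minimal Python sketch of what 0x00401172 appears to do, under stated assumptions. Max++'s actual checksum algorithm is not given here, so `name_checksum` below is a stand-in rotate-and-add hash, and the entry addresses are made up. The table is kept sorted by checksum, since a binary search requires that order. `build_table` and `resolve` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Tuple[int, int]]  # (checksum, entry address), sorted by checksum


def name_checksum(name: str) -> int:
    """Stand-in 32-bit checksum: rotate left by 7, then add each byte.

    Hypothetical: the real loop at 0x0040118E may use a different algorithm.
    """
    h = 0
    for byte in name.encode("ascii"):
        h = ((h << 7) | (h >> 25)) & 0xFFFFFFFF
        h = (h + byte) & 0xFFFFFFFF
    return h


def build_table(exports: Dict[str, int]) -> Table:
    """Replace each export name by its checksum and sort the entries."""
    return sorted((name_checksum(name), addr) for name, addr in exports.items())


def resolve(table: Table, name: str) -> Optional[int]:
    """Binary-search the table for the checksum of `name`; return its entry address."""
    key = name_checksum(name)
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        checksum, addr = table[mid]
        if checksum == key:
            return addr
        if checksum < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Made-up entry addresses, for illustration only.
table = build_table({
    "ZwAllocateVirtualMemory": 0x7C900010,
    "ZwClose": 0x7C900020,
    "ZwOpenFile": 0x7C900030,
})
print(hex(resolve(table, "ZwAllocateVirtualMemory")))
print(resolve(table, "ZwNoSuchFunction"))
```

Because the lookup key is a checksum, the resolved function never appears as a named import. The final JMP EAX is also an indirect jump, so a static tool cannot read its target off the instruction.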

From these hints, you can infer that function 0x00401172 takes the name of a function and returns its entry address. The return value is left in EAX, so the JMP EAX at 0x004136E1 invokes the function.

5. Challenge of the Day
(1) What is the checksum of ZwAllocateVirtualMemory?
(2) What are the parameters of ZwAllocateVirtualMemory? Why does Max++ call this function?
(3) Look at 0x40117B. What is stored in the thread local storage (EAX+2C)? How did you reach your conclusion?

