Sunday, December 25, 2011

Malware Analysis Tutorial 8: PE Header and Export Table

Learning Goals:

  1. Understand the portable executable (PE) header of binary executables.
  2. Understand the EXPORT TABLE.
  3. Practice disassemble and reverse engineering techniques.
Applicable to:
  1. Operating Systems.
  2. Computer Security.

1. Introduction

In this tutorial, we will analyze the first harmful operation performed by Max++. It changes the structure of the export table of ntdll.dll. Recall the analysis of Max++ in Tutorial 7, the malware reads the information in TIB and PEB, and examines the loaded modules one by one, until it encounters "ntdll.dll" (this is accomplished using a checksum function inside a two layer nested loop).

In this tutorial, we will reverse engineer the code starting at 0x40105C.

2. Background Information of PE Header
Any binary executable file (no matter on Unix or Windows) has to include a header to describe its structure: e.g., the base address of its code section, data section, and the list of functions that can be exported from the executable, etc. When the file is executed by the operating system, the OS simply reads this header information first, and then loads the binary data from the file to populate the contents of the code/data segments of the address space for the corresponding process. When the file is dynamically linked (i.e., the system calls it relies on are not statically linked in the executable), the OS has to rely on its import table to determine where to find the entry addresses of these system functions.

Most binary executable files on Windows follows the following structure: DOS Header (64 bytes), PE Header, sections (code and data). For a complete survey and introduction of the executable file format, we recommend Goppit's "Portable Executable File Format - A Reverse Engineering View" [1].

DOS Header starts with magic number 4D 5A 50 00, and the last 4 bytes is the location of PE header in the binary executable file. Other fields are not that interesting. The PE header contains significantly more information and more interesting. In Figure 1, please find the structure of PE Header. We only list the information that are interesting to us in this tutorial. For a complete walk-through, please refer to Goppit's work [1].

Figure 1. Structure of PE Header
 At run time of a binary executable, Windows loader actually loads the PE header into a process's address space. There are some well defined data structures defined in winnt.h for each of the major part of the PE header.

As shown in Figure 1, PE header consists of three parts: (1) a 4-byte magic code, (2) a 20-byte file header and its data type is IMAGE_FILE_HEADER, and (3) a 224-byte optional header (type: IMAGE_OPTIONAL_HEADER32). The optional header itself has two parts: the first 96 bytes contain information such as major operating systems, entry point, etc. The second part is a data directory of 128 bytes. It consists of 16 entries, and each entry has 8 bytes (address, size).

We are interested in the first two entries: one has the pointer to the beginning of the export table, and the other points to the import table.


2.1 Debugging Tool Support (Small Lab Experiments)
Modern binary debuggers have provided sufficient support for examining PE headers. We discuss the use of WinDbg and Immunity Debugger.

(1) WinDbg. Assume that we know that the PE structure of ntdll.dll is located at memory address 0x7C9000E0. We can display the second part: file header using the following.

dt nt!_IMAGE_FILE_HEADER 0x7c9000e4
   +0x000 Machine          : 0x14c
   +0x002 NumberOfSections : 4
   +0x004 TimeDateStamp    : 0x4802a12c
   +0x008 PointerToSymbolTable : 0
   +0x00c NumberOfSymbols  : 0
   +0x010 SizeOfOptionalHeader : 0xe0
   +0x012 Characteristics  : 0x210e


Then we can calculate the starting address of the optional header: 0x7C9000E4 + 0x14 (20 bytes) = 0x7C9000F8. The attributes of optional header is displayed as below. For example, the major linker version is 7 and the the address of entry point is 0x12c28 (relative of the base address 0x7c900000).


kd> dt _IMAGE_OPTIONAL_HEADER 0x7c9000F8
nt!_IMAGE_OPTIONAL_HEADER
   +0x000 Magic            : 0x10b
   +0x002 MajorLinkerVersion : 0x7 ''
   +0x003 MinorLinkerVersion : 0xa ''
   +0x004 SizeOfCode       : 0x7a000
   +0x008 SizeOfInitializedData : 0x33a00
   +0x00c SizeOfUninitializedData : 0
   +0x010 AddressOfEntryPoint : 0x12c28
   +0x014 BaseOfCode       : 0x1000
   +0x018 BaseOfData       : 0x76000
   +0x01c ImageBase        : 0x7c900000
   +0x020 SectionAlignment : 0x1000
   +0x024 FileAlignment    : 0x200
   +0x028 MajorOperatingSystemVersion : 5
   +0x02a MinorOperatingSystemVersion : 1
   +0x02c MajorImageVersion : 5
   +0x02e MinorImageVersion : 1
   +0x030 MajorSubsystemVersion : 4
   +0x032 MinorSubsystemVersion : 0xa
   +0x034 Win32VersionValue : 0
   +0x038 SizeOfImage      : 0xaf000
   +0x03c SizeOfHeaders    : 0x400
   +0x040 CheckSum         : 0xb62bc
   +0x044 Subsystem        : 3
   +0x046 DllCharacteristics : 0
   +0x048 SizeOfStackReserve : 0x40000
   +0x04c SizeOfStackCommit : 0x1000
   +0x050 SizeOfHeapReserve : 0x100000
   +0x054 SizeOfHeapCommit : 0x1000
   +0x058 LoaderFlags      : 0
   +0x05c NumberOfRvaAndSizes : 0x10
   +0x060 DataDirectory    : [16] _IMAGE_DATA_DIRECTORY

As shown by Goppit [1], OllyDbg can display PE structure nicely. Since the Immunity Debugger is based on OllyDbg, we can achieve the same effect. In IMM View -> Memory, we can easily locate the starting address of each module (e.g., see Figure 2).


Figure 2. Getting PE Header Address
Then in the memory dump window, jump to the starting address of the PE of ntdll.dll. Then right click in the dump pane, and select special -> PE, we can have all information nicely presented by IMM.




3. Export Table

Recall that the first entry of IMAGE_DATA_DIRECTORY of the optional header field contains information about the export table. By Figure 1, you can soon infer that the 4 bytes located at PE + 0x78 (i.e., offset 120 bytes) is the relative address (relative to DLL base address) of the export table, and the next byte (at offset 0x7C) is the size of the export table.

The data type for the export table is  IMAGE_EXPORT_DIRECTORY. Unfortunately, the WinDbg symbol set does not include the definition of this data structure, but you can easily find it in winnt.h through a google search (e.g., from [2]). The following is the definition of IMAGE_EXPORT_DIRECTORY from [2].

typedef struct _IMAGE_EXPORT_DIRECTORY {
  DWORD Characteristics; //offset 0x0
  DWORD TimeDateStamp; //offset 0x4
  WORD MajorVersion;  //offset 0x8
  WORD MinorVersion; //offset 0xa
  DWORD Name; //offset 0xc
  DWORD Base; //offset 0x10
  DWORD NumberOfFunctions;  //offset 0x14
  DWORD NumberOfNames;  //offset 0x18
  DWORD AddressOfFunctions; //offset 0x1c
  DWORD AddressOfNames; //offset 0x20
  DWORD AddressOfNameOrdinals; //offset 0x24
 }

Here, we need some manual calculation of addresses for each attribute for our later analysis. In the above definition, WORD is a computer word of 16 bites (2bytes), and DWORD is 4 bytes. We can easily infer that, MajorVersion is located at offset 0x8, and AddressOfFunctions is located at offset 0x1c.

Now assume that IMAGE_EXPORT_DIRECTORY is located at 0x7C903400, the following is the dump from WinDbg (here "dd" is to display memory):

kd> dd 7c903400
7c903400  00000000 48025c72 00000000 00006786
7c903410  00000001 00000523 00000523 00003428
7c903420  000048b4 00005d40 00057efb 00057e63
7c903430  00057dc5 00002ad0 00002b30 00002b40
7c903440  00002b20 0001eb58 0001ebb9 0001e3af
7c903450  0002062d 000206ee 0004fe3a 00012d71
7c903460  000211e7 0001eaff 0004fe2f 0004fdaa
7c903470  0001b08a 0004febb 0004fe6d 0004fde6

We can soon infer that there are 0x523 functions exposed in the export table, and there are 0x523 names exposed. Why? Because the NumberOfFunctions is located at offset 0x14 (thus its address is 0x7c903400+0x14 = 0x7c903414)  For another example, look at the attribute "Name" which is located at offset 0xc (i.e., its address: 0x7c90340c), we have number 0x00006787. This is the address relative to the base DLL address (assume it is 0x7c900000). Then we have the name of the module located at 0x7c906786. We can verify using the "db" command in WinDbg (display memory contents as bytes): you can verify that the module name is indeed ntdll.dll.


kd> db 7c906786
7c906786  6e 74 64 6c 6c 2e 64 6c-6c 00 43 73 72 41 6c 6c  ntdll.dll.CsrAll
7c906796  6f 63 61 74 65 43 61 70-74 75 72 65 42 75 66 66  ocateCaptureBuff


Read page 26 of [1], you will find that the "AddressOfFunctions", "AddressOfNames", and "AddressOfNameOdinals" are the most important attributes. There are three arrays (shown as below), and each of the above attributes contains one corresponding starting address of an array.

PVOID Functions[523]; //each element is a function pointer
char * Names[523]; //each element is a char * pointer
short int Ordinal[523]; //each element is an 16 bit integer

For example, by manual calculation we know that the Names array starts at 7C9048B4 (given the 0x48B4 located at offset 0x20, for attribute AddressOfNames; and assuming the base address is 0x7C900000). We know that each element of the Names array  is 4 bytes. here is the dump of the first 8 elements:
kd> dd 7c9048b4
7c9048b4  00006790 000067a9 000067c3 000067db
7c9048c4  00006807 0000681f 00006831 00006845


We can verify the first name (00006790): It's CsrAllocateCaptureBuffer. Note that a "0" byte is used to terminate a string.
kd> db 7c906790
7c906790  43 73 72 41 6c 6c 6f 63-61 74 65 43 61 70 74 75  CsrAllocateCaptu
7c9067a0  72 65 42 75 66 66 65 72-00 43 73 72 41 6c 6c 6f  reBuffer.CsrAllo


We can also verify the second name (000067a9): It's CsrAllocateMessagePointer.
kd> db 7c9067a9
7c9067a9  43 73 72 41 6c 6c 6f 63-61 74 65 4d 65 73 73 61  CsrAllocateMessa
7c9067b9  67 65 50 6f 69 6e 74 65-72 00 43 73 72 43 61 70  gePointer.CsrCap


Now, given a Function name, how do we find its entry address? The following is the formula:
Note that array index starts from 0.
Assume Names[x].equals(FunctionName)
Function address is Functions[Ordinal[x]]

4. Challenge1 of the Day
The first sixteen elements of the Ordinal is shown below:
kd> dd 7c905d40
7c905d40  00080007 000a0009 000c000b 000e000d
7c905d50  0010000f 00120011 00140013 00160015


The first eight elements of the Functions array is shown below:
kd> dd 7c903428
7c903428  00057efb 00057e63 00057dc5 00002ad0
7c903438  00002b30 00002b40 00002b20 0001eb58



What is the entry address of function CsrAllocateCaptureBuffer? The answer is: it's 7C91EB58. Think about why? (Pay special attention to the byte order of integers).


5. Analysis of Code
We now start to analyze the code, starting at 0x40105C. Set a hardware breakpoint at 0x40105C (in code pane, right click -> Go To Expression (0x40105c) and then right click -> breakpoints -> hardware, on execution). Press F9 to run to the point. The first instruction should be PUSH DS:[EAX+8]. If you see a bunch of BYTE DATA instructions, that's caused by the byte scission of the code. Highlight all these BYTE DATA instructions, right click -> Treat as Command during next analysis and we should have the correct disassembly displayed in IMM.

Figure 4: Accessing Export Table


Now let us analyze the first couple of instructions starting at 0x40105C (in Figure 4). Continuing the analysis of Tutorial 7, we know that  after reading the module information (one by one), the code jumps out of the loop when it encounters the ntdll.dll. At this moment, EAX contains the address of offset 0x18 of LDR_DATA_TABLE_ENTRY. In another word, EAX points to the attribute "DllBase". Thus, the instruction at 0x40105C, i.e., PUSH DWORD DS:[EAX+8] is to push the DllBase into the stack. Executing this command, you will find that 0x7C900000 appearing on top of the stack.

The control flow soon jumps to 0x401077 and 0x401078. It soon follows that at 0x401070, ECX now has value 0x7C90000 (the DLL base).  Now consider the instruction 0x40107D:
       MOV EAX, DWORD PTR DS:[ECX+3C]

Recall that the beginning of a PE file is the DOS header (which is 64 bytes) and the last 4 bytes of the DOS header contains the location of the PE header (offset relative to the DLL base) [see Section 2]. Hex value 0x3C is decimal value 60! So we now have EAX containing of PE offset. Observing the registry pane, we have EAX = 0xE0. We then infer that PE header is located at 0x7C9000E0 (which is 0x7C900000 base + offset 0xE0).

Now observe instruction at 0x401087:
    MOV ESI, DWORD PTR DS:[EAX+78]

Note the offset 0x78, its decimal value is 120. From Figure 1, we can soon infer that offset 0x78 is the address of the EXPORT table data entry in the IMAGE_DATA_DIRECTORY of the optional header. Thus, ESI now contains the entry address of the export table (offset relative to DLL base) ! After instruction at 0x40108c, Its value is now 0x7C903400 (starting address of EXPORT TABLE).


5. Challenge of the Day
We have demonstrated to you some basic analysis techniques to reverse engineer the malicious logic. Now your job is to continue our analysis and explain what the Max++ malware is trying to do. Specifically, you can follow the road-map below:
(1) What does function 0x004138A8 do? What are its input parameters?
(2) Which data fields of the export table are the instructions between 0x4010AD and 0x4010C4 accessing?
(3) Explain the meaning of EDX*+C in the instruction at 0x4010BB.
(4) Explain the logic of 0x4010CB to 0x4010F6.
(5) What is the purpose of the loop from 0x401103 to 0x401115?
(6) What does function 0x40165E do? What are its input parameters?
(7) Explain the code between 0x401117 and 0x40113E.


References
1. Goppit, "Portable Executable Format - a Reverse Enginnering View", v1(2), Code Breakers Magzine, January 2006.
2. An online copy of winnt.h, Available at http://source.winehq.org/source/include/winnt.h

6 comments:

  1. Hi,

    I really have read couple of articles over pe executable file format to understand it more in depth. But yup i am agree, that you made few points more clear. Thanks for that.

    ReplyDelete
  2. Dr,

    You made a VERY COMPREHENSIVE tutorials on malware analysis. but hardly to see any of your publication in malware problems.
    Why is that?

    I saw your PhD thesis also was in different topic. Just curious.

    Your tutorials are very helpful. Thanks!!!

    ReplyDelete
  3. challange 1 : (future refrence)

    1. Name array, find the name and its index, for ex- (1)st function is CsrAllocateCaptureBuffer

    2. Ordinal array(2 byte only), get the index ordinal , ex -(1)st ordinal is 0008

    3. function address array, find the address using ordinal value , ex - (0008) i.e 8 th position which is = 0001eb58

    finally add offset to base address.

    ReplyDelete
  4. what is meant by image base address, virtual address and relative virtual address ?

    what i know is, in the demand paging cpu generates a virtual address which contains page number address+ offset d from that we check page table and then map to the appropriate frame in main memory and calculate physical address by (frame no-1)* page size + d

    but when i have read about PE file format its very different
    what i found is virtual address= image base + relative virtual address offset


    can anyone please explain this why its different ?

    ReplyDelete
  5. Audio typing services look like the traditional dictating services, digital audio services take away the problem of motor sound, thus, improving the quality. typing services offered

    ReplyDelete