X86 Disassembly/Microsoft Windows

Microsoft Windows
The Windows operating system is a popular reverse engineering target for one simple reason: the OS itself (market share, known weaknesses), and most applications for it, are not Open Source or free. Most software on a Windows machine doesn't come bundled with its source code, and most pieces have inadequate, or non-existent documentation. Occasionally, the only way to know precisely what a piece of software does (or for that matter, to determine whether a given piece of software is malicious or legitimate) is to reverse it, and examine the results.

Windows Versions
Windows operating systems can be easily divided into 2 categories: Windows9x, and WindowsNT.

Windows 9x
The Windows9x kernel was originally written to span the 16bit - 32bit divide. Operating Systems based on the 9x kernel are Windows 95, Windows 96, Windows 98, and Windows Me. Windows9x Series operating systems are known to be prone to bugs and system instability. The actual OS itself was a 32 bit extension of MS-DOS, its predecessor. An important issue with the 9x line is that they were all based around using the ANSI format for storing strings, rather than Unicode.

Development on the Windows9x kernel ended with the release of Windows Me.

Windows NT
The WindowsNT kernel series was originally written as enterprise-level server and network software. WindowsNT stresses stability and security far more than Windows9x kernels did (although it can be debated whether that stress was good enough). It also handles all string operations internally in Unicode, giving more flexibility when using different languages. Operating systems based on the WindowsNT kernel are: Windows NT (versions 3.1, 3.11, 3.2, 3.5, 3.51 and 4.0), Windows 2000 (NT 5.0), Windows XP (NT 5.1), Windows Server 2003 (NT 5.2), Windows Vista (NT 6.0), Windows 7 (NT 6.1), Windows 7.1 (NT 6.11), Windows 8 (NT 6.2), Windows 8.1 (NT 6.3), and Windows 10 (NT 10.0).

The Microsoft Xbox and Xbox 360 also run a variant of NT, forked from Windows 2000. Most future Microsoft operating system products are based on NT in some shape or form.

Virtual Memory
Memory is organized into "pages" that are 4096 bytes by default. Pages not in current use by the system or any of the applications may be written to a special section on the hard disk known as the "paging file." Use of the paging file may increase performance on some systems, although high latency for I/O to the HDD can actually reduce performance in some instances.

32-bit Windows NT allows for a maximum of 4 GiB of virtual memory address space per process. This is divided into 2 GiB user memory and 2 GiB kernel memory by default.

In some 32-bit versions and editions, the operating system can be started with the /3GB switch which divides this into 3 GiB user memory and 1 GiB kernel memory. Only 32-bit applications that are compiled with the large memory flag can use up to 3 GiB in this mode. The /3GB switch is not supported in 64-bit Windows, but 32-bit applications with the large memory flag can access up to 4 GiB on 64-bit Windows. 64-bit applications are not restricted in this way.

Starting with the Pentium Pro CPU some 32-bit versions and editions can use Physical Address Extensions (the /PAE switch) to access memory above 4 GiB up to 64 GiB. This memory can be accessed by 32-bit applications that support PAE (i.e. some versions of 32-bit Microsoft SQL Server and 32-bit Microsoft Exchange Server). Special configuration is required, however.

System Architecture
The Windows architecture is heavily layered. Function calls that a programmer makes may be redirected 3 times or more before any action is actually performed. There is an unignorable penalty for calling Win32 functions from a user-mode application. However, the upside is equally unignorable: code written in higher levels of the windows system is much easier to write. Complex operations that involve initializing multiple data structures and calling multiple sub-functions can be performed by calling only a single higher-level function.

The Win32 API comprises 3 modules: KERNEL32, USER32, and GDI32. KERNEL32 is layered on top of NTDLL, and most calls to KERNEL32 functions are simply redirected into NTDLL function calls. USER32 and GDI32 are both based on WIN32K (a kernel-mode module, responsible for the Windows "look and feel"), although USER32 also makes many calls to the more-primitive functions in GDI32. This and NTDLL both provide an interface to the Windows NT kernel, NTOSKRNL (see further below).

NTOSKRNL is also partially layered on HAL (Hardware Abstraction Layer), but this interaction will not be considered much in this book. The purpose of this layering is to allow processor variant issues (such as location of resources) to be made separate from the actual kernel itself. A slightly different system configuration thus requires just a different HAL module, rather than a completely different kernel module.

System calls and interrupts
After filtering through different layers of subroutines, most API calls require interaction with part of the operating system. Services are provided via 'software interrupts', traditionally through the "int 0x2e" instruction. This switches control of execution to the NT executive / kernel, where the request is handled. It should be pointed out here that the stack used in kernel mode is different from the user mode stack. This provides an added layer of protection between kernel and user. Once the function completes, control is returned back to the user application.

Both Intel and AMD provide an extra set of instructions to allow faster system calls, the "SYSENTER" instruction from Intel and the SYSCALL instruction from AMD.

Win32 API
Both WinNT and Win9x systems utilize the Win32 API. However, the WinNT version of the API has more functionality and security constructs, as well as Unicode support. Most of the Win32 API can be broken down into 3 separate components, each performing a separate task.

kernel32.dll
Kernel32.dll, home of the KERNEL subsystem, is where non-graphical functions are implemented. Some of the APIs located in KERNEL are: The Heap API, the Virtual Memory API, File I/O API, the Thread API, the System Object Manager, and other similar system services. Most of the functionality of kernel32.dll is implemented in ntdll.dll, but in undocumented functions. Microsoft prefers to publish documentation for kernel32 and guarantee that these APIs will remain unchanged, and then put most of the work in other libraries, which are then not documented.

gdi32.dll
gdi32.dll is the library that implements the GDI subsystem, where primitive graphical operations are performed. GDI diverts most of its calls into WIN32K, but it does contain a manager for GDI objects, such as pens, brushes and device contexts. The GDI object manager and the KERNEL object manager are completely separate.

user32.dll
The USER subsystem is located in the user32.dll library file. This subsystem controls the creation and manipulation of USER objects, which are common screen items such as windows, menus, cursors, etc... USER will set up the objects to be drawn, but will perform the actual drawing by calling on GDI (which in turn will make many calls to WIN32K) or sometimes even calling WIN32K directly. USER utilizes the GDI Object Manager.

Native API
The native API, hereby referred to as the NTDLL subsystem, is a series of undocumented API function calls that handle most of the work performed by KERNEL32. Microsoft also does not guarantee that the native API will remain the same between different versions, as Windows developers modify the software. This gives the risk of native API calls being removed or changed without warning, breaking software that utilizes it.

ntdll.dll
The NTDLL subsystem is located in ntdll.dll. This library contains many API function calls, that all follow a particular naming scheme. Each function has a prefix: Ldr, Nt, Zw, Csr, Dbg, etc... and all the functions that have a particular prefix all follow particular rules.

The "official" native API is usually limited only to functions whose prefix is Nt or Zw. These calls are in fact the same in user-mode: the relevant Export entries map to the same address in memory. However, in kernel-mode, the Zw* system call stubs set the previous mode to kernel-mode, ensuring that certain parameter validation routines are not performed. The origin of the prefix "Zw" is unknown; this prefix was chosen due to its having no significance at all.

In actual implementation, the system call stubs merely load two registers with values required to describe a native API call, and then execute a software interrupt (or the  instruction).

Most of the other prefixes are obscure, but the known ones are:
 * Rtl stands for "Run Time Library", calls which help functionality at runtime (such as RtlAllocateHeap)
 * Csr is for "Client Server Runtime", which represents the interface to the win32 subsystem located in csrss.exe
 * Dbg functions are present to enable debugging routines and operations
 * Ldr provides the ability to load, manipulate and retrieve data from DLLs and other module resources

User Mode Versus Kernel Mode
Many functions, especially Run-time Library routines, are shared between ntdll.dll and ntoskrnl.exe. Most Native API functions, as well as other kernel-mode only functions exported from the kernel are useful for driver writers. As such, Microsoft provides documentation on many of the native API functions with the Microsoft Server 2003 Platform DDK. The DDK (Driver Development Kit) is available as a free download.

ntoskrnl.exe
This module is the Windows NT "'Executive'", providing all the functionality required by the native API, as well as the kernel itself, which is responsible for maintaining the machine state. By default, all interrupts and kernel calls are channeled through ntoskrnl in some way, making it the single most important program in Windows itself. Many of its functions are exported (all of which with various prefixes, a la NTDLL) for use by device drivers.

Win32K.sys
This module is the "Win32 Kernel" that sits on top of the lower-level, more primitive NTOSKRNL. WIN32K is responsible for the "look and feel" of windows, and many portions of this code have remained largely unchanged since the Win9x versions. This module provides many of the specific instructions that cause USER and GDI to act the way they do. It's responsible for translating the API calls from the USER and GDI libraries into the pictures you see on the monitor.

Win64 API
With the advent of 64-bit processors, 64-bit software is a necessity. As a result, the Win64 API was created to utilize the new hardware. It is important to note that the format of many of the function calls are identical in Win32 and Win64, except for the size of pointers, and other data types that are specific to 64-bit address space.

Windows Vista
Microsoft has released a new version of its Windows operation system, named "Windows Vista." Windows Vista may be better known by its development code-name "Longhorn." Microsoft claims that Vista has been written largely from the ground up, and therefore it can be assumed that there are fundamental differences between the Vista API and system architecture, and the APIs and architectures of previous Windows versions. Windows Vista was released January 30th, 2007.

Windows CE/Mobile, and other versions
Windows CE is the Microsoft offering on small devices. It largely uses the same Win32 API as the desktop systems, although it has a slightly different architecture. Some examples in this book may consider WindowsCE.

"Non-Executable Memory"
Recent windows service packs have attempted to implement a system known as "Non-executable memory" where certain pages can be marked as being "non-executable". The purpose of this system is to prevent some of the most common security holes by not allowing control to pass to code inserted into a memory buffer by an attacker. For instance, a shellcode loaded into an overflowed text buffer cannot be executed, stopping the attack in its tracks. The effectiveness of this mechanism is yet to be seen, however.

COM and Related Technologies
COM, and a whole slew of technologies that are either related to COM or are actually COM with a fancy name, is another factor to consider when reversing Windows binaries. COM, DCOM, COM+, ActiveX, OLE, MTS, and Windows DNA are all names for the same subject, or subjects, so similar that they may all be considered under the same heading. In short, COM is a method to export Object-Oriented Classes in a uniform, cross-platform and cross-language manner. In essence, COM is .NET, version 0 beta. Using COM, components written in many languages can export, import, instantiate, modify, and destroy objects defined in another file, most often a DLL. Although COM provides cross-platform (to some extent) and cross-language facilities, each COM object is compiled to a native binary, rather than an intermediate format such as Java or .NET. As a result, COM does not require a virtual machine to execute such objects.

Due to the way that COM works, a lot of the methods and data structures exported by a COM component are difficult to perceive by simply inspecting the executable file. Matters are made worse if the creating programmer has used a library such as ATL to simplify their programming experience. Unfortunately for a reverse engineer, this reduces the contents of an executable into a "Sea of bits", with pointers and data structures everywhere.

Remote Procedure Calls (RPC)
RPC is a generic term referring to techniques that allow a program running on one machine to make calls that actually execute on another machine. Typically, this is done by marshalling all of the data needed for the procedure including any state information stored on the first machine, and building it into a single data structure, which is then transmitted over some communications method to a second machine. This second machine then performs the requested action, and returns a data packet containing any results and potentially changed state information to the originating machine.

In Windows NT, RPC is typically handled by having two libraries that are similarly named, one which generates RPC requests and accepts RPC returns, as requested by a user-mode program, and one which responds to RPC requests and returns results via RPC. A classic example is the print spooler, which consists of two pieces: the RPC stub spoolss.dll, and the spooler proper and RPC service provider, spoolsv.exe. In most machines, which are stand-alone, it would seem that the use of two modules communicating by means of RPC is overkill; why not simply roll them into a single routine? In networked printing, though, this makes sense, as the RPC service provider can be resident physically on a distant machine, with the remote printer, and the local machine can control the printer on the remote machine in exactly the same way that it controls printers on the local machine.