X86 Disassembly/Variables

Variables
We've already seen some mechanisms to create local storage on the stack. This chapter will talk about some other variables, including global variables, static variables, variables labelled "const," "register," and "volatile." It will also consider some general techniques concerning variables, including accessor and setter methods (to borrow from object-oriented terminology). This section may also talk about setting memory breakpoints in a debugger to track memory I/O on a variable.

How to Spot a Variable
Variables come in two distinct flavors: those that are created on the stack (local variables), and those that are accessed via a hardcoded memory address (global variables). Any memory that is accessed via a hardcoded address is usually a global variable. Variables that are accessed as an offset from esp or ebp are frequently local variables.


 * Hardcoded address : Anything hardcoded is a value that is stored as-is in the binary, and is not changed at runtime. For instance, the value 0x2054 is hardcoded, whereas the current value of variable X is not hard-coded and may change at runtime.

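Example of a hardcoded address: an instruction such as mov eax, [0x2054], where the address is written directly into the instruction.

OR: an instruction pair such as mov ecx, 0x2054 followed by mov eax, [ecx], where the address is first loaded into a register as an immediate value.

Example of a non-hardcoded (softcoded?) address: a sequence such as mov ecx, [esp+4] followed by mov eax, [ecx], where ecx is loaded from the stack before it is dereferenced.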

In the last example, the value of ecx is calculated at run-time, whereas in the first 2 examples, the value is the same every time. RVAs are considered hard-coded addresses, even though the loader needs to "fix them up" to point to the correct locations.
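The same distinction can be sketched at the C level. In this illustration all names are hypothetical, and the assembly shown in the comments is only what a typical 32-bit compiler might emit; actual output depends on the compiler and its options.

```c
/* Hypothetical names; illustration only. */
int g_counter = 7;   /* placed at a fixed address chosen at link time */

/* Might compile to something like:
       mov eax, DWORD PTR [g_counter]
   where the address of g_counter is encoded in the instruction itself. */
int read_global(void)
{
    return g_counter;
}

/* Might compile to something like:
       mov ecx, DWORD PTR [esp+4]
       mov eax, DWORD PTR [ecx]
   where the address in ecx is only known at run time. */
int read_through_pointer(const int *p)
{
    return *p;
}
```

When reading a listing, the first pattern suggests a global (or static) variable, while the second says only that some pointer was dereferenced.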

.BSS and .DATA sections
Both .bss and .data sections contain values which can change at run-time (e.g. variables). Typically, variables that are initialized to a non-zero value in the source are allocated in the .data section (e.g. "int a = 10;"). Variables that are not initialized, or initialized with a zero value, can be allocated to the .bss section (e.g. "int arr[100];"). Because all values of .bss variables are guaranteed to be zero at the start of the program, there is no need for the linker to allocate space in the binary file. Therefore, .bss sections do not take space in the binary file, regardless of their size.
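As a rough sketch of the placement rules above, the following declarations would usually end up in different sections. The variable and function names are hypothetical, and the section named in each comment is the common choice rather than a guarantee.

```c
int initialized_value = 10;  /* non-zero initializer: usually placed in .data */
int zeroed_array[100];       /* no initializer: usually placed in .bss */
int explicit_zero = 0;       /* zero initializer: many compilers also use .bss */

/* Sums the array. C guarantees that objects with static storage duration
   start out as zero, which is why no file space is needed for them. */
int sum_zeroed_array(void)
{
    int total = 0;
    for (int i = 0; i < 100; i++)
        total += zeroed_array[i];
    return total;
}
```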

"Static" Local Variables
Local variables labeled static maintain their value across function calls, and therefore cannot be created on the stack like other local variables are. How are static variables created? Let's take a simple example C function:
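A function of roughly the following shape would be consistent with the symbols discussed in this section. This is a reconstruction under assumptions: the names MyFunction, a, and x are taken from the symbols that appear in the discussion, while the exact format string is a guess.

```c
#include <stdio.h>

/* Reconstruction under assumptions: only the names MyFunction, a and x
   come from the text; the format string is hypothetical. */
void MyFunction(int a)
{
    static int x = 0;          /* static local: keeps its storage between calls */
    printf("%d, %d\n", a, x);  /* three arguments: format string, a, x */
}
```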

Compiling to a listing file with cl.exe gives us the following code:

Normally when assembly listings are posted in this wikibook, most of the code gibberish is discarded to aid readability, but in this instance, the "gibberish" contains the answer we are looking for. As can be clearly seen, this function creates a standard stack frame, and it doesn't create any local variables on the stack. In the interests of being complete, we will take baby-steps here, and work to the conclusion logically.

In the code for Line 7, there is a call to _printf with 3 arguments. Printf is a standard libc function, and it can therefore be assumed to use the cdecl calling convention. Arguments are therefore pushed from right to left. Three arguments are pushed onto the stack before _printf is called:

 * ?x@?1??MyFunction@@9@9
 * _a$[ebp]
 * OFFSET FLAT:$SG797

The second one, _a$[ebp] is partially defined in this assembly instruction:

_a$ = 8

And therefore _a$[ebp] is the variable located at offset +8 from ebp, or the first argument to the function. OFFSET FLAT:$SG797 likewise is declared in the assembly listing as such:

If you have your ASCII table handy, you will notice that 0aH = 0x0A = '\n'. OFFSET FLAT:$SG797 then is the format string to our printf statement. Our last option then is the mysterious-looking "?x@?1??MyFunction@@9@9", which is defined in the following assembly code section:

This shows that the Microsoft C compiler creates static variables in the .bss section. This might not be the same for all compilers, but the lesson is the same: local static variables are created and used in a very similar, if not the exact same, manner as global values. In fact, as far as the reverser is concerned, the two are usually interchangeable. Remember, the only real difference between static variables and global variables is the idea of "scope", which is only used by the compiler.

Signed and Unsigned Variables
Integer-typed variables, such as int, char, short and long, may be declared signed or unsigned in the C source code. There are two differences in how these variables are treated:
 * 1) Signed and unsigned variables share the same add and sub instructions, but differ in division, right shifts, and widening: signed variables use instructions such as idiv, sar, and movsx, whereas unsigned variables use div, shr, and movzx.
 * 2) Signed variables use signed branch instructions such as jge and jl. Unsigned variables use unsigned branch instructions such as jae and jb.

The difference between the signed and unsigned branch instructions is which flags they test: the signed branches test the sign and overflow flags, whereas the unsigned branches test the carry flag. For addition and subtraction, the integer result values are exactly the same for both signed and unsigned data.
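A small sketch can make the signed/unsigned distinction concrete. The function names are hypothetical, and the instructions named in the comments are what an x86 compiler would typically choose, not a guarantee.

```c
#include <stdint.h>

/* Signed comparison: typically cmp followed by jl/jge (or setl). */
int less_signed(int32_t a, int32_t b)     { return a < b; }

/* Unsigned comparison: typically cmp followed by jb/jae (or setb). */
int less_unsigned(uint32_t a, uint32_t b) { return a < b; }

/* Addition produces the same bits for signed and unsigned operands,
   so a single add instruction serves both. */
uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }

/* Division differs: typically idiv for signed, div for unsigned. */
int32_t  div_signed(int32_t a, int32_t b)     { return a / b; }
uint32_t div_unsigned(uint32_t a, uint32_t b) { return a / b; }
```

The bit pattern 0xFFFFFFFF is less than 1 when treated as signed (it is -1), but greater than 1 when treated as unsigned, which is why the choice of branch instruction reveals the type.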

Floating-Point Values
Floating point values tend to be 32-bit data values (for float) or 64-bit data values (for double). These values are distinguished from ordinary integer-valued variables because they are used with floating-point instructions. x87 floating point instructions typically start with the letter f. For instance, fadd, fcom, and similar instructions are used with floating point values. Of particular note are the fild instruction and its variants. These instructions take an integer-valued variable and convert it into a floating point value. Code compiled to use SSE instead uses instructions such as addss and cvtsi2ss for the same purposes.
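As a sketch, the following C functions are the kind of source that gives rise to such instructions. The function names are hypothetical, and the instructions named in the comments depend on whether the compiler targets the x87 unit or SSE2.

```c
/* Integer to floating point conversion: x87 code typically uses fild here,
   while SSE2 code uses an instruction such as cvtsi2sd. */
double int_to_double(int n)
{
    return (double)n;
}

/* Floating point addition: fadd on x87, addsd with SSE2. */
double add_doubles(double a, double b)
{
    return a + b;
}
```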

We will discuss floating point variables in more detail in a later chapter.

Global Variables
Global variables do not have a limited scope like lexical variables do inside a function body. Since the notion of lexical scope implies the use of the system stack, and since global variables are not lexical in nature, they are typically not found on the stack. Global variables tend to exist in the program as a hard-coded memory address, a location which never changes throughout program execution. These could exist in the DATA segment of the executable, or anywhere else that a hard-coded memory address can be used to store data.

In C, global variables are defined outside the body of any function. There is no "global" keyword: any variable which is not defined inside a function is global. However, such a variable can be referred to by name only within the particular source code file in which it is defined, unless other files explicitly declare it. For example, consider a project with two source files and a global variable that is defined in only one of them:
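A minimal sketch of such an arrangement follows. The file names and identifiers are hypothetical, and both parts are shown in one block so that it compiles as a single unit; in a real project they would be separate files.

```c
/* ---- would be one source file (hypothetical name: first.c) ---- */
int shared_total = 0;      /* defined here, outside any function */

/* ---- would be another source file (hypothetical name: second.c) ---- */
extern int shared_total;   /* declaration only: no new storage is created */

void add_to_total(int n)
{
    shared_total += n;     /* refers to the variable defined in first.c */
}
```

Without the declaration in the second part, the compiler would reject the reference to shared_total in that file, even though the variable exists elsewhere in the program.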

In the example above, the variable is visible inside the file that defines it, but is not visible inside the other file. To make it visible inside all project files, we need to use the extern keyword, which we will discuss below.

"static" Variables
The C programming language specifies a special keyword "static" to define variables which are lexical to the function (they cannot be referenced from outside the function) but which maintain their values across function calls. Unlike ordinary lexical variables, which are created on the stack when the function is entered and are destroyed when the function returns, static variables are created once and are never destroyed.
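The behavior can be sketched with a call counter. The names are hypothetical; a global version is included for comparison because both variables live in the same kind of storage and differ only in which code is allowed to name them.

```c
/* Counts calls using a static local. Only this function can name `calls`,
   but its value survives from one call to the next. */
int count_with_static(void)
{
    static int calls = 0;
    return ++calls;
}

/* The same counter as an ordinary global: any function in this file
   could read or modify it. */
int g_calls = 0;

int count_with_global(void)
{
    return ++g_calls;
}
```

In a disassembly, both functions would typically increment a value at a fixed address, with nothing in the instructions to tell the two apart.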

Static variables in C are global variables, except that the compiler takes precautions to prevent the variable from being accessed outside of the parent function's scope. Like global variables, static variables are referenced using a hardcoded memory address, not a location on the stack like ordinary variables. However, unlike globals, static variables are only used inside a single function. In the compiled code, there is no difference between a global variable which is only used in a single function and a static variable inside that same function. However, it's good programming practice to limit the number of global variables, so when disassembling, you should prefer interpreting these variables as static instead of global.

"extern" Variables
The extern keyword is used in C to declare a variable that is defined elsewhere, which makes that variable global to the entire project, not just to a single source code file. Besides this distinction, and the slightly larger lexical scope of extern variables, they should be treated like ordinary global variables.

In static libraries, variables marked as being extern might be available for use with programs which are linked to the library.

Global Variables Summary
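The following list summarizes some points about global variables:

 * Global: referenced by a hardcoded memory address; visible throughout the source file in which it is defined.
 * static: referenced by a hardcoded memory address; visible only inside the function in which it is declared.
 * extern: referenced by a hardcoded memory address; visible throughout the entire project.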

When disassembling, a hard-coded memory address should be considered to be an ordinary global variable unless you can determine from the scope of the variable that it is static or extern.

Constants
Variables qualified with the const keyword (in C) are frequently stored in the .data section of the executable, or in a dedicated read-only data section such as .rdata or .rodata. Constant values can be distinguished because they are initialized at the beginning of the program, and are never modified by the program itself. For this reason, some compilers may choose to store constant variables (especially strings) in the .text section of the executable, thus allowing the sharing of these variables across multiple instances of the same process. This creates a big problem for the reverser, who now has to decide whether the bytes they are looking at are part of a constant variable or part of a subroutine.
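As a sketch, the following C source contains the two kinds of constant data most often met in a disassembly: a const table and a string literal. The names are hypothetical, and the placement mentioned in each comment is the typical choice rather than a rule.

```c
/* A const table: typically emitted into a read-only data section. */
static const int k_limits[3] = { 10, 20, 30 };

/* A string literal: also typically read-only, and sometimes placed
   alongside the code. */
const char *greeting(void)
{
    return "hello";
}

/* Reads the table; nothing in the program ever writes to it. */
int limit_at(int i)
{
    return (i >= 0 && i < 3) ? k_limits[i] : -1;
}
```

The absence of any instruction that writes to an address is one of the few clues that the data stored there is constant.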

"Volatile" memory
In C and C++, variables can be declared "volatile," which tells the compiler that the memory location can be accessed from external or concurrent processes, and that the compiler should not optimize away accesses to the variable. For instance, if multiple threads were all accessing and modifying a single global value, it would be bad for the compiler to keep that variable in a register for long stretches and flush it to memory only infrequently. In general, a volatile variable must be read from memory every time it is used and written back to memory every time it is assigned, to ensure that the most current version of the data is in memory when other processes come to look for it.

It is not always possible to determine from a disassembly listing whether a given variable is a volatile variable. However, if the variable is accessed frequently from memory, and its value is constantly updated in memory (especially if there are free registers available), that's a good hint that the variable might be volatile.
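The pattern described above can be sketched as follows. The names are hypothetical; the point is only that the compiler must perform both reads of the volatile variable, whereas for the plain variable it is free to load the value once and reuse it.

```c
volatile int v_flag = 0;   /* every access must really touch memory */
int plain_value = 0;       /* accesses may be cached in a register */

/* Two separate reads: the compiler has to emit two loads from memory. */
int read_twice_volatile(void)
{
    int first = v_flag;
    int second = v_flag;
    return first + second;
}

/* An optimizing compiler may load plain_value once and double it. */
int read_twice_plain(void)
{
    int first = plain_value;
    int second = plain_value;
    return first + second;
}
```

Seeing the same address loaded twice in a row in optimized code, with nothing in between that could change it, is the kind of hint that the variable might be volatile.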

Simple Accessor Methods
An Accessor Method is a tool derived from OO theory and practice. In its most simple form, an accessor method is a function that receives no parameters (or perhaps simply an offset), and returns the value of a variable. Accessor and Setter methods are ways to restrict access to certain variables. The only standard way to get the value of the variable is to use the Accessor.

Accessors can prevent some simple problems, such as out-of-bounds array indexing and the use of uninitialized data. In practice, however, accessors frequently contain little or no error checking.

Here is an example:
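A minimal sketch in C, using a hypothetical struct and function name. The assembly in the comment is only what an optimizing 32-bit compiler might produce, assuming a 4-byte int and the cdecl calling convention.

```c
/* Hypothetical type; illustration only. */
struct Widget {
    int id;
    int width;
};

/* Accessor: returns one field and does nothing else.
   Optimized output might be as short as:
       mov eax, DWORD PTR [esp+4]   ; pointer argument
       mov eax, DWORD PTR [eax+4]   ; the width field at offset 4
       ret                                                        */
int widget_get_width(const struct Widget *w)
{
    return w->width;
}
```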

Because they are so simple, accessor methods are frequently heavily optimized (they generally don't need a stack frame), and are even occasionally inlined by the compiler.

Simple Setter (Manipulator) Methods
Setter methods are the counterpart of accessor methods, and provide a unified way of altering the value of a given variable. Setter methods will often take as a parameter the value to be assigned to the variable, although some methods (Initializers) simply set the variable to a pre-defined value. Setter methods often do bounds checking and error checking on the value before it is set, and frequently either a) return no value, or b) return a simple boolean value to indicate success.

Here is an example:
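A minimal sketch in C, using hypothetical names throughout. It shows both styles mentioned above: a setter that checks its argument and reports success, and an initializer-style method that simply stores a pre-defined value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type and limit; illustration only. */
struct Gauge {
    int level;
};

#define GAUGE_MAX 100

/* Setter with bounds checking: returns true if the value was stored. */
bool gauge_set_level(struct Gauge *g, int level)
{
    if (g == NULL || level < 0 || level > GAUGE_MAX)
        return false;
    g->level = level;
    return true;
}

/* Initializer-style setter: stores a fixed value and returns nothing. */
void gauge_reset(struct Gauge *g)
{
    g->level = 0;
}
```

In a disassembly, the first function would show comparisons and conditional branches before the store, while the second would usually be little more than a single mov of an immediate value.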