X86 Disassembly/Objects and Classes

Object-Oriented Programming
Object-Oriented (OO) programming provides for us a new unit of program structure to contend with: the Object. This chapter will look at disassembled classes from C++. This chapter will not deal directly with COM, but it will work to set a lot of the groundwork for future discussions in reversing COM components (Windows users only).

Classes
A basic class that has not inherited anything can be broken into two parts, the variables and the methods. The non-static variables are shoved into a simple data structure while the methods are compiled and called like every other function.

When you start adding in inheritance and polymorphism, things get a little more complicated. For the purposes of simplicity, the structure of an object will be described in terms of having no inheritance. At the end, however, inheritance and polymorphism will be covered.

Variables
All static variables defined in a class resides in the static region of memory for the entire duration of the application. Every other variable defined in the class is placed into a data structure known as an object. Typically when the constructor is called, the variables are placed into the object in sequential order, see Figure 1.

A:

B:

However, the compiler typically needs the variables to be separated into sizes that are multiples of a word (2 bytes) in order to locate them. Not all variables fit this requirement, namely char arrays; some unused bits might be used pad the variables so they meet this size requirement. This is illustrated in Figure 2.

A:

B:

In order for the application to access one of these object variables, an object pointer needs to be offset to find the desired variable. The offset of every variable is known by the compiler and written into the object code wherever it's needed. Figure 3 shows how to offset a pointer to retrieve variables.

Figure 3: This shows how to offset a pointer to retrieve variables. The first line places the address of variable 'a' into eax. The second line places the address of variable 'b' into ebx. And the last line places the variable 'c' into ecx.

Methods
At a low level, there is almost no difference between a function and a method. When decompiling, it can sometimes be hard to tell a difference between the two. They both reside in the text memory space, and both are called the same way. An example of how a method is called can be seen in Figure 4.

A:

B:

A notable characteristic in a method call is the address of the object being passed in as an argument. This, however, is not a always a good indicator. Figure 5 shows function with the first argument being an object passed in by reference. The result is function that looks identical to a method call.

A:

B:

Inheritance & Polymorphism
Inheritance and polymorphism completely changes the structure of a class, the object no longer contains just variables, they also contain pointers to the inherited methods. This is due to the fact that polymorphism requires the address of a method or inner object to be figured out at runtime.

Take Figure 6 into consideration. How does the application know to call D::one or C::one? The answer is that the compiler figures out a convention in which to order variables and method pointers inside the object such that when they're referenced, the offsets are the same for any object that has inherited its methods and variables.

Figure 6: A small C++ polymorphic loop that calls a function, one. The classes C and D both inherit an abstract class, A. The class A, for this code to work, must have a virtual method with the name, "one."

The abstract class A acts as a blueprint for the compiler, defining an expected structure for any class that inherits it. Every variable defined in class A and every virtual method defined in A will have the exact same offset for any of its children. Figure 7 declares a possible inheritance scheme as well as it structure in memory. Notice how the offset to C::one is the same as D::one, and the offset to C's copy of A::a is the same as D's copy. In this, our polymorphic loop can just iterate through the array of pointers and know exactly where to find each method.

A:

B: