X86 Assembly/High-Level Languages

Very few projects are written entirely in assembly. Assembly is often used to access processor-specific features, to optimize critical sections of code, and for very low-level work, but for many applications it is simpler to implement the basic control flow and data manipulation routines in a higher-level language such as C. For this reason, it is often necessary to interface assembly language with other languages.

Compilers
The first compilers were simply text translators that converted a high-level language into assembly language. The assembly language code was then fed into an assembler to create the final machine code. GCC still performs this sequence: code is compiled into assembly and then fed to the GNU assembler, as. However, many modern compilers skip the assembly language step and generate machine code directly.

Assembly language has the benefit of being in one-to-one correspondence with the underlying machine code: each machine instruction maps directly to a single assembly instruction. Because of this, even when a compiler generates machine code directly, it is still possible to interface that code with an assembly language program. The important part is knowing exactly how the language implements its data structures, control structures, and functions. The way a high-level language compiler implements function calls is called its calling convention.

The calling convention is a contract between a function and its caller, and specifies several things:


 * 1) How are the arguments passed to the function, and in what order? Are they pushed onto the stack, or passed in registers?
 * 2) How are return values passed back to the caller? This is usually via registers or on the stack.
 * 3) Which processor state is volatile (available for modification)? Volatile registers may be modified by the called function, so the caller is responsible for saving them if it still needs their values. Non-volatile registers are guaranteed to be preserved: the called function is responsible for saving them and restoring them on exit.
 * 4) The function prologue and epilogue, which set up the registers and stack for use within the function and then restore the stack and registers before exiting.

CDECL
For C compilers, the CDECL calling convention is the de facto standard. The syntax varies by compiler, but the programmer can usually request that a function use CDECL by adding a keyword to its declaration, for example __cdecl in Visual Studio.

In GCC, the equivalent is __attribute__((__cdecl__)).
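As a sketch of how these keywords look in practice, the C fragment below declares a CDECL function in a way that compiles with either compiler. The macro name CDECL_FN and the functions add_two and apply are hypothetical, chosen for illustration. The GCC attribute only has an effect when targeting 32-bit x86, so this sketch assumes that elsewhere the macro can simply expand to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* CDECL_FN is a hypothetical portability macro, not a standard name. */
#if defined(_MSC_VER)
#  define CDECL_FN __cdecl                      /* Visual Studio keyword */
#elif defined(__GNUC__) && defined(__i386__)
#  define CDECL_FN __attribute__((__cdecl__))   /* GCC/Clang, 32-bit x86 */
#else
#  define CDECL_FN /* other targets: keep the platform's default convention */
#endif

/* A function explicitly declared to use CDECL. */
int CDECL_FN add_two(int a, int b)
{
    return a + b;
}

/* Pointers to such functions should carry the same convention. */
typedef int (CDECL_FN *binop_fn)(int, int);

int apply(binop_fn f, int a, int b)
{
    assert(f != NULL);
    return f(a, b);
}
```

The function pointer type repeats the convention on purpose: if a pointer's convention disagrees with the function it points to, caller and callee can disagree about who cleans up the stack.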

CDECL calling convention specifies a number of different requirements:


 * 1) Function arguments are passed on the stack, in right-to-left order.
 * 2) The function result is returned in EAX/AX/AL.
 * 3) Floating-point return values are returned in ST0.
 * 4) On Windows and some other platforms, the function name is prefixed with an underscore (ELF-based systems such as Linux do not add it).
 * 5) The arguments are popped from the stack by the caller itself.
 * 6) 8-bit and 16-bit integer arguments are promoted to 32-bit arguments.
 * 7) The volatile registers are: EAX, ECX, EDX, ST0 - ST7, ES and GS
 * 8) The non-volatile registers are: EBX, EBP, ESP, EDI, ESI, CS and DS
 * 9) The function will exit with a RET instruction.
 * 10) Values of class or structure type are supposed to be returned via a pointer in EAX/AX. The function itself is supposed to allocate the space; since it cannot use the stack or the heap for this, it is left with a fixed address in static, non-constant storage, which is inherently not thread-safe. Many compilers therefore break the calling convention:
   * GCC has the calling code allocate the space and pass a pointer to it via a hidden parameter on the stack. The called function writes the return value to this address.
   * Visual C++ will:
     * Return POD values of 32 bits or smaller in the EAX register.
     * Return POD values of 33-64 bits in the EDX:EAX register pair.
     * For non-POD return values or values larger than 64 bits, have the calling code allocate space and pass a pointer to it via a hidden parameter on the stack. The called function writes the return value to this address.
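The stack-related rules above (right-to-left pushes, result in EAX, caller cleanup) can be pictured with a toy model. The C sketch below simulates a 32-bit, downward-growing stack with an array; the Stack type and all function names are hypothetical, and nothing here executes real machine instructions. It shows why, right after the CALL, the first argument always sits at [ESP+4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit stack: mem[0] is the lowest address and, as on x86, the
 * stack grows downward (toward index 0). This only models CDECL. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t esp;                   /* index of the word at the top of the stack */
} Stack;

static void push(Stack *s, uint32_t v)
{
    assert(s->esp > 0);           /* toy overflow check */
    s->mem[--s->esp] = v;         /* PUSH: move ESP down, then store */
}

static uint32_t pop(Stack *s)
{
    assert(s->esp < STACK_WORDS);
    return s->mem[s->esp++];      /* POP: load, then move ESP up */
}

/* Callee for "unsigned sub(unsigned a, unsigned b)". On entry [ESP] is the
 * return address, so a is at [ESP+4] and b at [ESP+8]. */
static uint32_t callee_sub(Stack *s)
{
    uint32_t a = s->mem[s->esp + 1];
    uint32_t b = s->mem[s->esp + 2];
    (void)pop(s);                 /* plain RET: pops only the return address */
    return a - b;                 /* result handed back "in EAX" */
}

/* Caller side of sub(a, b) under CDECL. */
static uint32_t cdecl_call_sub(Stack *s, uint32_t a, uint32_t b)
{
    size_t esp_before = s->esp;
    push(s, b);                   /* arguments are pushed right to left... */
    push(s, a);                   /* ...so the first one ends up on top */
    push(s, 0xDEADBEEFu);         /* CALL pushes a (fake) return address */
    uint32_t eax = callee_sub(s);
    s->esp += 2;                  /* caller cleans up, like ADD ESP, 8 */
    assert(s->esp == esp_before);
    return eax;
}
```

For example, starting from Stack s = { .esp = STACK_WORDS }, the call cdecl_call_sub(&s, 10, 3) returns 7 and leaves s.esp where it started, because the caller, not the callee, removed the two argument words.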

CDECL functions are capable of accepting variable argument lists.
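Variable argument lists work under CDECL because the caller, which knows how many arguments it pushed, is also the one that removes them, and because right-to-left pushing keeps the first (fixed) argument at a known offset. Below is a minimal C sketch; it uses the standard <stdarg.h> macros rather than reading the stack by hand, and sum_ints is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments that follow the fixed first argument. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);          /* start just after the last fixed argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int); /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

Compiled for 32-bit x86 with CDECL, a call such as sum_ints(3, 1, 2, 3) would typically push 3, 2, 1 and then the count 3, and the caller would add 16 to ESP afterwards.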

STDCALL
STDCALL is the calling convention used when interfacing with the Win32 API on Microsoft Windows systems. STDCALL was created by Microsoft, and therefore isn't always supported by non-Microsoft compilers. The syntax varies by compiler, but the programmer can usually request that a function use STDCALL by adding a keyword to its declaration, for example __stdcall in Visual Studio.

In GCC, the equivalent is __attribute__((__stdcall__)).

STDCALL has the following requirements:


 * 1) Function arguments are passed on the stack in right-to-left order.
 * 2) The function result is returned in EAX/AX/AL.
 * 3) Floating-point return values are returned in ST0.
 * 4) 64-bit integers and 32/16-bit pointers are returned in the EDX:EAX register pair.
 * 5) 8-bit and 16-bit integer arguments are promoted to 32-bit arguments.
 * 6) Function name is prefixed with an underscore
 * 7) Function name is suffixed with an "@" sign, followed by the number of bytes of arguments being passed to it.
 * 8) The arguments are popped from the stack by the callee (the called function).
 * 9) The volatile registers are: EAX, ECX, EDX, and ST0 - ST7
 * 10) The non-volatile registers are: EBX, EBP, ESP, EDI, ESI, CS, DS, ES, FS and GS
 * 11) The function exits with a RET n instruction, which pops the return address and then n additional bytes (the arguments) off the stack.
 * 12) POD return values of 32 bits or smaller are returned in the EAX register.
 * 13) POD return values of 33-64 bits are returned in the EDX:EAX register pair.
 * 14) For non-POD return values or values larger than 64 bits, the calling code allocates space and passes a pointer to it via a hidden parameter on the stack. The called function writes the return value to this address.

STDCALL functions are not capable of accepting variable argument lists.

For example, the following function declaration in C:

void __stdcall MyFunction(int, int, short);

would be accessed in assembly using the following function label:

_MyFunction@12

Remember that on a 32-bit machine, passing a 16-bit argument (a C "short") on the stack takes up a full 32 bits of space, which is why the suffix is 12 (4 + 4 + 4) rather than 10.
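The decoration rule is mechanical enough to compute. The sketch below uses hypothetical helpers (stdcall_arg_bytes and stdcall_name are not functions from any real library) that apply the rules listed above, rounding each argument up to a whole 4-byte stack slot. It assumes the argument sizes are already known in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes of stack used by the arguments: each one is rounded up to a
 * whole 4-byte slot, so a 2-byte short still costs 4 bytes. */
size_t stdcall_arg_bytes(const size_t *arg_sizes, size_t nargs)
{
    size_t total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += (arg_sizes[i] + 3) / 4 * 4;
    return total;
}

/* Writes "_<name>@<bytes>" into out (hypothetical helper). */
void stdcall_name(char *out, size_t out_len, const char *name,
                  const size_t *arg_sizes, size_t nargs)
{
    assert(out != NULL && name != NULL && strlen(name) > 0);
    snprintf(out, out_len, "_%s@%zu", name,
             stdcall_arg_bytes(arg_sizes, nargs));
}
```

For the MyFunction declaration above, the argument sizes {4, 4, 2} produce _MyFunction@12.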

FASTCALL
In many compilers, FASTCALL functions can be specified with the __fastcall keyword. FASTCALL functions pass the first two arguments in registers, so that some comparatively slow stack memory accesses can be avoided. FASTCALL has the following requirements:


 * 1) The first 32-bit (or smaller) argument is passed in ECX/CX/CL
 * 2) The second 32-bit (or smaller) argument is passed in EDX/DX/DL
 * 3) The remaining function arguments (if any) are passed on the stack in right-to-left order
 * 4) The function result is returned in EAX/AX/AL
 * 5) The function name is prefixed with an "@" symbol
 * 6) The function name is suffixed with an "@" symbol, followed by the size of passed arguments, in bytes.
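To make the register assignment concrete, here is a sketch that decides where each argument of a FASTCALL function would go and builds its decorated name. All names are hypothetical, and the model only handles arguments of 32 bits or less; Visual C++ has further rules for larger arguments that are not covered here. The decorated-name suffix counts every argument, including those passed in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { IN_ECX, IN_EDX, ON_STACK } ArgLocation;

/* Assigns each argument (left to right) to ECX, EDX, or the stack and
 * returns the number of bytes passed on the stack. Sketch only: every
 * argument must be 32 bits or smaller. */
size_t fastcall_layout(const size_t *arg_sizes, size_t nargs, ArgLocation *loc)
{
    size_t stack_bytes = 0;
    for (size_t i = 0; i < nargs; i++) {
        assert(arg_sizes[i] <= 4);
        if (i == 0) {
            loc[i] = IN_ECX;          /* first argument */
        } else if (i == 1) {
            loc[i] = IN_EDX;          /* second argument */
        } else {
            loc[i] = ON_STACK;        /* the rest: pushed right to left */
            stack_bytes += 4;         /* each takes a full 4-byte slot */
        }
    }
    return stack_bytes;
}

/* Writes "@<name>@<bytes>", where bytes counts all arguments (4 each here),
 * including the ones passed in registers. */
void fastcall_name(char *out, size_t out_len, const char *name, size_t nargs)
{
    assert(out != NULL && name != NULL && strlen(name) > 0);
    snprintf(out, out_len, "@%s@%zu", name, nargs * 4);
}
```

With four small arguments, the first two land in ECX and EDX, 8 bytes go on the stack, and a function named f would be decorated as @f@16.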

C++ Calling Conventions (THISCALL)
THISCALL is the default calling convention for C++ member functions. In THISCALL, the function is called almost identically to the CDECL convention, but the this pointer (a pointer to the object the member function is invoked on) must also be passed.

The way the this pointer is passed is compiler-dependent. Microsoft Visual C++ passes it in ECX, and, unlike CDECL, has the called function remove the stack arguments, as in STDCALL. GCC passes it as if it were the first parameter of the function, i.e. on the stack between the return address and the first formal parameter.
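Because the this pointer is just an extra argument, a member function can be modelled in C as a free function with an explicit pointer parameter. The sketch below is hypothetical (Counter and counter_add are illustrative names). Under the GCC scheme described above, calling counter_add(&c, 5) with CDECL puts its arguments on the stack in essentially the same layout as the C++ call c.add(5).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int value;
} Counter;

/* C equivalent of the member function
 *   int Counter::add(int amount) { value += amount; return value; }
 * with `this` turned into an explicit first parameter. */
int counter_add(Counter *this_ptr, int amount)
{
    assert(this_ptr != NULL);
    this_ptr->value += amount;    /* member access goes through the pointer */
    return this_ptr->value;
}
```

With Visual C++, the same member function would instead receive the object pointer in ECX, so only amount would be pushed.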

Pascal Calling Conventions
The Pascal convention is essentially identical to cdecl, differing only in that:


 * 1) The parameters are pushed left to right (the order in which they are written)
 * 2) The routine being called must clean the stack before returning

Additionally, each parameter on the 32-bit stack must use all four bytes of the DWORD, regardless of the actual size of the datum.

This was the main calling convention of the 16-bit Windows API (Win32 uses STDCALL instead). It is slightly more efficient in code size, because the stack cleanup is done once in the called routine rather than at every call site.

Note: the Pascal convention is NOT the same as the Borland Pascal convention, also known as the Register Convention, which is a form of fastcall that passes the first three parameters in registers (EAX, EDX, ECX).
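The practical difference between the two push orders shows up in where the first argument ends up. The sketch below (hypothetical names, assuming every argument occupies one 4-byte slot as stated above) computes the push order and the offset of the first argument from ESP on entry to the function. Under CDECL that offset is always 4, but under Pascal it depends on the number of arguments, which is one reason Pascal-style routines cannot take variable argument lists.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONV_CDECL, CONV_PASCAL } Convention;

/* Fills order[k] with the index of the k-th argument pushed. */
void push_order(Convention conv, size_t nargs, size_t *order)
{
    for (size_t k = 0; k < nargs; k++)
        order[k] = (conv == CONV_PASCAL) ? k : nargs - 1 - k;
}

/* Offset of the first argument from ESP right after the CALL, when [ESP]
 * holds the return address and each argument uses 4 bytes. The argument
 * pushed last always sits at [ESP+4]. */
size_t first_arg_offset(Convention conv, size_t nargs)
{
    assert(nargs > 0);
    return (conv == CONV_PASCAL) ? 4 * nargs : 4;
}

/* 1 if the called routine removes the arguments (Pascal: RET n),
 * 0 if the caller does (CDECL: ADD ESP, n). */
int callee_cleans(Convention conv)
{
    return conv == CONV_PASCAL;
}
```

For a three-argument routine, the first argument is at [ESP+4] under CDECL but at [ESP+12] under Pascal.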

C/C++
This Borland C++ example splits a byte into two bytes, the first containing the high 4 bits and the second the low 4 bits.
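The Borland example uses inline assembly. For comparison, here is a sketch of the same nibble split for GCC or Clang, which use extended inline assembly in AT&T syntax rather than Borland's asm block. The function name split_nibbles is hypothetical, and on non-x86 targets the sketch falls back to plain C shifts and masks that compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Splits b into its high nibble (*hi) and low nibble (*lo). */
void split_nibbles(unsigned char b, unsigned char *hi, unsigned char *lo)
{
    assert(hi != NULL && lo != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char h, l;
    __asm__ ("movb %2, %0\n\t"
             "shrb $4, %0\n\t"       /* high nibble: shift right by 4 */
             "movb %2, %1\n\t"
             "andb $0x0F, %1"        /* low nibble: mask off the top 4 bits */
             : "=&q"(h), "=&q"(l)    /* early-clobber: outputs never share b's register */
             : "q"(b)                /* byte-addressable register */
             : "cc");                /* SHR and AND modify the flags */
    *hi = h;
    *lo = l;
#else
    *hi = (unsigned char)(b >> 4);
    *lo = (unsigned char)(b & 0x0F);
#endif
}
```

For instance, splitting 0xAB yields 0x0A in *hi and 0x0B in *lo on either code path.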

Pascal
The FreePascal Compiler (FPC) and GNU Pascal Compiler (GPC) allow asm blocks. While GPC only accepts AT&T syntax, FPC can work with both AT&T and Intel syntax, and allows a direct pass-through to the assembler. The following two examples are written to work with FPC (they rely on its compiler directives).

In FreePascal you can also write whole functions in assembly language. Also note that if you use labels, you have to declare them beforehand (an FPC requirement).