User:Doruletz72/ONBA/Assembly Language

INTRODUCTION
In what follows we consider a simple C example to get an idea of how memory is managed while an application is running. It is a good idea to read the section Running an application first, if you have not read it yet. Also, if you are eager to test your skills, you can start OnbA; follow the instructions in STARTING OnbA.

Global segment
First we look at the global variables. As said before, all variables declared outside of the procedures are globals. They are stored in the DATA record of the PRC file. When the application is started, it creates a portion of memory called the GLOBAL SEGMENT. The base address of this segment is saved in register A5, so if you want to use this register for another purpose, save its value first (push it on the stack) and restore it after use. As you can see in the picture, the variables are stored in the reverse of the order in which they are declared in the code. To use a global variable you must specify an offset relative to A5, which holds the base address. These offsets are always negative numbers. The addressing modes used for this are address register indirect with displacement or with index.
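To make the idea of negative offsets from A5 more concrete, here is a minimal C sketch. It is only a model: the array global_segment, the pointer a5 and the helpers store_word and load_word are hypothetical names invented for illustration, and the 16-byte size is arbitrary. The pointer plays the role of register A5 and every access uses a negative displacement, stored big-endian as on the 68k.

```c
#include <stdint.h>

/* Hypothetical model: a small byte array stands in for the global segment,
   and a5 points just past its end, the way the base register would. */
enum { GLOBALS_SIZE = 16 };
static uint8_t global_segment[GLOBALS_SIZE];
static uint8_t *const a5 = global_segment + GLOBALS_SIZE;

/* Store a 16-bit value big-endian at displacement disp from a5 (disp < 0). */
static void store_word(int disp, uint16_t value) {
    a5[disp]     = (uint8_t)(value >> 8);
    a5[disp + 1] = (uint8_t)(value & 0xFF);
}

/* Load a 16-bit big-endian value from displacement disp from a5 (disp < 0). */
static uint16_t load_word(int disp) {
    return (uint16_t)((a5[disp] << 8) | a5[disp + 1]);
}
```

For example, store_word(-2, 0x1234) writes the two bytes just below the base address, which is roughly what an access such as -2(a5) does in assembly.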

Unfortunately, disassemblers usually do not disassemble the DATA segment. Under OnbA, to declare global variables you must instruct OnbA to switch its output to "data mode", i.e. OnbA will write into the DATA record of the PRC file. To access the global variables OnbA allows you to use two modes:

The two modes are equivalent, because OnbA replaces the "numeric labels" with the proper values in the binary code. But when you disassemble the binary code there are no "numeric labels" any more, so you must understand how global variables are organized in memory.

Stack segment
The STACK SEGMENT is a little more complicated than the GLOBAL SEGMENT. This segment is created with the size stored in the CODE 0 record of the PRC file. It holds the local variables (the variables declared inside procedures) and intermediate values, such as the parameters passed to a procedure or a system function at the moment of the call and register values pushed on the stack when the situation demands it. The SP (A7) register holds the current top of the program stack. When a procedure or system function is called, the parameters are pushed on the program stack (the stack grows), along with the PC register value (the instruction pointer) and the current A6 register value (the stack frame pointer). After that, execution continues with the actual call of the procedure: a local stack frame is created (A6 holds the base address of this frame) and the instructions inside the procedure are run.

We can see in the picture the moment of PilotMain initialization, when a local stack frame with the local variables is created. To access the local variables of PilotMain we proceed the same way as with the GLOBAL SEGMENT, but now we use register A6 as the reference:

Let's look at the next step, the call of the OS system function StrCopy:

The stack grows: the parameters (the addresses of string and of "abcdefgh"), the PC register value (the current instruction pointer), the status register SR and the value of register A6 (holding the PilotMain frame pointer) are pushed on the stack. The system function StrCopy is a black box for us, but things mostly go the same way as for a procedure call (see below).

Part of this push process is not visible, because it is done by the TRAP mechanism: PC, SR and A6 are saved on entry and restored on return, along with the additional stack management. During the call SP contains a new top-of-stack value. After the call of StrCopy, the variable string contains the value "abcdefgh". The values of PC, SR and A6 are restored and the stack shrinks. SP now contains the old top-of-stack value from before the call of StrCopy. A6 again holds the PilotMain stack frame reference, so we can access the PilotMain local variables through A6 (see first picture).

The next lines of C code initialize the values of operand1 and operand2. The parameters of Sum, operand1 and operand2, are pushed on the stack and the procedure Sum is called. At this moment a new stack frame is created for the Sum procedure, using A6 as the reference register. The local variable res can be accessed through A6. As you can see by now, register SP is used only for stack management. If you want to use SP for another purpose, save it first in a variable or in another register, because SP cannot usefully be pushed on the stack it points to: once you change it you lose every reference to the top of the stack. So save it before.

The assembly code of the Sum procedure:

Inside the Sum procedure, the parameters op1 and op2, which were pushed on the stack, are referred to using positive offsets, starting with offset 8. This "inversion" is normal, because the stack segment is addressed backward, from higher addresses to lower addresses. So when you "look ahead" you move further into negative offsets, and when you "look back" you move toward positive ones. Because A6 is the reference, "inside" the procedure means negative offsets, and "before" the procedure means positive offsets. The offset of 8 between the start of the procedure's stack frame and its parameters consists of the 4 bytes of the PC value pushed by the JSR instruction and the 4 bytes of the old A6 value pushed by the LINK instruction when the stack frame is created. The UNLK instruction restores the old A6 value, and RTS restores the old PC value. We will not go further here. I just hope you now understand the STACK SEGMENT internals a little better.
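The frame layout described above can be checked with a small C simulation. This is only a sketch under stated assumptions: the stack is a 64-byte array, both parameters are treated as longs (4 bytes each), the return address 0x1000 is a made-up value, and all names (stack_mem, push32, link_a6, unlk_a6, call_sum) are hypothetical. It mimics what JSR, LINK, UNLK and RTS do to SP and A6, and reads the parameters at offsets 8 and 12 from A6.

```c
#include <stdint.h>

enum { STACK_BYTES = 64 };
static uint8_t  stack_mem[STACK_BYTES];
static uint32_t sp = STACK_BYTES;  /* stack pointer as an index; grows downward */
static uint32_t a6 = 0;            /* frame pointer */

/* Like move.l v,-(sp): pre-decrement SP, then store big-endian. */
static void push32(uint32_t v) {
    sp -= 4;
    stack_mem[sp]     = (uint8_t)(v >> 24);
    stack_mem[sp + 1] = (uint8_t)(v >> 16);
    stack_mem[sp + 2] = (uint8_t)(v >> 8);
    stack_mem[sp + 3] = (uint8_t)v;
}

static uint32_t read32(uint32_t addr) {
    return ((uint32_t)stack_mem[addr] << 24) | ((uint32_t)stack_mem[addr + 1] << 16) |
           ((uint32_t)stack_mem[addr + 2] << 8) | stack_mem[addr + 3];
}

/* LINK A6,#-locals: push old A6, make A6 the frame base, reserve locals. */
static void link_a6(uint32_t locals) { push32(a6); a6 = sp; sp -= locals; }

/* UNLK A6: drop the locals and restore the caller's A6. */
static void unlk_a6(void) { sp = a6; a6 = read32(sp); sp += 4; }

/* Models the whole call of a Sum-like procedure. */
static uint32_t call_sum(uint32_t op1, uint32_t op2) {
    push32(op2);              /* parameters are pushed last to first */
    push32(op1);
    push32(0x1000);           /* JSR pushes the return address (made-up value) */
    link_a6(4);               /* room for one long local at -4(A6) */
    uint32_t res = read32(a6 + 8) + read32(a6 + 12);  /* 8(A6) and 12(A6) */
    unlk_a6();
    sp += 4;                  /* RTS pops the return address */
    sp += 8;                  /* the caller removes the two parameters */
    return res;
}
```

After call_sum(3, 4) returns 7, both sp and a6 are back at their initial values, which is the balance that LINK/UNLK and the caller's cleanup are meant to guarantee.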

Code segment
By now you have seen some references to the PC register, which holds the pointer to the current instruction. There is only one thing left to discuss: the reference to the "abcdefgh" constant. Local constants are the only data written inside the CODE SEGMENT (in fact, the code record). You can declare a constant anywhere inside your program using the  dc.b "value"  directive. OnbA places all string (array of chars) constants passed to functions/procedures after the end of the current procedure body. You can do the same:

"dc.w 0" stands for what OnbA does automatically: it adds the '\0' string terminator and, if the situation demands, an additional '\0' to keep the code word-aligned.

The entire equivalent OnbA source using all directives (easy way):

The entire equivalent OnbA source without using of systrap and call directives (harder way):

VARIABLES
Now we summarize what the above examples have shown:
 * The variable types supported by OnbA are byte, word (2 bytes) and long (4 bytes), both signed and unsigned. The only difference between a signed and an unsigned type is whether the most significant bit (MSB) is interpreted as a sign or as part of the value. If it is a sign, the range is arranged around 0, from - max_value / 2 to + max_value / 2 - 1. If it is part of the value, the range is from 0 to max_value - 1. Here max_value is the number of distinct values of the type (256 for a byte). The choice is up to you, but check which type an instruction supports when you use it, and which type an OS function expects when you call it.
 * The values stored in registers or variables can be plain values or pointers. Again, it is up to you to decide this and to use the proper instructions to manipulate them. Take care when you move a word-sized value to an address register: it is automatically sign-extended to 32 bits.
 * Use registers as often as you can instead of global or local variables. The instructions are designed to work with them.
 * Global variables are referenced through register A5; to declare them, put OnbA in data mode and use the directive dc.SIZE, SIZE = .b, .w, .l or .; to access them you must be in code mode.
 * Local variables are referenced through register A6 (you can use a different address register if you want, but the OnbA directives use only A6); to declare them use the directive local name.SIZE; the "local" directive is valid only between the directive proc and the next beginproc.
 * Note: at assembly time, OnbA replaces the labels of the global or local variables with the proper numeric values.
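The signed/unsigned distinction and the sign extension into address registers can be illustrated in C. This is a minimal sketch with hypothetical function names; movea_w only models the documented behaviour of a word-sized move into an address register and is not OnbA code.

```c
#include <stdint.h>

/* The same 8-bit pattern read as a signed byte (-128..127). */
static int as_signed_byte(uint8_t bits) {
    return bits >= 0x80 ? (int)bits - 256 : (int)bits;
}

/* The same 8-bit pattern read as an unsigned byte (0..255). */
static int as_unsigned_byte(uint8_t bits) {
    return (int)bits;
}

/* Models a word-sized move into an address register:
   bit 15 is copied into bits 16..31 (sign extension). */
static uint32_t movea_w(uint16_t word) {
    return (word & 0x8000u) ? (0xFFFF0000u | word) : (uint32_t)word;
}
```

The pattern 0xFF is 255 as an unsigned byte and -1 as a signed byte, and the word 0x8000 becomes 0xFFFF8000 in the address register, which is why a word-sized "address" above 0x7FFF does not end up where you might expect.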

Arrays
Here starts the ugly part of OnbA: it has no proper support for arrays. There are no simple directives to access an array's elements. But look at the bright side: doing it the hard way, you will understand assembly language better.

Simple in C, isn't it? In fact it does not matter how simple it looks in C, C++ or C#; in the end what matters is how simple it looks in assembly code, because the processor does not run the C code but the assembled code. Let's parse the example. To access an array element we need the address of the array (the address of its first element) and then the offset of the desired element, computed from the element size and the index. The first instruction, LEA, loads the effective address of the array into address register A0. You can choose another address register if you wish. Then we must compute the offset relative to this address. In this case the elements are longs (4 bytes), so the offset is 15*4. Both instructions used to compute this value are among the fastest available: MOVEQ is faster than the ordinary MOVE, and LSL by 2 (logical shift left) is faster than a multiplication by 4. These aspects are very important, because accessing array elements is very common in program source. The last instruction uses a special addressing mode: the value 1000 is copied to the location given by the base address stored in A0, plus the index value stored in D0, plus the displacement, 0 in this case. Perhaps it is time to read more about addressing modes.

In this example we use a pointer to an array instead of a static array, i.e. the memory is allocated in the DYNAMIC SEGMENT, so we keep the program stack small. For simplicity there is no code to check whether the allocation succeeded. Now our local array variable is a pointer, a memory address. So instead of LEA we need only MOVEA (move address) to load the base address of the array into the base register A0. For addresses we can use MOVEA instead of plain MOVE (never use MOVEA with a data register as the destination). To calculate the offset, "add, " is faster than "lsl 1, ". Last time we used LSL by 2 instead of the slower multiplication by 4; now we can do even better. The results are the same in all cases, but you can save hundreds of microseconds at execution time, so your code is faster.

The code is a simple disassembly of the C source compiled under Onboard C. Now we can understand why it is good to know assembly language. The example can be improved, because the code generated by Onboard C is not the best, only the safest.
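The offset arithmetic discussed above (index times element size, done with a shift or an add instead of a multiplication) looks like this in C. It is a sketch with hypothetical helper names; store_long imitates a store through "base address plus computed byte offset" and is not taken from the Onboard C output.

```c
#include <stdint.h>

/* Byte offset of element `index` in an array of longs: shift left by 2 == * 4. */
static uint32_t offset_long(uint32_t index) { return index << 2; }

/* Byte offset of element `index` in an array of words: index + index == * 2. */
static uint32_t offset_word(uint32_t index) { return index + index; }

/* Store a value at base + byte offset, similar in spirit to an
   indexed store through an address register and a data register. */
static void store_long(int32_t *base, uint32_t index, int32_t value) {
    uint8_t *addr = (uint8_t *)base + offset_long(index);
    *(int32_t *)addr = value;
}
```

With a 20-element array, store_long(arr, 15, 1000) writes at byte offset 60, the 15*4 from the text.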

Inside the for loop, the first instruction, "movea array(a6), a0", is executed on every iteration, and there is no need for this. The value in register A0 is not altered, and there are no system function calls inside the loop (system calls alter the contents of the registers), so there is no need to reload A0. We can move the instruction outside the loop. And why use only two registers of the processor, A0 and D0? Onboard C does not allow register variables, I know, but here we can use them. So instead of the local variable "i" we will use another available data register, D1 for example. Don't forget that registers are faster. The result is a faster algorithm, because it has fewer instructions and faster ones.

Now another example:

And an optimized version of the assembly code:

You can't do this under Onboard C, but here you can. For arrays of chars or strings I will show another addressing mode:

I wrote this source myself, because Onboard C always uses the same addressing mode, (d8,An,Xi). Here we use the (An)+ addressing mode, address register indirect with post-increment. First the address in A0 is used for the access, then the address is incremented by the size of the operand, in this case the size of a byte, 1. In conclusion, use the faster instructions when you calculate the offset of an element:
 * moveq and lsl for arrays of long
 * add ,  instead of lsl for arrays of word
 * try (An)+ address mode for arrays of byte
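C has a direct counterpart of the post-increment addressing mode in the *p++ idiom. The following sketch (copy_string is a hypothetical name) copies a string byte by byte; each pass uses the current address and then advances it by one, much like a byte move with (An)+ on both operands followed by a branch while the moved byte is not zero.

```c
#include <stddef.h>

/* Copy src to dst including the terminator; return the number of
   characters copied, not counting the '\0'. */
static size_t copy_string(char *dst, const char *src) {
    size_t n = 0;
    while ((*dst++ = *src++) != '\0') {  /* use the address, then increment it */
        n++;
    }
    return n;
}
```

Note that the pointers themselves are the loop state, so no separate index register and no offset calculation are needed.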

Structures
Here starts the very, very ugly part of OnbA: it has no proper support for structures. There are no directives to access the members of a structure. Even the "equ" directive is implemented incorrectly and works only in a few cases, so you cannot use meaningful names to represent the members of a structure. The only way is to use a lot of comments, or to avoid structures. Let's consider the following example:

Assembly code built by OnbA:

Ugly, very ugly. One idea is to insert comments to get a clear view of your structure. After all, when you use Onboard C you first tell the compiler what the structure looks like. You can proceed in the same manner, but tell yourself what the structure looks like. An optimized source code for the previous example:

You can arrange your comments as you think best, so that the offsets of the members are easy to see. Normally you could use the "equ" directive to replace numeric offsets with meaningful labels, but it does not work here. Other assemblers, like Pila, support this. But Pila is desktop software.
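The "document the layout yourself" approach can be shown in C by treating a structure as a raw byte block with named offsets, which is what symbolic offsets would give you in an assembler that supports them. Everything here is hypothetical: the record layout (id, score, flag), the REC_ constants and the helper functions are invented for illustration, with values stored big-endian as on the 68k.

```c
#include <stdint.h>

/* Hypothetical record layout, written down once:
   offset 0: id    (word, 2 bytes)
   offset 2: score (long, 4 bytes)
   offset 6: flag  (byte, 1 byte, followed by 1 pad byte) */
enum { REC_ID = 0, REC_SCORE = 2, REC_FLAG = 6, REC_SIZE = 8 };

static void put_word(uint8_t *rec, int off, uint16_t v) {
    rec[off]     = (uint8_t)(v >> 8);
    rec[off + 1] = (uint8_t)v;
}

static void put_long(uint8_t *rec, int off, uint32_t v) {
    put_word(rec, off, (uint16_t)(v >> 16));   /* high word first */
    put_word(rec, off + 2, (uint16_t)v);
}

static uint32_t get_long(const uint8_t *rec, int off) {
    return ((uint32_t)rec[off] << 24) | ((uint32_t)rec[off + 1] << 16) |
           ((uint32_t)rec[off + 2] << 8) | rec[off + 3];
}
```

A call such as put_long(rec, REC_SCORE, value) reads much better than a bare offset 2, and in assembly source the same information has to live in the comments.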

Now is the time to dig deeper into the addressing modes so that you can handle variables easily, because we will continue with other elements of assembly language.

STRING CONSTANTS
Here we refer to the implicit strings passed as arguments to procedures or system functions.

Where is the value "This a string constant" stored? String constants are the only data stored inside the CODE SEGMENT, so when we refer to a string constant we must use the PC register as the reference. Don't worry about the '\0' null terminator or about padding: OnbA automatically adds a null terminator and, if the situation demands, one more '\0' to preserve word alignment. Because string constants are stored inside the CODE SEGMENT, which is read-only, don't attempt to write to these locations. You can't, and your application will probably crash at that point. There is a drawback: string constants written in the code increase the size of the code segment, and as we know a code segment is limited to 32 KB in size. Perhaps it is better to store the strings in resources or in pdb files. It is up to you. If your code is too large you can use additional code segments.
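The space a string constant takes in the code segment follows directly from the rule above: the characters, one terminator, and one more byte if the total is odd. A minimal C sketch (stored_size is a hypothetical helper, not an OnbA facility):

```c
#include <stddef.h>
#include <string.h>

/* Bytes occupied by a string constant: characters plus the '\0'
   terminator, rounded up to an even number to keep word alignment. */
static size_t stored_size(const char *s) {
    size_t n = strlen(s) + 1;
    return (n + 1) & ~(size_t)1;
}
```

So "abcdefgh" (8 characters) takes 10 bytes, while "abc" takes exactly 4 because the terminator already makes the length even.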

OPERATORS
Assembly language does not use operators the way high-level languages do. It treats operators as operations and uses instructions to carry them out.

Assignment
As you can see in the above examples, the most used operator, the assignment =, is covered mainly by data movement instructions like MOVE, MOVEA, MOVEQ, LEA etc.

Integer arithmetic
Also there are available instructions for integer arithmetic operators:
 * +, addition - ADD, ADDA (add address), ADDI (immediate), ADDQ (quick), ADDX (add with the extend flag X)
 * -, subtraction - SUB, SUBA, SUBI, SUBQ, SUBX
 * *, multiplication - MULS (signed), MULU (unsigned)
 * /, division - DIVS, DIVU
 * %, modulo/remainder has no dedicated instruction; the DIVS/DIVU instructions leave the remainder in the upper word of the destination register. Here is an example for the remainder operation:
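One common way to get a remainder on the 68k is to divide and then take the upper word of the result, since the word-sized divide returns the quotient in the lower word and the remainder in the upper word. The C sketch below models that; divu_w, swap_halves and remainder_of are hypothetical names, and the model assumes a nonzero divisor and a quotient that fits in 16 bits (the real instruction traps on a zero divisor and signals overflow otherwise).

```c
#include <stdint.h>

/* Models an unsigned 32 / 16 bit divide: remainder in the upper word,
   quotient in the lower word of the 32-bit result. */
static uint32_t divu_w(uint32_t dividend, uint16_t divisor) {
    uint32_t q = dividend / divisor;
    uint32_t r = dividend % divisor;
    return (r << 16) | (q & 0xFFFFu);
}

/* Models SWAP: exchange the upper and lower words of a register. */
static uint32_t swap_halves(uint32_t v) {
    return (v << 16) | (v >> 16);
}

/* Divide, swap, and keep the low word: that is the remainder. */
static uint16_t remainder_of(uint32_t a, uint16_t b) {
    return (uint16_t)swap_halves(divu_w(a, b));
}
```

For 17 divided by 5 the modelled register holds 0x00020003: quotient 3 in the low word, remainder 2 in the high word.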

BCD
The 68k processor supports integer numbers in BCD (binary-coded decimal) form. BCD is an encoding of decimal integers in which each digit of the number is represented by its own binary code. Read the wiki article http://en.wikipedia.org/wiki/Binary-coded_decimal for more information about BCD. The following instructions are available: ABCD, SBCD, NBCD.
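To show what a BCD addition does, here is a C sketch that adds two packed BCD bytes (two decimal digits per byte) with a carry in and a carry out, in the spirit of ABCD. The function name bcd_add is hypothetical, and the sketch assumes both inputs hold valid decimal digits.

```c
#include <stdint.h>

/* Add two packed BCD bytes plus the incoming carry (*carry is 0 or 1).
   On return *carry holds the decimal carry out of the high digit. */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry) {
    int lo = (a & 0x0F) + (b & 0x0F) + *carry;
    int hi = (a >> 4) + (b >> 4);
    if (lo > 9) { lo -= 10; hi++; }   /* decimal carry into the high digit */
    *carry = hi > 9;
    if (hi > 9) hi -= 10;
    return (uint8_t)((hi << 4) | lo);
}
```

Adding 0x19 and 0x28 gives 0x47, i.e. 19 + 28 = 47 read directly from the hex digits, where a plain binary addition would give 0x41.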

Bits operations
A set of bit instructions is also available: BTST, BSET, BCLR, TAS.
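In C terms the bit instructions correspond to the usual mask operations. The sketch below uses hypothetical function names that mirror the mnemonics; the real instructions report the tested bit through the Z flag rather than through a return value.

```c
#include <stdint.h>

/* Like BTST: return the state of the given bit (0 or 1). */
static int btst(uint32_t v, unsigned bit) {
    return (int)((v >> bit) & 1u);
}

/* Like BSET: return v with the given bit set. */
static uint32_t bset(uint32_t v, unsigned bit) {
    return v | ((uint32_t)1 << bit);
}

/* Like BCLR: return v with the given bit cleared. */
static uint32_t bclr(uint32_t v, unsigned bit) {
    return v & ~((uint32_t)1 << bit);
}
```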

Floating Point
The 68k processor has no support for floating point numbers. You can use longs and juggle the decimal point yourself: what is the difference between 0.00000078 and 78/100000000? Or you can use the two sets of Palm OS emulation libraries (FPLxxx and FLPxxx) to deal with floating point numbers.
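"Juggling the decimal point" with longs is usually called fixed-point arithmetic. Here is a minimal C sketch with two decimal places; the scale of 100 and the names fixed_t, fx_from_parts, fx_add and fx_mul are assumptions made for illustration only.

```c
#include <stdint.h>

enum { SCALE = 100 };          /* two decimal places: 1.50 is stored as 150 */
typedef int32_t fixed_t;

static fixed_t fx_from_parts(int32_t whole, int32_t hundredths) {
    return whole * SCALE + hundredths;
}

/* Addition needs no correction: both operands share the same scale. */
static fixed_t fx_add(fixed_t a, fixed_t b) { return a + b; }

/* A product carries the scale twice, so divide once to correct it.
   The result is truncated toward zero. */
static fixed_t fx_mul(fixed_t a, fixed_t b) {
    return (fixed_t)(((int64_t)a * b) / SCALE);
}
```

For instance 1.50 * 2.00 is computed as 150 * 200 / 100 = 300, i.e. 3.00. On the 68k itself the 64-bit intermediate used here is not available directly, so a real implementation has to choose the scale with the 32-bit range in mind.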

Logical
For logical operations the following instructions are available: AND, ANDI, OR, ORI, EOR, EORI, NOT.

Shift and rotation

 * shift: arithmetic ASL/ASR, logical LSL/LSR
 * rotations: ROL/ROR, ROXL/ROXR, SWAP

You can find more information in the Instructions Set chapter.
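The difference between the shift and rotate families is easy to see in C, where rotation has to be spelled out by hand. This is a sketch on 16-bit and 32-bit values with hypothetical function names; it ignores the condition flags the real instructions also set.

```c
#include <stdint.h>

/* Rotate left (ROL-like): bits leaving at the top re-enter at the bottom. */
static uint16_t rol16(uint16_t v, unsigned n) {
    n &= 15;
    return (uint16_t)(((uint32_t)v << n) | ((uint32_t)v >> ((16 - n) & 15)));
}

/* Logical shift right (LSR-like): zeros enter at the top. */
static uint16_t lsr16(uint16_t v, unsigned n) {
    return (uint16_t)(v >> n);
}

/* Arithmetic shift right (ASR-like): the sign bit is replicated.
   Written without relying on how the compiler shifts negative values. */
static int16_t asr16(int16_t v, unsigned n) {
    int x = v;
    return (int16_t)(x < 0 ? ~((~x) >> n) : x >> n);
}

/* SWAP-like: exchange the two 16-bit halves of a 32-bit value. */
static uint32_t swap32(uint32_t v) {
    return (v << 16) | (v >> 16);
}
```

Shifting -8 right by one arithmetically gives -4, while the same bit pattern shifted logically would become a large positive number.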

STATEMENTS
Both conditional and loop statements consist of a test portion followed by a jump instruction to a label. Jump labels are visible in the whole code segment, so it is a good idea to prefix label names with the initials of the procedure, to make the code more readable and to avoid confusion and errors.

Conditionals
Let's consider an example:

Three alternatives for the C source:

After the execution of an instruction, the flags of the CCR register are set according to the result, seen from the perspective of the value 0. Given this, for a comparison with 0 it is sufficient to examine the CCR register immediately after the instruction executes. For a comparison with a nonzero value, an additional CMP instruction must be executed. In fact CMP is a SUB instruction that does not store its result, but the flags N, Z, V and C are set accordingly (X is not affected).
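How a comparison turns into flags, and flags into a branch decision, can be modelled in a few lines of C. The sketch below handles word-sized operands; the type Flags and the functions cmp_w, beq_taken and blt_taken are hypothetical names for illustration.

```c
#include <stdint.h>

typedef struct { int n, z, v, c; } Flags;

/* Models a word-sized CMP src,dst: compute dst - src, throw the
   result away and keep only the flags. */
static Flags cmp_w(uint16_t src, uint16_t dst) {
    uint16_t r = (uint16_t)(dst - src);
    Flags f;
    f.n = (r >> 15) & 1;                          /* result is negative  */
    f.z = (r == 0);                               /* operands are equal  */
    f.c = (src > dst);                            /* unsigned borrow     */
    f.v = (((dst ^ src) & (dst ^ r)) >> 15) & 1;  /* signed overflow     */
    return f;
}

/* BEQ branches when Z is set. */
static int beq_taken(Flags f) { return f.z; }

/* BLT (signed "less than") branches when N and V differ. */
static int blt_taken(Flags f) { return f.n != f.v; }
```

Comparing 5 with 7 (dst = 5, src = 7) sets N and C, so both a signed "less than" branch and an unsigned "carry set" branch would be taken.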

The Bcc instruction set (Branch on Condition Code) can be used to build different conditional scenarios. In practice these instructions cause a branch in the program if certain flags are set. A '''switch ... case''' example:

And the asm source:

It is not as difficult as it seems.

Loops
For loop statements you need a label to identify the start of the loop. Then you must place a branch instruction that jumps either back to the start of the loop or out of it. Between the start label and the branch instruction you must update the value being tested. We can use one of the Bcc instructions to do this:

Alternatively we could use the DBcc instruction set (Decrement and Branch on Condition Code):

But it is not supported by OnbA:
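Even though OnbA does not accept DBcc, its counting behaviour is worth knowing when you read disassembled code. The C sketch below models the plain "decrement and branch" form with an always-false condition (often written DBRA or DBF); dbra_iterations is a hypothetical name. The low word of the counter is decremented, and the loop continues until the counter becomes -1.

```c
#include <stdint.h>

/* Count how many times the loop body runs for a given start value. */
static unsigned dbra_iterations(uint16_t counter) {
    unsigned runs = 0;
    do {
        runs++;                             /* the loop body               */
        counter = (uint16_t)(counter - 1);  /* decrement the 16-bit count  */
    } while (counter != 0xFFFFu);           /* branch back unless it is -1 */
    return runs;
}
```

This is why such loops are usually set up with "count minus one": a start value of 4 runs the body 5 times.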

FUNCTIONS
As we have seen, most examples have been structured as procedures. The goal was to get into the habit of using them. There are two types of functions: system routines and user procedures. In fact, the concept of a function is very different from the one in the C language. Here, there are no return values. In the case of the system routines (Palm OS APIs):
 * register D0 will be used as a placeholder for return values
 * register A0 will be used as a placeholder for return addresses (pointers)

So if you intend to use these values, you must do so before the content of these registers is altered in any way. In the case of user procedures there is no obligation to use D0 or A0 (it is up to you where you write and read the results), but it is a good habit to follow the same convention.

System Routines
The system routines (Palm OS APIs) are blocks of binary code already placed at fixed memory locations by the OS at the moment of device initialization. They are the interfaces to memory, the file area, interrupts, display registers and display memory, and so on. We can invoke them using the TRAP #15 instruction followed by a word value containing the code of the routine. First of all we must PUSH their arguments: grow the stack and put the arguments into this space. The parameters must be pushed in reverse order (last parameter first). After the call of the routine we must shrink the stack again. And of course, if we intend to use a "return" value, we must read D0 or A0 immediately after that. Let's consider a mixed example, a system routine called inside a user procedure:

The asm "real" source:

The asm OnbA source (easy way):

Using the systrap directive:
 * systrap TrapName / #TrapNumber (parameter.size, ...)

where:
 * TrapName equ $
 * size - specifies the size of the operand, for correct stack management; if you forget to do this OnbA will use .w, so take care!

In case of pointers, if you need to pass a "location", use ampersand & prefix and there is no need to specify a size:
 * systrap TrapName / #TrapNumber (&parameter, ...)
 * here size will be always long, or 4, because it is an address.
 * in the PUSH sequence "move.l ..." will be replaced with "pea ..."
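The "shrink the stack after the call" step needs the total number of bytes that were pushed. The C sketch below computes it from the operand sizes; stack_bytes is a hypothetical helper, and the rule that a byte-sized argument still occupies a full word is an assumption based on the 68k stack pointer staying word-aligned.

```c
#include <stddef.h>

/* Total stack space taken by the arguments of one call.
   sizes[i] is 1, 2 or 4; odd sizes are rounded up to an even number. */
static size_t stack_bytes(const size_t *sizes, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (sizes[i] + 1) & ~(size_t)1;
    }
    return total;
}
```

A call that passes two addresses, as StrCopy does, pushes 4 + 4 = 8 bytes, so 8 is the amount to add back to SP after the trap returns. This also shows why a wrong size in the parameter list matters: the cleanup amount no longer matches what was pushed.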

User Procedures
The user procedures are blocks of code referenced through the PC register. To call them you can use the BSR/JSR instructions. See the example above in this chapter, under Code segment. To define a procedure you must use the following OnbA directives:
 * proc ProcedureName(param.size, ...)
 * local var.size ; optional, valid only here
 * local ...
 * beginproc
 * endproc

When you call a procedure defined with the OnbA directives you must use the call directive:
 * call ProcedureName (parameter.size, ...)

where:
 * size - specifies the size of the operand, for correct stack management; if you forget to do this OnbA will use .w, so take care!

In case of pointers, if you need to pass a "location", use ampersand & prefix and there is no need to specify a size:
 * call ProcedureName (&parameter, ...)
 * here size will be always long, or 4, because it is an address.
 * in the PUSH sequence "move.l ..." will be replaced with "pea ..."

As for the "returned" values, it is up to you where you write them and where you read them from, but it is better to use D0 for values and A0 for addresses. It is easy to read, and everybody will speak the same "language".