A-level Computing/OCR/Unit 1.4.1 Data Types

Data Types
When we store data, we assign it a specific type. The type we use for a piece of data affects how we store and process it.

You need to have an understanding of the following types.

Representing Positive Integers in Binary
Numbers can be represented in many bases. Humans normally use denary (base 10) when working with numbers.

Let's take the number 184. We can break it into 100s, 10s and units columns. What we place in each column is the number of times we want that column's value added to get our number. Here, we want one 100, eight 10s and four 1s.

$$(1 \times 100) + (8 \times 10) + (4 \times 1) = 184$$

We can see that each column increases by a factor of 10 which is the base we are representing the number in.

Let's see how this works in binary (base 2) taking the number 14.


 * The highest power of 2 that fits into 14 is 8 ($$2^3$$), $$14 - 8 = 6$$.
 * The highest power of 2 that fits into 6 is 4 ($$2^2$$), $$6 - 4 = 2$$.
 * This leaves us with 2 ($$2^1$$), which is one of our columns.

This gives us 1110, which is 14 in base 2.

$$(1 \times 8) + (1 \times 4) + (1 \times 2) = 14$$

In the example above we used 4 bits (i.e. we had 4 columns which we could fill to represent the number). As with denary, larger numbers require more columns, so in this next example we will use 8 bits.

Starting with the number 215.


 * $$215 - 128 = 87$$
 * $$87 - 64 = 23$$
 * $$23 - 16 = 7$$
 * $$7 - 4 = 3$$
 * $$3 - 2 = 1$$
 * $$1 - 1 = 0$$

Therefore 215 can be written as $$128 + 64 + 16 + 4 + 2 + 1 = 215$$, which is 11010111 in binary.
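The subtraction method used in both examples can be sketched in Python. The function name `to_binary` and its `bits` parameter are illustrative choices made for this sketch.

```python
def to_binary(number, bits=8):
    """Convert a non-negative integer to a binary string by repeatedly
    subtracting the largest power of 2 that fits, column by column."""
    if number < 0 or number >= 2 ** bits:
        raise ValueError("number does not fit in the given number of bits")
    digits = ""
    for power in range(bits - 1, -1, -1):
        column = 2 ** power
        if column <= number:
            digits += "1"      # this column is used once
            number -= column
        else:
            digits += "0"      # this column is not needed
    return digits


print(to_binary(14, bits=4))  # 1110
print(to_binary(215))         # 11010111
```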

Representing Negative Integers in Binary
Representing negative numbers in binary is more complicated, as we need a way of knowing whether a number is negative while only being able to use 1s and 0s, and without confusing the sign with the number itself. There are two main methods used to represent these numbers: Sign and Magnitude, and Two's Complement.

Sign and Magnitude
This is the simplest method of representing negative numbers in binary. It uses the first bit of the number to represent whether the number is negative: if the number is negative the first bit is a 1, and if not it is a 0.

For example, using 8 bits, 67 is 01000011 and -67 is 11000011.

This gives us some problems when doing arithmetic with the numbers. If we add 67 and -67, we should get 0, but with sign and magnitude this isn't the case.

Adding the two bit patterns gives 6, with a bit lost due to overflow. This means that calculations cannot be done directly on sign and magnitude numbers; they must be decoded before they are used.
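A minimal sketch of this problem, assuming an 8 bit register. The helper names `sign_magnitude` and `add_raw` are made up for this illustration.

```python
def sign_magnitude(value, bits=8):
    """Encode an integer as a sign bit followed by its magnitude."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "0{}b".format(bits - 1))


def add_raw(a, b, bits=8):
    """Add two bit patterns as plain unsigned binary, dropping any carry
    out of the top bit, as a fixed-size register would."""
    total = (int(a, 2) + int(b, 2)) % (2 ** bits)
    return format(total, "0{}b".format(bits))


plus = sign_magnitude(67)    # 01000011
minus = sign_magnitude(-67)  # 11000011
print(add_raw(plus, minus))  # 00000110, which is 6 rather than 0
```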

Two's Complement
Two's complement is another method of representing negative numbers which aims to tackle this problem. The method for converting a number is a little more complex but it is much more powerful. If we want to represent the number -67, as above, we follow these steps:

 * 1) Represent the number in regular binary with a number of empty bits before it (01000011)
 * 2) NOT the number (convert all 1s to 0s and all 0s to 1s), giving 10111100
 * 3) Add one to the number, giving 10111101

And that's all there is to it. That number now represents -67 in two's complement.

One of the main advantages of using Two's Complement is the ability to do math without conversion. If we take the example from above with 67 added to -67:

The result is therefore 0, with the carry bit lost due to overflow. This also works for regular subtraction (add a negative number to a positive number to simulate subtraction). As a result, two's complement is used far more commonly and is easier to implement in hardware.
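The three conversion steps and the addition can be sketched as follows, again assuming an 8 bit register. The function name `twos_complement` is an illustrative choice.

```python
def twos_complement(value, bits=8):
    """Return the two's complement bit pattern of an integer
    using the NOT-then-add-one method for negative values."""
    mask = 2 ** bits - 1
    if value >= 0:
        return format(value, "0{}b".format(bits))
    positive = -value                # step 1: the number in regular binary
    flipped = positive ^ mask        # step 2: NOT every bit
    result = (flipped + 1) & mask    # step 3: add one
    return format(result, "0{}b".format(bits))


minus_67 = twos_complement(-67)  # 10111101
plus_67 = twos_complement(67)    # 01000011

# Add the two patterns and drop the carry out of the 8th bit.
total = (int(minus_67, 2) + int(plus_67, 2)) & 0xFF
print(minus_67, format(total, "08b"))  # 10111101 00000000
```

Adding 67 to -67 gives 00000000 once the carry out of the register is discarded.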

Representing Decimals in Binary
Representing decimals in binary is even more complex than representing negative numbers, as we need to somehow store where the decimal point is while still maintaining the number and the accuracy that we were given. The method that you need to know for this is the normalised floating point representation.

Normalised Floating Point
This method uses a standard format for all of the decimal numbers that are stored on a system. It uses a similar method to how we use standard form to represent a number (345.54 becomes $$3.4554\times10^2$$) by defining the decimal point to be between the first and second bit and storing how much the number has shifted. The number itself is called the mantissa and has a fixed number of bits and the amount shifted is called the exponent and similarly has a fixed size.

One thing that you need to be careful of when using this method is being very clear about how many bits are used to store each part of the number. This becomes very important when dealing either with large numbers or with numbers with lots of decimal digits, as you will begin to run into errors where we lose accuracy (floating point errors).

To represent a decimal number normally, we use the columns that we have used previously, but to the right of the 1 column, we add a decimal point and continue the columns using the form $$2^{-n}$$.

From here we can just represent our number. For this example we will be representing the number 47.625. The integer portion can be represented normally as $$32+8+4+2+1$$ and the decimal portion can be represented as $$\frac{1}{2} + \frac{1}{8}$$.

So now we have our number, but we would never be able to store it on a device as we have a decimal point in the middle. This is where the idea of the floating point comes in. We need to find the position in the binary where the first change of digits happens. This means the first point where a 0 turns to a 1 or vice versa. We then place the decimal point between these two values and record how many times we moved the decimal point.

We place it between the 64 and the 32 columns, as this is where the first change from a 0 to a 1 happens in the number. The number is now in normalised form, and we have moved the point six places to the left. In this example we will be using a 10 bit mantissa and a 6 bit exponent (to be stored in a 16 bit register). When using the normalised form, we only take one digit from before the imaginary decimal point and then pad with 0s at the right end of the number. That means that the number we represented there becomes 0101111101 when using 10 bits. We shifted the point by 6 to the left, which means a positive move of 6. This is 000110 in 6 bit binary.

Then we can just append the numbers to each other, giving 0101111101 000110.

This successfully represents the decimal number 47.625 in normalised floating point. This method can also be used in conjunction with two's complement in order to represent negative decimals, and it can be used to represent very small decimals like 0.00005. However, you must remember that any shifts to the right are negative and shifts to the left are positive.
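One way to check a result like this is to decode it again. The sketch below assumes a 10 bit mantissa followed by a 6 bit exponent, both held in two's complement, with the binary point after the first mantissa bit. The function names are hypothetical.

```python
def from_twos(bits):
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 2 ** len(bits)
    return value


def decode_float(pattern, mantissa_bits=10):
    """Decode a mantissa followed by an exponent.
    The binary point sits after the first mantissa bit."""
    mantissa = from_twos(pattern[:mantissa_bits])
    exponent = from_twos(pattern[mantissa_bits:])
    # Dividing by 2**(mantissa_bits - 1) places the point after the first bit.
    return mantissa / 2 ** (mantissa_bits - 1) * 2 ** exponent


print(decode_float("0101111101" + "000110"))  # 47.625
```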

Adding Numbers in Binary
Just like for denary additions, we align the columns for the two numbers and add the values starting from the right.


 * 1 add 1 is 2 (so zero 1s and one 2s) therefore 1s column is 0 and we carry 1 to the 2s column.
 * 1 add 0 add 1 is 2 so set as 0 and carry again
 * 1 add 1 add 1 is 3 so set as 1 and carry
 * 0 add 0 add 1 is 1
 * 1 add 1 is 2 so set 0, carry 1
 * 0 add 0 add 1 is 1
 * 1 add 0 is 1
 * 1 add 0 is 1

The above calculation process can be seen much more clearly in the table:

$$128 + 64 + 32 + 8 + 4 = 236$$

Check result in denary: $$215 + 21 = 236$$
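The column-by-column process described above can be sketched as a loop that works from the rightmost bit and carries as it goes. The function name `add_binary` is an illustrative choice, and both inputs are assumed to be the same length.

```python
def add_binary(a, b):
    """Add two equal-length bit strings column by column from the right.
    Returns the result bits and the final carry out."""
    result = ""
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry
        result = str(column_sum % 2) + result  # digit written in this column
        carry = column_sum // 2                # digit carried to the next column
    return result, carry


print(add_binary("11010111", "00010101"))  # ('11101100', 0)
```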

Representing Numbers in Hexadecimal
Hexadecimal is another form of data representation that is used frequently within computing. It is used primarily because it can hold a much larger number in a smaller number of digits than binary or denary (although it will always be stored as binary within the computer's memory).

Hexadecimal is the name for base 16, which means we use 16 characters to represent numbers. As we only have 10 digits, it also uses the characters A, B, C, D, E and F, which represent 10, 11, 12, 13, 14 and 15 respectively.

Denary to Hexadecimal
When representing denary numbers in base 16, we use the exact same method as we do for binary; however, instead of writing the columns as powers of 2, we must write them as powers of 16. For example, to represent 7652 in hexadecimal:


 * 4096 ($$16^3$$) goes into 7652 once, meaning we put a 1 in that column; we now have 3556 left
 * 256 ($$16^2$$) goes into 3556 13 times, meaning we place a D in that column; we now have 228 left
 * 16 ($$16^1$$) goes into 228 14 times, meaning we place an E in that column; we now have 4 left
 * 1 ($$16^0$$) goes into 4 4 times, meaning we place a 4 in that column and we have 0 left.

So therefore 7652 is 1DE4 in hexadecimal.
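The same column method can be sketched in Python. The names `HEX_DIGITS` and `to_hex` are made up for this example; Python's built-in `hex()` would give the same digits.

```python
HEX_DIGITS = "0123456789ABCDEF"


def to_hex(number):
    """Convert a non-negative integer to hexadecimal using powers of 16."""
    # Find the highest power of 16 that fits into the number.
    power = 0
    while 16 ** (power + 1) <= number:
        power += 1
    digits = ""
    while power >= 0:
        column = 16 ** power
        count = number // column      # how many times the column fits
        digits += HEX_DIGITS[count]
        number -= count * column      # what is left for the next column
        power -= 1
    return digits


print(to_hex(7652))  # 1DE4
```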

Binary to Hexadecimal
Converting from binary directly into hexadecimal is actually a simple process that does not require converting the whole thing back into denary at once. Instead, when converting this way, we simply convert the binary to denary then to hexadecimal one nibble (4 bits) at a time. If we have the number 156 in binary, we can place it into our table and write up the correct headers as shown:

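156 is 10011100 in binary. The first nibble, 1001, is 9 and the second nibble, 1100, is 12, which is C. Therefore, this number is 9C in hexadecimal.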
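A short sketch of the nibble-at-a-time conversion. The function name `binary_to_hex` is an illustrative choice, and the input is padded with leading 0s so that it splits evenly into groups of 4 bits.

```python
def binary_to_hex(bits):
    """Convert a bit string to hexadecimal one nibble (4 bits) at a time."""
    # Pad on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = ""
    for start in range(0, len(padded), 4):
        nibble = padded[start:start + 4]
        digits += "0123456789ABCDEF"[int(nibble, 2)]
    return digits


print(binary_to_hex("10011100"))  # 9C
```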

Bitwise Manipulations
Bitwise manipulations are a specific group of changes that can be made to the bits of a binary number. There are a few that you need to be aware of as part of the specification: shifts, ANDs, ORs and XORs. This is a very quick and easy topic once you understand it.

Shifts
Binary registers can be shifted either to the left or the right and, as the name would suggest, it just moves the binary digits in one direction. Any spaces created will then just be filled with a 0 in the majority of cases.

Left Shift
A left shift moves the contents of the register to the left and discards any bits that no longer fit into the register. This function is important as it is equivalent to multiplying a number by two. In this example we will be using the value of 54: $$54_{10} = 00110110_{2}$$

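Therefore if we left shift (often represented by the characters <<) while maintaining a maximum size of 8 bits:

 * Shifting by 1 gives 01101100, which is 108
 * Shifting by 2 gives 11011000, which is 216
 * Shifting by 3 gives 10110000, which is 176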

As you can see in the last example, when we shifted by 3, the first bit was 'pushed out' of the register and lost. This is an important consideration when using left shifts as we must be aware of when we are going to lose a potentially important bit. In the first two examples, you can see that each left shift was equal to multiplying the original denary number by a power of two. Left shifting by one multiplied by two and left shifting by two multiplied by four. If we were to use a bigger register then left shifting by 3 would have given us the original value multiplied by 8.

This function is often used when multiplying a number, as a multiplication can be broken down into multiplying by a power of two and then adding the original number a number of times. For example, multiplying a number by five is equal to left shifting by two (multiplying by 4) and then adding the original number once.
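The behaviour of a fixed-size register can be imitated in Python by masking the result of a shift. The helper names below are assumptions made for this sketch.

```python
def left_shift(value, places, bits=8):
    """Shift left, keeping only the lowest `bits` bits like a fixed register."""
    mask = 2 ** bits - 1
    return (value << places) & mask


for places in (1, 2, 3):
    shifted = left_shift(54, places)
    print(places, format(shifted, "08b"), shifted)
# 1 01101100 108
# 2 11011000 216
# 3 10110000 176


def times_five(value):
    """Multiply by 5 as (value * 4) + value using a shift and an add."""
    return (value << 2) + value


print(times_five(7))  # 35
```

Shifting 54 by 3 would give 432 in a larger register; in 8 bits the top bit is pushed out, leaving 176.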

Right Shift
Right shifts, unsurprisingly, are the opposite of left shifting and instead move the contents of the registers to the right. This section will not go into as much detail as you should already understand the basics of how shifting works and how it can be used.

If we right shift (often represented by the characters >>) while maintaining a maximum size of 8 bits:

From the second example, the impact of underflow errors is immediately apparent.
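As a small illustration, here is the value 54 from the left shift section shifted right; these particular values are chosen for this sketch and may differ from the original examples. Each right shift halves the number, and any bit pushed off the right-hand end is lost.

```python
def right_shift(value, places):
    """Shift right; bits pushed past the least significant bit are discarded."""
    return value >> places


print(format(right_shift(54, 1), "08b"), right_shift(54, 1))  # 00011011 27
print(format(right_shift(54, 2), "08b"), right_shift(54, 2))  # 00001101 13
```

Shifting by 2 gives 13 rather than 13.5, because the bit worth one half cannot be stored.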

AND
You may not have covered AND gates fully, as they come up in unit 1.4.3; however, you should be able to guess how they work. An AND gate returns a 1 only if both inputs are 1.

AND can be used as a bitwise operator in order to AND two binary registers or binary values together. For example, if we have the two registers  and   we can do:

One application of this is to determine whether a number is divisible by 2. If we AND any value with the binary value 00000001, it will leave us with only the least significant bit (the last bit). If the result is greater than 0, the number is odd; if not, it is even.
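A minimal sketch of the odd/even test. The function name `is_odd` is an illustrative choice.

```python
def is_odd(value):
    """AND with 00000001 keeps only the least significant bit."""
    return (value & 0b00000001) > 0


print(is_odd(54), is_odd(215))  # False True
```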

OR
OR gates return a 1 if at least one of the two inputs is a 1. This can be used as a bitwise operator to OR two values together. To use the example from above:

One application of this is to store a range of true or false values within a binary number such as for a very simple control program. If the user wanted to enable a system if it is not already enabled, they could OR the contents of the register storing the state with a binary number containing a 1 at the position corresponding to a setting. The resulting value would contain the original settings with the required bit set to 1 if it was not already.
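This use of OR can be sketched as follows. The settings value and the position of the `ENABLE_BIT` flag are hypothetical.

```python
ENABLE_BIT = 0b00000100  # hypothetical flag: the third bit from the right


def enable(settings, flag):
    """Set the flag bit to 1, leaving every other bit unchanged."""
    return settings | flag


print(format(enable(0b01010001, ENABLE_BIT), "08b"))  # 01010101
print(format(enable(0b01010101, ENABLE_BIT), "08b"))  # 01010101 (already set)
```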

XOR
XOR gates are a modified version of an OR gate which return a 1 if exactly one of the two inputs is 1, but return a 0 if both are. To use the example from above:

This can be applied again in a control system to toggle settings: if the bit is currently off in the register it will be switched on, and if it is on it will be switched off.
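Toggling with XOR can be sketched in the same way. The settings value and the flag position are hypothetical.

```python
TOGGLE_BIT = 0b00000100  # hypothetical flag: the third bit from the right


def toggle(settings, flag):
    """Flip the flag bit, leaving every other bit unchanged."""
    return settings ^ flag


once = toggle(0b01010001, TOGGLE_BIT)
twice = toggle(once, TOGGLE_BIT)
print(format(once, "08b"))   # 01010101 (bit switched on)
print(format(twice, "08b"))  # 01010001 (bit switched off again)
```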

Representing Text
Representing text is a relatively simple process which uses numbers to represent symbols. While the process itself is simple, the implementations can be more complex. For multiple machines to be able to communicate text, they must have a standard which defines which numbers are used for which characters. These standards are called character sets, and they vary quite a lot in how the characters are defined and how many characters are available.

There is a wide range of character sets implemented in different places, and systems can even use more than one as long as they are aware of which one is currently in use. However, you just need to be aware of the existence of two: ASCII and Unicode.

ASCII


ASCII stands for the American Standard Code for Information Interchange. It was defined in 1963 and was one of the most common encodings. It uses 7 bits to represent characters by default, which allows for a maximum of 128 characters to be represented. There are also extensions of this standard, such as extended ASCII, which uses 8 bits to represent characters and so allows up to 256 characters. A table showing all the ASCII codes along with their corresponding denary values is shown to the side. When text is encoded using this and stored, each of the denary numbers is represented as binary and stored.
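As a short illustration, Python's built-in `ord` function gives the denary code for a character, which can then be written as 7 bit binary. The function name `ascii_bits` and the sample text are chosen for this sketch.

```python
def ascii_bits(text):
    """Return the 7 bit ASCII pattern for each character in the text."""
    return [format(ord(character), "07b") for character in text]


print([ord(character) for character in "Hi"])  # [72, 105]
print(ascii_bits("Hi"))                        # ['1001000', '1101001']
```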

Unicode


Unicode functions in the same way as ASCII, but it varies in the number of bits it uses to store the characters. There are multiple Unicode encodings which use differing numbers of bits per character, such as UTF-8, UTF-16 and UTF-32. The most recent standard of Unicode (at the time of writing) has 128,237 possible characters. It has become the new standard and is used when building new systems in order to accommodate a much wider range of characters and languages.
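To see how the storage size can vary, the sketch below uses Python's built-in `utf-8`, `utf-16-le` and `utf-32-le` codecs to count the bytes needed for three characters. The function name `encoded_sizes` and the characters are chosen for this illustration.

```python
def encoded_sizes(character):
    """Number of bytes needed for one character in three Unicode encodings."""
    return [len(character.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")]


# "A", "e" with an acute accent, and the euro sign, written as escapes.
for character in ("A", "\u00e9", "\u20ac"):
    print(hex(ord(character)), encoded_sizes(character))
# 0x41 [1, 2, 4]
# 0xe9 [2, 2, 4]
# 0x20ac [3, 2, 4]
```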

An image showing a fraction of the possible Unicode symbols is shown to the side. You can read more about Unicode on Wikipedia on the Unicode page.