A-level Computing 2009/AQA/Problem Solving, Programming, Operating Systems, Databases and Networking/Real Numbers/Floating point numbers





If you study other subjects such as Physics or Chemistry, you may come across floating point numbers like this: $$6.63_{10} \times 10^{-34}$$ (Planck's constant). The first part defines the non-zero digits of the number and is called the Mantissa. The second part defines how many positions we want to move the decimal point; this is known as the Exponent, and it is positive when moving the decimal point to the right and negative when moving it to the left. $$\begin{matrix} \underbrace{6.63} \\ Mantissa \end{matrix} \times \begin{matrix} \underbrace{10^{-34}} \\ Exponent \end{matrix}$$

If you wanted to write out that number in full you would have to move the decimal point in the mantissa 34 places to the left, resulting in: $$0.000\;000\;000\;000\;000\;000\;000\;000\;000\;000\;000\;663$$ This takes a long time to write, and it is very hard for the human eye to see how many zeros there are. Therefore, when we can accept a certain level of accuracy (6.63 = 3 significant figures), we can store a many-digit number like Planck's constant in a small number of digits. You are always weighing up the scope (or range) of the number against its accuracy (number of significant digits).
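If you want to check the written-out form without counting zeros by hand, a short Python sketch using the standard `decimal` module can expand the number for you. The variable names are only illustrative.

```python
from decimal import Decimal

# Planck's constant to 3 significant figures, as mantissa and exponent
planck = Decimal("6.63E-34")

# The "f" format writes the number out in full, with no exponent
full = format(planck, "f")
print(full)

# Count the zeros between the decimal point and the first non-zero digit
fraction_digits = full.split(".")[1]
leading_zeros = len(fraction_digits) - len(fraction_digits.lstrip("0"))
print(leading_zeros)
```

The mantissa and exponent form needs only the digits 6.63 and -34, while the full form needs 36 digits after the decimal point.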

The same is true with binary numbers and is even more important. When you are dealing with numbers and their computational representation you must always be aware of how much space the numbers will take up in memory. As we saw with the above example, the non-floating-point representation of a number can take up an unfeasible number of digits. Imagine how many digits you would need to store $$0.000\;000\;000\;000\;000\;000\;000\;000\;000\;000\;000\;663$$ in binary‽

A binary floating point number may consist of 2, 3 or 4 bytes; however, the only one you need to worry about is the 2 byte (16 bit) variety. The first 10 bits are the mantissa and the last 6 bits are the exponent.

$$\underbrace{\overbrace{0}^\text{sign bit} \cdot 101010101}_\text{mantissa} \times \underbrace{010101}_\text{exponent}$$

Just like the denary floating point representation, a binary floating point number will have a mantissa and an exponent, though as you are dealing with binary (base 2) you must remember that instead of having $$\times 10^{xy}$$ you will have to use $$\times 2^{xy}$$.
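As a rough sketch of how the two fields combine, the Python below splits a 16 bit pattern into a 10 bit mantissa and a 6 bit exponent and works out mantissa × 2^exponent. It assumes both fields are held in two's complement, with the binary point after the first mantissa bit. The function name `decode16` is made up for illustration.

```python
def decode16(bits):
    """Value of a 16 bit pattern: 10 bit mantissa, then 6 bit exponent."""
    bits = bits.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected 16 binary digits")

    mantissa = int(bits[:10], 2)
    exponent = int(bits[10:], 2)

    # Assumption: both fields are two's complement, so a leading 1 means negative
    if mantissa >= 2 ** 9:
        mantissa -= 2 ** 10
    if exponent >= 2 ** 5:
        exponent -= 2 ** 6

    # The binary point sits after the first mantissa bit: divide by 2^9
    return mantissa / 2 ** 9 * 2.0 ** exponent


print(decode16("0101010101 010101"))  # the pattern 0.101010101 x 010101
```

Here the mantissa is 341/512 and the exponent is 21, so the pattern stands for 341/512 × 2^21 = 1396736.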

Why use binary floating point numbers
Fixed point binary allows a computer to hold fractions but due to its nature is very limited in its scope. Even using 4 bytes to hold each number, with 8 bits for the fractional part after the point, the largest number that can be held is just over 8 million. Another format is needed for holding very large numbers.
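The "just over 8 million" figure can be checked with a couple of lines of Python. This sketch assumes the 4 byte number is held in two's complement, so one of the 32 bits acts as the sign bit; the constant names are illustrative.

```python
TOTAL_BITS = 32      # 4 bytes
FRACTION_BITS = 8    # bits after the binary point

# Largest two's complement pattern is 0111...1, i.e. 2^31 - 1.
# Dividing by 2^8 puts the binary point 8 places from the right.
largest = (2 ** (TOTAL_BITS - 1) - 1) / 2 ** FRACTION_BITS
print(largest)
```

The result is 8388607.99609375, which is the largest value this fixed point layout can hold.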

In decimal, very large numbers can be shown with a mantissa and an exponent, e.g. 0.12*10². Here the 0.12 is the mantissa and the 10² is the exponent. The mantissa holds the main digits and the exponent defines where the decimal point should be placed.

The same technique can be used for binary numbers. For example two bytes could be split so that 10 bits are used for the mantissa and the remaining 6 for the exponent. This allows a much greater scope of numbers to be used.

Converting binary floating point to decimal
There are several stages to take when working out a floating point number in binary. In fact it is much like a disco dance routine - known on this page as the Noorgat Dance, Kemp variation (you won't be tested on the name but it should help you to remember).


 * 1) Sign - find the sign of the mantissa (make a note of this)
 * 2) Slide - find the value of the exponent and whether it is positive or negative
 * 3) Bounce - move the decimal point the distance the exponent asks, left for a negative exponent, right for a positive one. If you are moving left, pad the front of the mantissa with zeros when the mantissa is positive and with ones when it is negative
 * 4) Flip - if the mantissa is negative, perform two's complement on it
 * 5) Swim - starting at the decimal point, work out the values of the mantissa, going left, then right. Now make sure you refer back to the sign you recorded on the sign move.
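The dance can also be followed literally on strings of bits. The Python below is a minimal sketch of that idea, not an official algorithm: it assumes a 10 bit mantissa and a 6 bit exponent, both in two's complement, and the name `noorgat_decode` is invented for this page.

```python
def noorgat_decode(mantissa, exponent):
    """Follow Sign, Slide, Bounce, Flip, Swim on two bit strings."""
    # Sign: note whether the mantissa is negative
    negative = mantissa[0] == "1"

    # Slide: value of the exponent (two's complement)
    shift = int(exponent, 2)
    if exponent[0] == "1":
        shift -= 2 ** len(exponent)

    # Bounce: the point starts after the first bit and moves by `shift`
    bits = mantissa
    point = 1 + shift
    if point < 1:
        # Moving left: pad the front with copies of the sign bit
        bits = mantissa[0] * (1 - point) + bits
        point = 1
    elif point > len(bits):
        # Moving right past the last bit: pad the end with zeros
        bits = bits + "0" * (point - len(bits))

    # Flip: two's complement of the whole pattern if negative
    if negative:
        width = len(bits)
        bits = format((1 << width) - int(bits, 2), "0{}b".format(width))

    # Swim: add up the place values either side of the point
    whole, frac = bits[:point], bits[point:]
    value = int(whole, 2) + (int(frac, 2) / 2 ** len(frac) if frac else 0)
    return -value if negative else value


print(noorgat_decode("0100000000", "000001"))
print(noorgat_decode("1101000000", "111101"))
```

The first call moves the point one place to the right and gives 1. The second pads with ones while moving three places to the left and gives -0.046875.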

Let's try it out. We are given the following 16 bit floating point number, with 10 bits for the mantissa and 6 bits for the exponent. Remember the decimal point is between the first and second most significant bits: $$0100\;0000\;0000\;0001$$ $$0.100\;0000\;00\;\mid\;00\;0001$$

The first action we need to perform is the sign: find out the sign of the mantissa. $$\underline{0}.100\;0000\;00\;\mid\;00\;0001$$ It is 0, so the mantissa is positive.

The second step in the Noorgat dance is the slide: we need to find the value of the exponent, that is, the last 6 bits of the number. $$0.100\;0000\;00\;\mid\;\underline{00\;0001}$$

$$\begin{matrix} 32 & 16 & 8 & 4 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{matrix} = 1$$ So we know that the exponent is positive one and we will have to move the decimal point one place to the right.

The third step in the Noorgat dance is the bounce, that is, moving the decimal point of the mantissa the number of positions specified by the slide, which was one position to the right. Like so: $$0.\vec 1 00\;0000\;00\;\mid\;00\;0001$$ $$01.00\;0000\;00\;\mid\;00\;0001$$

The fourth step is the optional flip. Check back to the sign stage and see if the mantissa is negative. It isn't? Oh well, you can skip past this stage then, as we only flip the number if the mantissa is negative.

The fifth and final step is the swim. Taking the mantissa on its own we can now work out the value of the floating point number. Start at the centre and label each number to the left $$1, 2, 4, 8, 16$$ and so on. Then label each number on the right $$\frac{1}{2},\frac{1}{4},\frac{1}{8},\frac{1}{16}$$ and so on. $$\begin{matrix} 4 & 2 & 1 &. & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 0 & 1 &. & 0 & 0 & 0 \end{matrix} = 1 = 0.100\;0000\;00\;\mid\;00\;0001$$ Voila! The answer is 1.

Example: 0.001101000 | 000110

1. Sign: the mantissa starts with a zero, therefore it is a positive number.

2. Slide: work out the value of the exponent: 000110 = +6

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 6 places to the right: 0.001101000 -> 0001101.000

4. Flip: as the number isn't negative we don't need to do this.

5. Swim: work out the value on the left hand side and right hand side of the decimal point: 1+4+8 = +13 FINISHED!

Example: 0.101000000 | 111111

1. Sign: the mantissa starts with a zero, therefore it is a positive number.

2. Slide: work out the value of the exponent: 111111. It starts with a one, therefore it is a negative number. Flipping it gives 000001, so the exponent is -1

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was negative so we need to move the decimal point 1 place to the left: 0.101000000 -> 0.0101000000

4. Flip: as the mantissa isn't negative we don't need to do this.

5. Swim: work out the value on the left hand side and right hand side of the decimal point: 1/4 + 1/16 = +0.3125 FINISHED!

Example: 1.011111010 | 000101

1. Sign: the mantissa starts with a one, therefore it is a negative number.

2. Slide: work out the value of the exponent: 000101 = +5

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 5 places to the right: 1.011111010 -> 101111.1010

4. Flip: the mantissa is negative, as noted in step one, so we need to convert this number: 101111.1010 -> 010000.0110

5. Swim: work out the value on the left hand side and right hand side of the decimal point: 16+1/4+1/8 = 16.375. Remember the number was negative, so the answer is -16.375 FINISHED!

Example: 1.101000000 | 111101

1. Sign: the mantissa starts with a one, therefore it is a negative number.

2. Slide: work out the value of the exponent: 111101. It starts with a one, therefore it is a negative number. Flipping it gives 000011, so the exponent is -3

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was negative so we need to move the decimal point 3 places to the left. Watch carefully! 1.101000000 -> 1.111101000000 Note that we placed extra ones on the front of the number. If the exponent is negative and the mantissa is positive, we add extra zeros on the front: 0.01 * 2^-3 = 0.00001. If both are negative, placing zeros in front of the mantissa would make it positive! Therefore we need to add extra ones to keep the mantissa negative. With the flip we'll lose these 'extra' ones.

4. Flip: the mantissa is negative, as noted in step one, so we need to convert this number: 1.111101000000 -> 0.000011000000

5. Swim: work out the value on the left hand side and right hand side of the decimal point: 1/32+1/64 = 0.046875. Remember the number was negative, so the answer is -0.046875 FINISHED!
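Padding with ones when a negative mantissa moves left is the same thing as an arithmetic right shift. The Python sketch below compares it with padding by zeros for the mantissa 1.101000000. It relies on the fact that Python's `>>` on a negative integer keeps the sign; the variable names are illustrative.

```python
MANTISSA_BITS = 10

raw = 0b1101000000                     # the pattern 1.101000000
signed = raw - (1 << MANTISSA_BITS)    # read as two's complement: -192

# Divide by 2^9 because the binary point sits after the first bit
print(signed / 2 ** 9)                 # value of the mantissa itself

padded_with_ones = signed >> 3         # shift keeps the sign (pads with ones)
padded_with_zeros = raw >> 3           # shift on the raw pattern pads with zeros

print(padded_with_ones / 2 ** 9)       # still negative, 8 times smaller
print(padded_with_zeros / 2 ** 9)      # has become positive: the wrong answer
```

Padding with ones gives -0.046875, which is -0.375 × 2^-3 as expected. Padding with zeros gives +0.203125, a positive number that has nothing to do with the original.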

Example: 1.111111010 | 000011

1. Sign: the mantissa starts with a one, therefore it is a negative number.

2. Slide: work out the value of the exponent: 000011 = +3

3. Bounce: we need to move the decimal point in the mantissa. In this case the exponent was positive so we need to move the decimal point 3 places to the right: 1.111111010 -> 1111.111010

4. Flip: the mantissa is negative, as noted in step one, so we need to convert this number: 1111.111010 -> 0000.000110

5. Swim: work out the value on the left hand side and right hand side of the decimal point: 1/16+1/32 = 0.09375. Remember the number was negative, so the answer is -0.09375 FINISHED!

Converting denary into binary floating point
You might also be asked to convert a denary number into its binary floating point equivalent.


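 * 1) work out the binary equivalent
 * 2) work out how far to move the binary point so that the number is normalised (y places)
 * 3) set the exponent so that it moves the binary point back to where it started: +y if you moved the point to the left, -y if you moved it to the right
 * 4) pad the number with extra bits so that the mantissa fills 10 bits and the exponent fills 6 bits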
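The steps above can be sketched in Python. This is only one possible way to do it: it assumes a 10 bit mantissa and 6 bit exponent in two's complement, it drops any bits that do not fit (rather than rounding), and it treats -1 rather than -0.5 as the normalised negative mantissa. The function name `encode16` is invented for illustration.

```python
import math


def encode16(x):
    """Return (mantissa bits, exponent bits) for a non-zero denary number."""
    if x == 0:
        raise ValueError("zero cannot be normalised")

    # frexp does the normalising: x = m * 2**e with 0.5 <= |m| < 1
    m, e = math.frexp(x)

    # Assumption: a normalised negative mantissa starts 1.0, so use -1 not -0.5
    if m == -0.5:
        m, e = -1.0, e - 1

    if not -32 <= e <= 31:
        raise OverflowError("exponent does not fit in 6 bits")

    # Keep 9 bits after the binary point, dropping the rest
    kept = math.floor(m * 2 ** 9)

    # Masking turns negative values into their two's complement patterns
    return format(kept & 0x3FF, "010b"), format(e & 0x3F, "06b")


print(encode16(39.75))
```

For 39.75 this gives the mantissa 0100111110 (read as 0.100111110) and the exponent 000110, that is, 6.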

If we are asked to convert the denary number 39.75 into binary floating point we first need to find out the binary equivalent: $$\begin{matrix} 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & . & 1 & 1 & 0 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised (for a positive number, so that it starts 0.1)? $$00.100111110$$ (6 places to the left)

So to get our binary point back to where it started, we need to move it 6 places to the right. 6 now becomes your exponent. $$0.100111110 \mid 000110$$ If you want to check your answer, convert the number above into decimal. You get 39.75!

Example: 67 $$\begin{matrix} 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & . & 0 & 0 & 0 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised? $$0.1000011000$$ (7 places to the left)

To get the front to be normalised we must move the binary point 7 places. (Moving it 6 places would have made the number negative!) $$0.100001100 \mid 000111$$

Example: 23.25 $$\begin{matrix} 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & . & 0 & 1 & 0 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised? $$000.10111010$$ (5 places to the left)

To get the front to be normalised we must move the binary point 5 places. (Moving it 4 places would have made the number negative!) $$0.101110100 \mid 000101$$

Example: 123.875 $$\begin{matrix} 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 & . & 1 & 1 & 1 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised? $$0.1111011111$$ (7 places to the left)

To get the front to be normalised we must move the binary point 7 places. $$0.1111011111 \mid 000111$$ But this is using 11 bits for the mantissa, so we have to drop one, losing accuracy! $$0.111101111 \mid 000111$$

Example: 128.25 $$\begin{matrix} 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & . & 0 & 1 & 0 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised? $$0.10000000010$$ (8 places to the left)

To get the front to be normalised we must move the binary point 8 places. (Moving it 7 places would have made it negative!) $$0.100000000 \mid 001000$$ Notice that we have had to drop the .25, as this would not have fitted into 10 bits for the mantissa.

Example: -513. First write out 513: $$\begin{matrix} 1024 & 512 & 256 & 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & . & 0 & 0 & 0 \end{matrix}$$

Convert this into its negative form using the flipping rule: $$\begin{matrix} 1024 & 512 & 256 & 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & . & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & . & 0 & 0 & 0 \end{matrix}$$

How far do we need to move the binary point to the left so that the number is normalised? $$1.0111111111000$$ (10 places to the left)

To get the front to be normalised we must move the binary point 10 places. $$1.011111111 \mid 001010$$ Notice that we have had to drop the last one, as this would not have fitted into 10 bits for the mantissa. This means that the number shown is only 10111111110.0. Converting this into denary: flipping it gives 01000000010.0 = 514, so the number stored is -514. You'll look at errors using floating point numbers very soon.
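The size of the error caused by dropping bits can be estimated with a few lines of Python. This is a sketch under the assumption that dropping the low bits of a two's complement mantissa behaves like rounding down (`math.floor`); the helper name `store_and_recover` is made up.

```python
import math

FRACTION_BITS = 9  # a 10 bit mantissa has 9 bits after the binary point


def store_and_recover(x, exponent):
    """Squeeze x into a 10 bit mantissa with the given exponent, then read it back."""
    mantissa = x / 2 ** exponent                       # normalised mantissa as a number
    kept = math.floor(mantissa * 2 ** FRACTION_BITS)   # bits that do not fit are dropped
    return kept / 2 ** FRACTION_BITS * 2 ** exponent


for number, exponent in [(-513, 10), (128.25, 8), (123.875, 7)]:
    stored = store_and_recover(number, exponent)
    print(number, "is stored as", stored, "error", abs(number - stored))
```

The larger the exponent, the more each dropped bit is worth: -513 comes back as -514 (an error of 1), while 123.875 comes back as 123.75 (an error of 0.125).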

For a 16 bit number where the mantissa is 10 bits and the exponent is 6 bits:

 * the largest positive number will be: Mantissa: 0.111111111 Exponent: 011111
 * the smallest positive number will be: Mantissa: 0.000000001 Exponent: 100000
 * the largest negative number (the one furthest from zero) will be: Mantissa: 1.000000000 Exponent: 011111
 * the smallest negative number (the one closest to zero) will be: Mantissa: 1.111111111 Exponent: 100000
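To see what these four patterns are actually worth, the Python sketch below evaluates each one exactly using the standard `fractions` module. It assumes both the mantissa and the exponent are two's complement; the helper name `value` is illustrative.

```python
from fractions import Fraction


def value(mantissa, exponent):
    """Exact value of a mantissa written like '0.111111111' and a 6 bit exponent."""
    m_bits = mantissa.replace(".", "")
    m = int(m_bits, 2)
    if m_bits[0] == "1":          # negative mantissa (two's complement)
        m -= 2 ** len(m_bits)

    e = int(exponent, 2)
    if exponent[0] == "1":        # negative exponent (two's complement)
        e -= 2 ** len(exponent)

    # The binary point sits after the first mantissa bit
    return Fraction(m, 2 ** (len(m_bits) - 1)) * Fraction(2) ** e


print(value("0.111111111", "011111"))   # largest positive
print(value("0.000000001", "100000"))   # smallest positive
print(value("1.000000000", "011111"))   # largest negative
print(value("1.111111111", "100000"))   # smallest negative
```

The largest exponent is +31 and the smallest is -32, so the range runs from -2^31 up to 511 × 2^22 (just under 2^31), and the values closest to zero are ±2^-41.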