A-level Computing/AQA/Paper 2/Fundamentals of data representation/Rounding errors





Know and be able to explain why both fixed point and floating point representation of decimal numbers may be inaccurate.

For a real number to be represented exactly in binary, it must be expressible as a binary fraction that fits within the available number of bits. Some values can never be represented exactly, however many bits are used. For example, 0.1 in decimal (0.1₁₀) is a recurring fraction in binary: 0.000110011...
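As a rough illustration of the point above, the following Python sketch converts a fraction to a fixed number of binary places by repeated doubling, then truncates. The function name `to_binary_fraction` and the choice of 8 bits are assumptions made for this example only.

```python
from fractions import Fraction


def to_binary_fraction(value, bits):
    """Truncate a fraction in the range [0, 1) to `bits` binary places.

    Returns the bit string and the value those bits actually store.
    Assumes bits >= 1 (hypothetical helper for illustration).
    """
    remainder = Fraction(value)
    digits = []
    for _ in range(bits):
        remainder *= 2              # shift the next binary place into the units
        if remainder >= 1:
            digits.append("1")
            remainder -= 1
        else:
            digits.append("0")
    pattern = "".join(digits)
    stored = Fraction(int(pattern, 2), 2 ** bits)
    return pattern, stored


# 0.75 fits exactly in 8 binary places; 0.1 does not.
for exact in (Fraction(3, 4), Fraction(1, 10)):
    pattern, stored = to_binary_fraction(exact, 8)
    print(pattern, float(stored), "exact" if stored == exact else "inexact")
```

With 8 binary places, 0.75 is stored as 11000000 with no error, while 0.1 is stored as 00011001, which is actually 0.09765625. Adding more bits shrinks the error for 0.1 but never removes it.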

Arithmetic in a processor is normally performed on a fixed number of bits, for example adding one 8-bit value to another. This will often cause no problems at all:

  00110011   (51)
+ 00001010   (10)
  --------
  00111101   (61)

But what happens if we add the following numbers together?

  01110011   (115)
+ 01001010   (74)
  --------
  10111101   (189)

This may appear to have gone OK, but we have a problem. If we are dealing with two's complement numbers, the leftmost bit is the sign bit, so the answer from adding two positive numbers together is negative!

  01110011   (115)
+ 01001010   (74)
  --------
  10111101   (-67!)
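The sum above can be checked with a short Python sketch. Python integers are not limited to 8 bits, so the helper names `add_8bit` and `as_signed_8bit` are assumptions introduced here to imitate an 8-bit register.

```python
def add_8bit(a, b):
    """Add two 8-bit patterns, keeping only the low 8 bits of the sum."""
    return (a + b) & 0xFF


def as_signed_8bit(pattern):
    """Interpret an 8-bit pattern as a two's complement value."""
    # If the sign bit (value 128) is set, the pattern stands for a negative number.
    return pattern - 256 if pattern & 0x80 else pattern


result = add_8bit(0b01110011, 0b01001010)   # 115 + 74
print(format(result, "08b"), result, as_signed_8bit(result))
```

The same bit pattern, 10111101, reads as 189 when treated as unsigned and as -67 when treated as two's complement.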

Overflow
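Let's take a look at another example of the same problem, overflow, this time using 4-bit two's complement numbers:

    1010   (-6)
  + 1010   (-6)
    ----
 (1)0100   (+4!)

As you can see in the sum above, we have added two negative numbers together and the result is a positive number. The carry out of the leftmost bit, shown in brackets, does not fit in 4 bits and is lost.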

To deal with the situations mentioned above we use the status register, whose flags describe the result of the most recent calculation.

The most common flags
The examples below use five flags: Z (zero), C (carry), N (negative), O (overflow) and P (parity of the result).

For the sum that we met earlier, we will take a look at how the status register can be used to detect the incorrect answer:

  01110011   (115)
+ 01001010   (74)
  --------
  10111101   (-67)

Status register: Z = False | C = False | N = True | O = True | P = Even

Using these flags you can see that the result is negative. Since the original sum used only positive values, we know we have an error, and the overflow flag confirms it.

Looking at the other sum:

    1010   (-6)
  + 1010   (-6)
    ----
 (1)0100

Status register: Z = False | C = True | N = False | O = True | P = Odd

Using these flags you can see that the result is positive when the original sum used two negative numbers. We can also see that overflow occurred.
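The flag values shown above can be reproduced with a minimal Python sketch. The function name `add_with_flags` is hypothetical. The sketch assumes that overflow is set when both operands share a sign and the result has the other sign, and that P reports whether the number of 1 bits in the result is even or odd, which matches the worked examples. A real processor may define its flags differently.

```python
def add_with_flags(a, b, bits):
    """Add two `bits`-wide patterns; return the stored result and the flags."""
    mask = (1 << bits) - 1          # e.g. 1111 for 4 bits
    sign = 1 << (bits - 1)          # position of the sign bit
    total = (a & mask) + (b & mask)
    result = total & mask           # only the low `bits` bits are stored
    flags = {
        "Z": result == 0,                       # result is zero
        "C": total > mask,                      # carry out of the leftmost bit
        "N": bool(result & sign),               # sign bit of the result is set
        # Overflow: operands have the same sign, result has the opposite sign.
        "O": (a & sign) == (b & sign) and (result & sign) != (a & sign),
        "P": "Even" if bin(result).count("1") % 2 == 0 else "Odd",
    }
    return result, flags


result, flags = add_with_flags(0b01110011, 0b01001010, 8)   # 115 + 74
print(format(result, "08b"), flags)

result, flags = add_with_flags(0b1010, 0b1010, 4)           # -6 + -6
print(format(result, "04b"), flags)
```

Passing the width as a parameter lets the same function cover both the 8-bit and the 4-bit sums used in this section.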

What is the problem with the result of the following 4-bit sum?

    1011   (-5)
  + 1011   (-5)

The result would create overflow, giving an incorrect answer:

    1011   (-5)
  + 1011   (-5)
    ----
 (1)0110   (+6)

Show the Status register for the following sum:

    1001   (-7)
  + 1001   (-7)
    ----
 (1)0010   (+2)

Status register: Z = False | C = True | N = False | O = True | P = Odd