A-level Computing/CIE/Advanced Theory/Data representation

User-defined data types File organisation and access Real numbers and normalised floating-point representation
 * show understanding of why user-defined types are necessary
 * define and use non-composite types: enumerated, pointer
 * define and use composite data types: set, record and class/object
 * choose and design an appropriate user-defined data type for a given problem
 * show understanding of methods of file organisation: serial, sequential (using a key field) and random (using a record key)
 * show understanding of methods of file access:
 * sequential access for serial and sequential files
 * direct access for sequential and random files.
 * select an appropriate method of file organisation and file access for a given problem
 * describe the format of binary floating-point real numbers
 * convert binary floating-point real numbers into denary and vice versa
 * normalise floating-point numbers
 * show understanding of the reasons for normalisation
 * show understanding of the effects of changing the allocation of bits to mantissa and exponent in a floating-point representation
 * show understanding of how underflow and overflow can occur
 * show understanding of why computers cannot represent mathematical real numbers such as √2 or π, only approximations
 * show understanding of the consequences of a binary representation only being an approximation to the real number it represents
 * show understanding that binary representations can give rise to rounding errors

User-defined Data types
A user-defined data type is a data type which the programmer has designed for use within a program, as opposed to a built-in data type. A non-composite type is defined without reference to another data type, whereas a composite data type is built from other data types.

Enumerated Data type
An enumerated data type is a non-composite data type defined from an ordered list of values. Variables can be declared by this data type, and assigned one of the values in the list

e.g. The following pseudocode declares an enumerated data type for months,

TYPE TMonth = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec) // The TMonth data type is declared

DECLARE ThisMonth : TMonth // A variable is declared with the TMonth enumerated data type.

ThisMonth ← May // The variable ThisMonth

ThisMonth > Feb // This is True, as May is later in the list than Feb

Note: Values in an enumerated data type are not string values; do not enclose them with quotation marks.

Pointer Data type
A pointer data type references memory locations. Thus, it has to relate to the type of data that it is pointing to.

e.g. This pseudocode declares a pointer data type, a pointer, and uses the pointer.

Pointers are typically used to construct complex data structures, such as linked lists and binary trees. These data structures are discussed later on.

Record Data type
A record data type stores a collection of information regarding a common subject, similar to a record in a database. It is constructed from several fields, which each have their own data types; thus, the record data type is a composite data type.

e.g. The following pseudocode defines a record type for a student record:

TYPE TStudentRecord                   // The record consists of several fields DECLARE FirstName : STRING     // Each field has its own data type DECLARE LastName : STRING DECLARE Absences : INTEGER DECLARE Class : STRING ENDTYPE

DECLARE Student1 : TStudentRecord // The variable is declared as a record

Student1.FirstName ← "John"      // Fields can be accessed using dot notation Student1.LastName ← "Doe" Student1.Absences ← 0 Student1.Class ← "A2 Level"

Set Data type
A set data type is a composite data type that allows a programmer to apply set theory operations to data in a program.

These operations typically include:


 * Union
 * Difference
 * Intersection
 * Including an element
 * Excluding an element
 * Checking whether an element is in a set

Object Data type
An object data type is a composite data type used in object-oriented programming to define classes.

Essentially, objects are just records with functions that act on the data that they contain.

A data object is a region of storage that contains a value or group of values. Each value can be accessed using its identifier or a more complex expression that refers to the object. In addition, each object has a unique data type.

File Organisation
A file is a collection of records. Each record is a collection of fields. Every field consists of a value.

Serial Files
A serial file is a collection of records with no defined order. Records enter the file in chronological order. All records have a defined format so that they can be input and output correctly.

A text file can be considered an example of a serial file: a series of characters are input, in chronological order, to produce a file.

A common use of serial files is for real-time processing. Records can be entered in real time, as quickly as possible, because they do not need to be sorted. This makes serial files efficient.

Sequential Files
A sequential file stores records in order of a key field. In order for it to be possible to sort records by key field, this field needs to be unique and sequential but does not need to be consecutive.

In a sequential file, a particular record can be found by reading all of the key fields until you reach the one you are looking for.

Direct-Access Files
A direct-access file is a collection of records that can be directly accessed without having to check every record. This is acheived using a hash table.

A hash table is a table of data which is ordered not by the key field, but by the hash value of the key field. Thus, data can be directly accessed by hashing the key field, rather than having to look through each record one by one.

Accessing Files
There are two ways to access a specific record within a file: sequential access and direct access. Serial and sequential files can be accessed using sequential access and direct-access files can be accessed using direct access.

Sequential access is where each record in the file is read, one by one, until the desired record is found.

Direct access is where a hashing algorithm is used to jump to a specific record in the file.

Deleting & Editing Data
In a sequentially accessed file, deleting and editing data requires the creation of a new file. Data is moved from the old file to the new file until the part where the record needs editing is reached.

However, in a direct-access file, data can be deleted or edited in place: there is no need for a new file.

Floating-point Numbers
Floating-point notation is a way of representing very small or very large numbers with the same amount of bits. It is similar to scientific notation.

A floating-point number consists of three parts: the sign bit, the mantissa, and the exponent. To find the value of the number, we use $$\pm M\times2^E$$ where ± is determined by the sign bit, M is the mantissa, and E is the exponent.



The Mantissa-Exponent Tradeoff
When making a floating-point format, the designer must choose how many bits to allocate to the mantissa and to the exponent. If more bits are allocated to the mantissa, the floating-point value is more precise; whereas if more bits were allocated to the exponent, the floating-point system could represent a greater range of values.

Normalisation
Normalisation is the process of choosing the floating-point representation of a number such that every number that can be represented in the floating-point system has one and only one valid representation.

Without normalisation, there could be multiple valid representations for the same number. For example, the number 2.0 can be represented as,  , or. Thus, we need a standard way of referring to a given number.

This is where normalisation comes in: normalisation means that the only correct form of the number is where the sign bit and mantissa most significant bit are different. So for 2.0,  would be the valid representation.

Underflow
Underflow is where the number is too small to be represented using the floating-point system.

e.g. In a system with 8 bits for the mantissa and 4 bits for the exponent, the lowest possible exponent is, or -8 in denary. If the system is normalised, the smallest positive mantissa value is. Thus, the smallest positive number in this system is, which is equal to 1/512. If a calculation in this system resulted in a number which was lower than 1/512, there would be an underflow error, because the number is too small to be stored.

Overflow
Overflow is similar to underflow, but it occurs when a number is too large to be stored in the system.

e.g. In a system with an 8-bit mantissa and a 4-bit exponent, the largest possible number that can be represented is, which is equal to 127. If a calculation produced a number higher than 127, there would be an overflow error and the number could not be stored.

Overflow and underflow can both occur with negative values that are too large or too small.

Rounding Errors
A rounding error is where a number cannot be represented exactly, and needs to be approximated.

e.g. The number 1/3 can only be represented in binary using recurring bits (0.01 01 ). The floating-point format does not allow for recurring bits as there is only a finite amount of memory in the system. Thus, it needs to be rounded, so it will be represented as.