Programming Language Concepts Using C and C++/Data Level Structure

In this chapter, we will start with defining properties common to all data items usable in a programming context and then move on to classifying data according to their structure and type. While doing so, we will also try to give an idea of how they can be laid out in memory.

Immutable Data
A constant is a data item that remains unchanged throughout its lifetime. A constant may be used literally or may be named. Named constants are sometimes termed symbolic constants or figurative constants.

Some programming languages make a distinction between constants whose values are determined at compile time and those whose values are determined at run time. In C#, for example, the former are tagged with the keyword const and the latter with readonly. In Java, the constancy of a field (or local identifier) is flagged with the final keyword, and whether its initializer is a compile-time constant expression classifies the associated data item as a compile-time or run-time constant.

Here the values for PI and E (the Euler constant) can be determined even before the program starts to run. So they are defined to be const.

Note that the same definitions are valid for all instances, if there are any, of the class. That is, the value of PI or E does not change from one instance to another. As a matter of fact, they exist independently of the instances as class fields. In other words, they are [implicitly] static. Explicit use of static together with const in C++, which is a source of inspiration for C#, is a manifestation of this.

Here, the value of such a field cannot be determined before the related object is created. Once the object is created, the field never changes during the lifetime of the object. So we conclude that the only possible place for its initialization is the constructor. This value can be either supplied as an argument to the constructor or computed using other attributes of the object and/or class.

Observe that the values of such a field may change from one instance to another. In our example this is the expected behavior: any two objects must have different values for this identifying field.

Mutable Data
A variable provides us with named memory storage that we can write to, retrieve, and manipulate throughout the course of our program.

There are two values associated with a variable:


 * 1) Its data value, stored at some memory address. This is sometimes referred to as an object’s rvalue (pronounced "are-value"). You might think of it as meaning read (or right) value. Both a constant and a variable can serve as an rvalue.
 * 2) Its address value–that is, its address in memory within which its data value is stored. This is sometimes referred to as an object’s lvalue (pronounced "ell-value"). You might think of it as meaning location (or left) value. Note that a constant cannot serve as an lvalue. (Because it cannot appear on the left-hand side of an assignment.)

Values may be provided for variables in one of three ways:


 * 1) By initialization at the time of creation. This can be user-provided or a default value assigned by the language. An example of the latter is the Java programming language, which assigns default values to uninitialized fields.
 * 2) By an input operation.
 * 3) By an assignment statement.

The act of relating an identifier–be it a variable, constant, or function–with its lvalue is called binding and can be established in two different ways. In cases where the relation between the identifier and the lvalue is one-to-one, binding takes place before run time and is therefore referred to as static binding. An example of this type of binding is a global variable, of which there is only one instance per identifier. In cases where the relation is one-to-many, binding takes place at run time and is called dynamic binding. An example is a local variable, whose instances, zero or more, are allocated anew each time a function is called.

Visibility
We can also speak of the scope of an identifier. Scope can be defined as the portion of the program over which the declaration of an identifier is effective. An identifier with an effective declaration is said to be in scope, while it would otherwise be referred to as out of scope.

The scope rules of a language determine the treatment of references to non-local identifiers. A common rule used by most programming languages, called the lexical- or static-scope rule, determines the declaration that applies to an identifier by examining the program text alone. An alternative rule, called the dynamic-scope rule, determines the declaration applicable to an identifier at run time, by considering the current subprogram activations.

An identifier being in scope does not mean that it is visible. A declaration of an identifier is visible in some context if a use of the identifier in that context will be bound to the declaration. That is, the identifier will be associated with that declaration. A declaration might be visible throughout its scope, but it may also be hidden by other declarations whose scope and visibility overlap that of the first declaration.

In the above code fragment, scopes of the variables (identifiers) are:


 * the variable declared in the outer block: from its declaration point to the end of the outer block
 * the variable declared in the inner block: from its declaration point to the end of the inner block

Visibilities of the variables are:


 * the variable declared in the inner block: same as its scope
 * the variable declared in the outer block: its scope minus the inner block, where it is hidden by the inner declaration

Accessibility
Accessibility, a notion found in modular and object-oriented languages, allows constraints to be applied on access to data. This is done for two purposes:


 * 1) To preserve the consistency of the data and
 * 2) To give the implementer the freedom of changing the implementation details.

Declaring the implementation details in the above class definition as private marks them off-limits to the user, which means statements that manipulate them directly from outside the class are not permitted.

Issuing the first such statement would probably cause a nonempty stack object to look empty. Similarly, the second would probably populate the stack object by adding a new element at some location other than that indicated by the top-of-stack marker, which is definitely against the definition of a stack.

Additionally, such a definition enables the implementer of the stack class to change the implementation details. For instance, she may choose to use, say, a linked structure as the underlying representation. Now that this is not known to the outside world, the users of the class will not be affected by the decision.

Data Elements
The most basic data entity is the data element. These data entities may be grouped together to form structures.

Primitive Data Elements
Primitive data elements are those that can be directly operated on by machine-language instructions; they are broadly divided into numeric, character, and logical (Boolean) types. Numeric data elements can be further divided into integers and reals.

Data Elements that Are Addresses
An address is a value that indicates a location in the process image created as a result of running the program. Depending on the program segment it points to, an address falls into one of two groups.


 * 1) A label is actually the address of a program statement, an address in the code segment of a program. It may also be thought of as a data element that may be operated upon by a goto operation.
 * 2) Pointers reference or point to other elements in the code and data segments. In the former case, a pointer indicating the start of a subprogram can be used to invoke the subprogram dynamically, which enables implementation of polymorphic subprograms by means of callbacks. Pointers into the data segment often refer to unnamed data elements. They are heavily used in constructing dynamic data structures and recursive algorithms. Handles, which may be considered as intelligent pointers, serve the same purpose.

Compound Data Elements
Strings of characters, generally referred to as simply strings, are linear sequences of characters. They are sometimes considered as primitive data elements, because they are directly operated on by machine language instructions, and sometimes classified as data structures.

Data Structures
Data structures are organized collections of data elements that are subject to certain allowable operations. Data structures are logical entities in the sense that they are created by programmers and are operated on by high-level programs. They may bear little resemblance to the physical entities, that is, the storage structures operated on by machine-language code.

Data structures may be classified as:


 * Linear vs. nonlinear: A linear, as opposed to nonlinear, data structure is one in which the individual components form an ordered sequence. Examples of linear data structures are strings, arrays, and lists. Examples of nonlinear data structures include trees, graphs, and sets.
 * Static vs. dynamic: A static structure is one that has no capacity for change, specifically with regard to its size, during the course of execution. Arrays and records are examples of static data structures. An example of a dynamic data structure is the list.

Storage (Memory) Structures
Storage structures are data structures after they have been mapped to memory. While the data structure is the logical organization of your data, the storage structure represents the way in which your data is physically stored in memory during the execution of your program.

Storage can be allocated in two ways:


 * Sequential: A structure allocated this way may also be called a static structure since it is incapable of change throughout its lifetime. Such structures use implicit ordering: components are ordered by virtue of their sequential ordering in the structure (indexing).
 * Linked: A structure allocated this way may also be called a dynamic structure since the data structure can grow and shrink during its lifetime. They use explicit ordering: each component contains within itself the address of the next item so that it is in effect "pointing" to its own successor.

As for the pros and cons of each method:


 * In static structures, full storage remains allocated during the entire lifetime of the structure. Additionally, in order to avoid overflow, maximum theoretical size is used in the declaration of the structure. For these reasons, static structures are not memory efficient. In dynamic structures, there is a space overhead due to the pointer field(s).
 * Due to the possibility of shifting, which can be avoided at the expense of some extra memory, insertions into and deletions from any position other than the end of a static structure are expensive. Such is not the case for dynamic structures.
 * Using the index value direct access is possible in static structures, which means accessing any item in the structure takes constant time. This, however, is not valid for dynamic structures. This weakness, however, can be alleviated by imposing a hierarchical structure on the data.

File Structures
File structures refer to data residing in secondary storage. When program execution terminates, file structures are the only structures to survive the termination of the program.

The data hierarchy refers to the logical organization of data that is probably stored on secondary or external storage media such as magnetic tape.

A file is a collection of related records, related to a particular application. A record is a collection of related data items, or fields, related to a single object of processing. A field is a data item, a piece of information (either a data element or a structured data item) that is contained within the record.

A file can be accessed for input, output, input/output, and append. It can be processed in two different modes: batch mode and query mode. In batch mode, the component records of the file are operated on in sequential order. Report generation for the test results of a class is an example of this type of processing. In query mode, individual records are manipulated by accessing them directly. Retrieving the record of one single student falls into this category.

Another important issue is the organization of the file on secondary storage. This (physical) ordering of records can be done in three ways:


 * Sequential: The file is seen as a linear sequence of records. Such files cannot be accessed in input/output access mode. Text files are typical examples of sequential files.
 * Relative: Each record in the file can be accessed directly by location. Naturally, such organization becomes possible only when the file is stored on a direct-access storage device (DASD). The mapping between the key field in the record and the location on disk can be done in two ways: direct-mapping and hashing.
 * Indexed sequential: This is a compromise between the first two methods. Files are stored sequentially on a DASD but there is also an index file that allows optimum direct access by way of a search on index.

Each file organization technique must be evaluated on the basis of:


 * Access time: The time it takes to find a particular data item.
 * Insertion time: The time it takes to insert a new data item. This includes the time it takes to find the correct place to insert the new data item as well as the time it takes to update the index structure.
 * Deletion time: The time it takes to delete a data item. This includes the time it takes to find the item to be deleted as well as the time it takes to update the index structure.
 * Space overhead: The additional space occupied by an index structure.

These in turn are affected by three factors. The system must first move the head to the appropriate track or cylinder. This head movement is called a seek and the time to complete is seek time. Once the head is at the right track, it must wait until the desired block rotates under the read-write head. This delay is called latency time. Finally, the actual transfer of data between the disk and main memory can take place. This last part is transfer time.

Typical operations on files are: open, close, read, write, EOF, and maintenance operations such as sorting, merging, updating, and backup.

Data Types
A data type, usually referred to as simply type, is composed of a domain of data elements and a set of operations that act on those elements–operations that can construct, destroy, or modify instances of those data elements.

In addition to built-in types, which can be structured as well as primitive data types, most languages have facilities for the definition of new data types by the user. [ALGOL68 and Pascal were the first programming languages to provide this.]

A type system is a facility for defining new types and for declaring variables to be of such types. A type system may also have the capability for type checking, which may be static or dynamic depending on whether this checking is done at compile time or during execution.

A strongly typed programming language is one in which the types of all variables are determined at compile time. Programs written in such a language, which is said to have static typing, must explicitly declare all programmer-defined words. Storage requirements for global and local variables are determined completely during compile time. A strongly typed programming language may include a typing facility for defining new types.

With dynamic typing, variables are not bound to type at compile time. Languages that allow for dynamic typing of variables, which are also classified as weakly typed, may utilize dynamic type checking, which requires the presence of run-time routines to check types and perform correct operations such as those involving storage mapping and code selection. One may also say that types are associated with values, not variables.

Equivalence
In an assignment statement, like many other programming constructs, a compiler not only checks the syntactic correctness but also tests the semantic validity. For instance, it would not be permitted to assign a string value to an integer variable; the compiler would enforce some sort of assignment-compatibility rule. An important ingredient of this process is finding out whether types of the expression and the variable are equivalent.

This can be done in two different ways: structural equivalence and name equivalence. In the former, two types are equivalent if they are made up of the same underlying data types.

In name equivalence, two types are equal only if their names are the same. This is the approach taken by most Algol-based languages.

Extension
A type is said to extend another if all of its instances can be seen as instances of the type it extends. The base type, the one being extended, is called a generalization of the extended type, while extended type is said to be a specialization of its base type.

Thanks to the nature of the relation, an expression of extended type is assignment-compatible with a variable of base-type.

The most popular technique used to provide this relation is inheritance common to most object-oriented programming languages.

Implementation
A type is said to implement another if it provides an implementation for all operations listed in the interface of the implemented type.

Implementation can be seen as a special case of extension, where the base type does not provide any implementation of its operations.

In many object-oriented programming languages, such a relation is found between an interface and its implementing class.

Declarations
In languages with static type checking, the program must somehow communicate the types of the identifiers it uses. Complete knowledge of identifier types at compile time leads to a more efficient program because of the following reasons.

 * More efficient allocation of storage. For instance, all integer types can be stored similarly using the largest such type; but if you know the exact type, you don’t have to allocate the largest size.
 * More efficient routines at run time. A + B is handled differently depending upon whether A and B are integers or real numbers.
 * Compile-time checks. Many invalid uses of the programming constructs are spotted before the program even starts to run.

On the other hand, in the course of ensuring type safety, compilers may at times reject valid programs.

Identifier types can be communicated in two ways.

Explicit Type Declarations
In this approach, widely preferred over the alternative, the programmer uses declarative statements to communicate the types of variables, functions, and so on. Some programming languages, such as ML and Haskell, do not require the programmer to provide type declarations for all identifiers: through a process called type inference, the compiler does its best to figure out the types of expressions.

Implicit Type Declarations
In (some versions of) languages like FORTRAN and BASIC, the way a variable is named reveals its type.

Scalar Data Types
A scalar data type has a domain composed only of individual primitive data elements.

Numeric Types
Numeric types are related to or represent quantities in the outside world. However, whereas in real life these types may have infinite domains, in the world of computers their domains are finite.

Logical (Boolean) Type
Variables of such a type can take on only two values, true or false, which may be represented in the machine as 0 and 1, or as zero and non-zero. Typical operations on Booleans are negation (not), conjunction (and), and disjunction (or).

Pointer Type
A pointer is a reference to an object or data element. A pointer variable is an identifier whose value is a reference to an object.

Pointers are important for the dynamic allocation of a previously undetermined amount of data. Additionally, they permit the same data to reside on different structures simultaneously. In other words, they make sharing of data possible.

Pointer variables point to and provide the means for accessing unnamed, or anonymous variables. Consequently, operations on pointers must distinguish between operations on the pointer variable itself and operations on the quantity to which the pointer is pointing.

A pointer in such an example is used to access a location holding a value of a particular type, say int. As a matter of fact, it can hold references to ints only. This is the most important difference between a pointer and a plain address: while an address can be used to refer to values of any type, a pointer holds a reference to a specific data type. However, the type agnosticism of addresses, a precious tool in implementing generic collections in C, can be reclaimed by disciplined use of void *.

In C, where pointers are heavily used, their advantages may turn into maintenance nightmares. The following program is an example of this.

When we compile and run this program, it will produce the following output.

p2ci: 65   p2i: 65
p2ci: 66   p2i: 66

This is a violation of the contract we have made. With the declaration of p2ci, we guarantee that the content of the location it points to will not change. We then assign p2ci to p2i, which is another pointer that lets itself be updated. We later go on to change the value found at that location through the non-constant pointer, which means the value pointed to by p2ci is also changed.

The compiler may issue a warning for this error, which is the case in the GNU C compiler. But you cannot rely on this: not all compilers will issue a warning. Sometimes the programmer will turn off the warnings to avoid reading annoying messages, and the problem will go unnoticed.

Structured Data Types
A structured data type has domain elements that are themselves composed of other scalar or structured type elements.

Strings
Strings are ordered sets of data elements of dynamically changing size.

Two attributes can be associated with a string: type and length. Type refers to the domain of the individual elements while length refers to the number of elements. (Note that length and size are two different things.)

The type attribute is generally character although bit strings are also commonly used for implementing sets.

Another possible representation of the same string in a different language would be:



Note that the number of bytes reserved for the length prefix can change from implementation to implementation. Another point to keep in mind: ASCII is not without competition. Alternatives include Unicode and the ISO/IEC 8859 series of ASCII-based encodings. In the case of a two-byte encoding of Unicode, each character takes up two bytes in memory.

As hinted by the above figure, BSTR is used for exchanging character string data between components possibly written in different languages. As a matter of fact, the length prefix is inherited from Visual Basic while the terminating null character is taken from C.

Note that the length prefix holds the number of bytes the character string proper occupies and that the BSTR identifier is actually a pointer to the first character.

Typical operations on strings are:


 * Concatenation creates a string from smaller strings.
 * Substring creates a string from a subsequence of another string.
 * Index tests for containment of a smaller string in a larger string. It returns the index value where the subsequence containing the smaller string starts.
 * Length returns the number of components in the string.
 * Insert, delete, replace, ...

Arrays
An array may be defined as a fixed-size, ordered set of homogeneous data. It is stored in contiguous memory locations and this makes direct access possible. Its fixed-size makes an array a static structure.

In some programming languages such as Standard Pascal, size must be known at compile time while in many this size can also be given at run time. But, even in the latter case, size of the array does not change during its lifetime.

In the above Oberon fragment, any actual (one-dimensional) array parameter with the matching element type is compatible with the open-array formal parameter. The subprogram can be called with a one-dimensional array of any size. But, once the array is passed as an argument, its size cannot change.

Among the important attributes of an array are component type, dimensionality of the array, and size in each dimension.

Arrays can be represented in the memory in two different ways:


 * 1) Row-major (Almost all major programming languages)
 * 2) Column-major (FORTRAN)

In order to minimize the access time to individual array components, we should make sure that fastest changing indexes in our programs (that is, the innermost loop variables) correspond to the fastest changing indexes in the memory layout. If we ignore the loop ordering, with large multidimensional arrays, the virtual memory performance may suffer badly.

This program fragment shows the correct loop order for a language that represents arrays using row-major representation. Note that the innermost loop variable (the fastest changing one) in the fragment corresponds to the fastest changing index in the memory layout. This correspondence must be extended to the other loop variables and indexes; that is, the second innermost loop variable must correspond to the second fastest changing index in memory, and so on.

For a language with column-major representation, the correct loop order is the reverse: the leftmost index, being the fastest changing one in memory, should be varied in the innermost loop.



It should be noted that the header information included in the array representation is not standard and may vary or even disappear in some programming languages. The most notable example of this is Java, in which the size of the array is not used in type checking. This means one can use the same array handle to manipulate arrays of different sizes.

This may at first seem to contradict the static nature of arrays. After all, the size of the array seems to have changed from 10 to 20. Not really! What we have done in the previous code fragment is to make the handle indicate two different array objects in the heap. In other words, we have changed the array handle, not the array object itself.

Multidimensional Arrays
Support for multidimensional arrays can be provided in two ways: jagged arrays–sometimes referred to as ragged arrays–and rectangular arrays.

What we have here is actually an array of arrays. Since different arrays can have differing component counts, the sub-arrays in our example can and do have different lengths. The same array can be formed by allocating the containing array and each sub-array separately, which reflects the way arrays are treated in Java.





Observe that there are four array allocations involved: one for the containing array and one for each of its three sub-arrays. These allocations need not be done in a specific order and can be interspersed in the code. The only restriction is that a sub-array cannot be allocated before the containing array. That is, the containing array must be allocated before the others. This leads us to the conclusion, which is also reflected in the layout given before, that Java-style multidimensional arrays are not necessarily allocated in contiguous memory locations.

Associative Arrays
Array index is usually of a scalar type although languages such as Perl and Tcl provide for non-scalar indexes through the use of hashing. Such arrays are called associative arrays.

In languages that lack operator name overloading, such as Java, one has to resort to using the relevant class with its messages.

Lists
Lists are generalized, dynamic, linear data structures. A (linked) list is a set of items organized sequentially, just like an array. In an array, the sequential organization is provided implicitly (by the position in the array); in a list, we use an explicit arrangement in which each item is part of a node that also contains a link to the next node.

Depending on how links are provided, we have:


 * Singly linked lists: These provide us only with a forward pointer pointing to the next node in the list.
 * Doubly linked lists: These provide us with two pointers, one pointing to the next and one pointing to the previous node in the list.

Another classification is possible with how the first and last nodes in the list are related:


 * Circular lists: In circular lists, the first and last nodes in the list are connected with link(s). In a circular doubly linked list, the next link of the last node points to the first node and previous link of the first node points to the last node of the list. In the case of a circular singly linked list, there is no pointer from the first node to the last one.
 * Linear lists: In linear lists, there are no links between the first and last nodes. The next field of the last node and the previous field of the first node point nowhere, i.e. they are null.

In the implementation of a list, one can make use of a header and/or a dummy end node.

Operations on lists are:


 * Creation/destruction operations.
 * Insert: Before/after a node with a certain key value; into the end, at the beginning.
 * Remove: A component with a certain key value; the first, last component.
 * Search: A component with a certain key value.
 * Empty: Test whether or not the list is empty.

Some commonly used, specialized (restricted) forms of lists are stack (LIFO), queue (FIFO), deque (Double-ended queue), output-restricted deque, and input-restricted deque.

Multilists
Multilists are similar to lists. The only difference is that nodes reside on more than one list simultaneously.



Dynamic Arrays
A cross between arrays and lists, dynamic arrays can grow or shrink at run time. Known to the Java world as ArrayList and to the C# world as List, a dynamic array maintains an internal, heap-allocated array and replaces it with a larger (or smaller) one as the need arises.

Records
A record, as a logical data structure, is a fixed-size, ordered collection of possibly heterogeneous components that may themselves be structured. It may also be called a hierarchical or structured type. Record components, often called fields, are accessed by name rather than by subscript.

Variant Records
Some languages allow us to have variations of a record. Such a record with variations is called a variant record. This means that there will be some fields that are common to all variations and some fields that are unique to each variation.

Note that, with the introduction of the type extension facility, variant record has now become an obsolete feature. With the exception of languages like C++, which keep it for C compatibility, object-oriented programming languages do not include variant records in their arsenal of tools.

Variable-length Records
Variable-length records are similar to variant records. In this case the variation involves the number of a repeated field.

Some programming languages provide tuples, which are actually records whose fields are implicitly named by their position.

Before we see examples of how records are laid out in the memory, let’s look at a rather important issue affecting it: alignment requirements.

Alignment Requirements
For reasons of efficiency, some architectures do not let a data element start at an address that is not a multiple of the data element’s size. As a result of this alignment, it takes fewer memory accesses to fetch the data required in the program. This improvement on the running speed, however, comes at the expense of more memory.

It should be noted that the alignment scheme offered in the following examples is not the only one. Programming environments may in some way let you alter the way data is aligned. As a matter of fact, programming environments built on top of the Intel architecture may even let you turn on and off alignment.

The alignment requirement for a record type will be at least as stringent as for the component having the most stringent requirements. The record must terminate on the same alignment boundary on which it started.

Realize that the alignment requirement of an array is equal to the alignment requirement of its component. Whether the array size is 1 byte or a million bytes, its alignment requirement is always the same. This is due to the fact that an array declaration is shorthand for defining multiple variables of the component type.



So, the alignment requirement of such a structure definition is equal to that of its most stringently aligned component; for a structure containing a double, for example, this is 8.

If we remove the selector of the variant part, we have


[Figure: Variant Record Representation 2]

Depending on the architecture, this saves us 4 to 8 bytes for each record. But it leaves us without type safety.

Sets
A set is a nonlinear structure containing an unordered collection of distinct values. Typical operations on sets are insertion of an individual component, deletion of an individual component, union, intersection, difference, and membership.

Base types in sets are restricted to scalar types and, due to the storage requirements, the number of potential members is severely limited.

Trees
A tree is a nonempty collection of nodes and edges that satisfies certain requirements. A node is a simple object containing link(s) to other nodes and some value; a link to another node is called an edge.

In a tree,


 * Predecessor of a node is called its parent.
 * Successor(s) of a node is (are) called its child(ren).
 * Nodes without successors are called leaves.
 * The node without a predecessor is called the root.
 * Every node is reachable from the root.
 * There is exactly one path between the root and any given node.

Trees are encountered frequently in everyday life. For example, many people keep track of their ancestors and/or descendants with a family tree. As a matter of fact, much of the terminology derives from this usage. Another example is found in the organization of sports tournaments.

Graphs
Similar to a tree, a graph is a collection of nodes and edges. Unlike trees, graphs do not have the notion of a root, a leaf, a parent, or a child. A node, also called vertex, can live in isolation; that is, it can be unreachable. For some vertex pairs, there can be more than one path connecting them to each other.

A great many problems are naturally formulated using graphs. For example, given an airline route map of Europe, we might be interested in questions like: "What's the fastest way to get from Izmir to St. Petersburg?" It's very likely that many city pairs have more than one path connecting them. Another example is found in finite-state machines.

A graph can be represented in two ways:


 * 1. Adjacency-matrix representation
 * A V-by-V array of boolean values, where V is the number of vertices, is maintained, with a[x, y] set to true if there is an edge from vertex x to vertex y and false otherwise.



[Example: Adjacency matrix representation in Pascal.]


 * 2. Adjacency-structure representation
 * In this representation all the vertices connected to each vertex are listed on an adjacency list for that vertex.



[Example: Adjacency structure representation in Pascal.]

While the adjacency matrix is the better choice for dense graphs, the adjacency-structure representation turns out to be the more feasible solution for sparse graphs.

User-defined Types
Besides enhancing the readability and clarity of the program text, user-defined types make it possible to compose a complicated data structure once and then create as many instances (variables of that type) of it as necessary. They also let the programmer use the language’s own type-checking facility for input data validation, such as range or consistency checking.

User-defined types come in two flavors:


 * 1. Enumeration types
 * An enumeration type provides for the enumeration of the domain of the type by the programmer. The domain values are listed in a declarative statement.



[Example: Enumeration type in C++.]
 * In C++, instead of
 * we can have


 * 2. Subtypes
 * A subtype is the specification of the domain as a subrange of another already existing type.

[Example: Subtypes in Pascal.]
 * With these declarations in place, we cannot assign any value outside the range to the  variable. Any such attempt would be caught by the typing system and cause the program to terminate with an error. [In programming languages supporting exceptions, this could be handled more gracefully, without terminating the program.]

Note that enumeration and subtype definitions do not allow the programmer to specify operations on the newly defined data type.

Abstract Data Types
The defining characteristic of an abstract data type (ADT) is that nothing outside of the definition of the data structure and the algorithms operating on it should refer to anything inside, except through function and procedure calls for the fundamental operations.

The above Pascal fragment is unfortunately not an ideal implementation of an ADT. The fields of the record definition are open to manipulation; there is no language construct that prohibits changing the underlying structure directly through its components. It is also possible for the programmer to use any subprogram, be it one meant for export or one meant to implement some auxiliary functionality. A mechanism to provide controlled access to subprograms and record fields, such as the access specifiers of object-oriented programming languages, is missing.

Organization of the related subprograms into a compilation unit is not enforced by the language. One could add subprograms that are irrelevant to the ADT being implemented. There is no compiler-enforced rule relating the pre-existing and/or newly added subprograms to the ADT in question. The best we can do is stick to conventions for better organizing our programs.

One other weakness of the preceding fragment, one that can be remedied very easily, is that we cannot create more than one stack.

What we need is something like the module construct found in modular programming languages or the class construct of object-oriented programming languages.

The above implementation is a lot better than the previous one: The only way a stack can be manipulated is through the operations defined on it. The representation of the ADT is declared to belong to the private part of the package. This certainly is a great improvement over the first example. But, it still requires the user to perform creation and initialization in two separate steps, a rather error-prone process. What if we forget to call the initialization routine?

One other problem with this approach is its lack of adaptability. Once it has been defined, the user cannot alter its behavior except by modifying its definition. This may at times be very restrictive. Consider defining a shape type in a graphics system. Although shapes may differ in many aspects, there are certain operations that can be applied to all shapes, such as drawing, rotating, and translating. At first we might come up with a central routine where we call the appropriate drawing routine of the specific shape type. An implementation, typical of non-object-oriented programming languages, would look as follows:

One weakness of the preceding solution is the requirement that the central function  and   must know about all kinds of different shapes. If you define a new shape, every operation on a shape must be examined and possibly modified. As a matter of fact, you cannot even add a new shape to the system unless you have access to the source code. And generally, you do not have such a privilege when you are a mere user of the class. So, the implementer faces a dilemma: either ship the implementation with missing shapes or indefinitely postpone shipping it until you make sure you have an exhaustive list of possible shapes. Ideally, you would like to ship your software as early as you can and as fully functional as possible. This is referred to as the open-closed principle. But, how can you possibly provide an exhaustive functionality when you have limited time and you don’t have much idea about the alternatives? The answer is to get the user to extend your software and the keyword is inheritance. In C++, you would provide the following solution:

In addition to the aforementioned advantages, it is now the compiler that controls the whole process, whereas in the previous case it was the user doing all the bookkeeping.