Programming Language Concepts Using C and C++/Generic Containers

In this chapter, we will take a look at an implementation of a generic container, namely a vector. Our module will enable its clients to create and manipulate vectors housing items of different data types. In doing so, we will simulate the C++-style containers, which contain objects of a certain type. That is, a particular container will be either a collection of ints or a collection of doubles, but not a mixture of both. This is in contrast with the pre-1.5 Java-style containers, where a container is a collection of Objects and therefore can have items belonging to incompatible classes.

Another aspect of C programming that we will be discussing in this handout is the use of variadic functions to compensate for the absence of overloading. Enabling the programmer to have an indefinite number of unnamed formal parameters on top of the named ones, variadic functions can be used to provide a poor man’s substitute for overloading. All we have to do is to get the caller to provide an extra argument for discrimination among overloaded function candidates.

Example: Instead of writing

void f(int); void f(int, int); void f(double);

f(3); f(3, 3); f(3.0);

we now write

void f(int func_type, ...);

f(ONE_INT, 3); f(TWO_INTS, 3, 3); f(ONE_DOUBLE, 3.0);
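Below is one way to write the body of such a function. It assumes the discriminators are plain enumeration constants; the text names ONE_INT, TWO_INTS and ONE_DOUBLE but does not define them. So that the selected "overload" can be observed, this hypothetical f returns a double instead of void:

```c
#include <stdarg.h>

/* Hypothetical discriminator values; the text only names them. */
enum { ONE_INT, TWO_INTS, ONE_DOUBLE };

/* Poor man's overloading: func_type tells us how many unnamed
   arguments follow and what their types are. Returns a double
   (not void) so that the chosen "overload" can be observed. */
double f(int func_type, ...) {
    va_list args;
    double result = 0.0;

    va_start(args, func_type);
    switch (func_type) {
    case ONE_INT:                 /* plays the role of f(int) */
        result = va_arg(args, int);
        break;
    case TWO_INTS: {              /* f(int, int): returns the sum */
        int a = va_arg(args, int);
        int b = va_arg(args, int);
        result = a + b;
        break;
    }
    case ONE_DOUBLE:              /* plays the role of f(double) */
        result = va_arg(args, double);
        break;
    default:                      /* unknown tag: read nothing */
        break;
    }
    va_end(args);
    return result;
}
```

Nothing checks that the arguments match func_type. A call such as f(ONE_DOUBLE, 3) reads an int as a double, which is undefined behavior. That risk is the price of this substitute for real overloading.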

Our presentation will be concluded with a few words on static object module libraries.

Interface
General.h

The following macro definitions stand for the constructor types that one should consider in the process of implementing an abstract data type.

Creating a copy of an object from an already existing one involves copying its components too. With a little insight, we can see that there is no universal function to serve this purpose. As a matter of fact, the same data type may require different such functions in different contexts. The same argument applies to destroying an object.

So, are we (the programmers) supposed to provide each and every possible copy and destruction function? That would defeat the purpose of providing a generic container. We have to get the user to cooperate with us, and the way to do so is through the following pointer-to-function type definitions.
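Here is a minimal sketch of what such definitions might look like. It assumes components are handled as void*, and the names Object, COPY_FUNC and DESTRUCTION_FUNC are hypothetical. The user then supplies functions matching these signatures, shown here for C-string components:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names: components are handled through void*. */
typedef void* Object;
typedef Object (*COPY_FUNC)(Object);       /* returns a fresh copy */
typedef void (*DESTRUCTION_FUNC)(Object);  /* releases a component */

/* User-supplied callbacks for C-string components. */
Object copy_string(Object s) {
    char* copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;                           /* NULL if out of memory */
}

void destroy_string(Object s) {
    free(s);
}
```

A container written against COPY_FUNC and DESTRUCTION_FUNC never needs to know that its components happen to be strings.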

Vector.h

We want to implement a generic vector, that is, a vector that can have anything as its component: a vector of character strings, a vector of student records, and so on. Given this requirement and the fact that the structure of the component can be known only to the user [not the implementer], we need to ask the user for some assistance with copying and destroying components of the vector. Basically, the user must implement these two functions and somehow pass them to the implementer. In other words, the user and the implementer must cooperate: the user must fill in the missing parts of the framework provided by the implementer. A user maintaining a Vector of student records has to provide copy and destruction functions for the student record type; a user dealing with employee records has to provide the same functions for the employee record type; ...

This style of programming differs from the conventional technique, where the user of the library develops algorithms that invoke library functions, but the overall structure of the application and flow of control differs for each new application. Although almost any C program makes use of the I/O library, the overall structure of a particular program is independent of others; routines provided in the I/O library are simply 'low-level' operations common to all programs and do not have to be tailored to the needs of a particular application. This is the technique used with procedural programming languages such as Pascal and C.



In software framework usage, however, it is the overall structure, the high-level view of the algorithm, that remains constant and is reused from application to application. The programmer (the user of the framework) specializes the framework, but the majority of the code remains the same across all applications. Because of this inversion in the relationship between library code and user-supplied code, a software framework is sometimes referred to as an "upside-down" library.



The best-known software frameworks are those associated with graphical user interfaces. The major tasks involved in executing a graphical program (waiting for the user to move or click the mouse, waiting for keystrokes, moving or resizing windows, and so on) do not change much from one application to another. These actions can therefore be combined in a single framework, both to save programming effort in developing new applications and to ensure consistency among applications. What distinguishes one program from another are features such as the contents of windows and the action to be performed in response to a mouse click or keystroke. It is these actions that must be programmed anew in each specific application of the framework.

This is the style of choice in object-oriented programming languages. The framework's classes leave certain functions deferred (abstract). To create a new application, it is necessary only to subclass from the original class [using inheritance] and then implement the behavior associated with these deferred functions. All other functionality, including the overall structure of the application, is obtained "for free" through inheritance. Popular examples are the C++-based MFC (Microsoft Foundation Classes) and the Java-based JFC (Java Foundation Classes).

This style of programming (a software framework) is possible only if the implementer can 'call back' to some user code. In an object-oriented programming language, this can be done by implementing an interface defined within the framework, or by subclassing an abstract class and implementing its abstract functions (such as the handlers for mouse clicks and keystrokes). In C, we can do this by using function pointers. All we have to do is define the function signatures, which is what we are doing here, and use these definitions as the types of parameters in certain functions. The callback takes place in these functions, when the implementer calls the user's code through the function pointer passed to the function.
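The following tiny, self-contained illustration shows this inversion of control. The names KEY_HANDLER, run_event_loop and count_keys are hypothetical, and a string stands in for a real event source. The framework function owns the loop, and the user only supplies the action:

```c
/* The signature the "framework" defines; users supply the body. */
typedef void (*KEY_HANDLER)(char key, void* user_data);

/* Framework side: owns the control flow and calls back into user code
   for every simulated keystroke. */
void run_event_loop(const char* keystrokes, KEY_HANDLER on_key, void* user_data) {
    for (const char* p = keystrokes; *p != '\0'; p++)
        on_key(*p, user_data);            /* the callback */
}

/* User side: counts the keystrokes it is handed. */
void count_keys(char key, void* user_data) {
    (void)key;                            /* unused in this handler */
    ++*(int*)user_data;
}
```

The extra void* parameter lets the user thread their own state through the framework without the framework knowing its type.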

The following function signature is an example of the declaration of a variadic function in C. The ellipsis in the specification signifies an unknown number of parameters coming after the named arguments. The number of arguments passed to a variadic function is deduced from the contents of one or more of the named arguments. Take the printf function, for instance, whose prototype is provided below.

int printf(const char* format_string, ...);

The placeholders in format_string (character sequences starting with '%') help in figuring out the number and types of the arguments.
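To see how a format string can drive the traversal, here is a hypothetical printf-like function, sum_by_format, that adds up its numeric arguments instead of printing them. Like printf, it learns how many unnamed arguments there are, and of which types, only from the placeholders it scans. This sketch supports just "%d" and "%f":

```c
#include <stdarg.h>

/* "%d" consumes an int, "%f" a double (a float argument is promoted
   to double anyway); any other placeholder is ignored. */
double sum_by_format(const char* format, ...) {
    va_list args;
    double total = 0.0;

    va_start(args, format);
    for (const char* p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        switch (*++p) {
        case 'd': total += va_arg(args, int); break;
        case 'f': total += va_arg(args, double); break;
        case '\0': p--; break;   /* lone '%' at the end: stop safely */
        default: break;          /* unsupported placeholder */
        }
    }
    va_end(args);
    return total;
}
```

As with printf, a mismatch between the placeholders and the arguments actually passed cannot be detected at run time.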

Implementation
Vector.c

For controlling the lifetime of a particular object shared among different handles (actually pointers in our case), we use an extra field to hold the number of handles that provide access to the same underlying object. Such a field is called a reference count.

For pedagogical purposes, the last field of the structure definition can be thought of as an array of components: a Vector is basically an array of pointers to its components. Declaring this field as a pointer rather than as an array gives us the flexibility of changing the size of our Vector dynamically. Had we used an array, we would have had to specify its size at compile time, which would have been too restrictive for our purposes.
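Here is a sketch of such a representation. It assumes void* components, and the struct tag VectorRep, the field names and vector_alloc are all hypothetical. Because container is a pointer, its length is fixed only when vector_alloc runs. The ref_count field is the reference count described earlier:

```c
#include <stdlib.h>

typedef void* Object;
typedef Object (*COPY_FUNC)(Object);
typedef void (*DESTRUCTION_FUNC)(Object);

/* Hypothetical representation of a vector. */
struct VectorRep {
    COPY_FUNC copy_component;          /* user-supplied callbacks         */
    DESTRUCTION_FUNC destroy_component;
    unsigned int ref_count;            /* handles sharing this object     */
    unsigned int cap_increment;        /* 0 means "double when growing"   */
    unsigned int capacity;             /* slots allocated in container    */
    unsigned int size;                 /* slots actually in use           */
    Object* container;                 /* think of it as Object[capacity] */
};
typedef struct VectorRep* Vector;      /* the handle users hold */

/* The capacity is decided at run time, which a member declared as a
   fixed-size array (Object container[N]) would not allow. */
Vector vector_alloc(unsigned int capacity) {
    Vector v = calloc(1, sizeof(struct VectorRep));
    if (v == NULL)
        return NULL;
    v->container = calloc(capacity, sizeof(Object));
    if (v->container == NULL && capacity > 0) {
        free(v);
        return NULL;
    }
    v->capacity = capacity;
    v->ref_count = 1;                  /* the caller's handle */
    return v;
}
```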

The following line declares a variable of type va_list, which will be used in traversing the list of unnamed arguments.

The following macro (va_start) is used to initialize the list of unnamed arguments so that an internal pointer in the va_list variable points right after the last named argument.



Using va_arg, the next line returns an argument of the requested type and advances the internal pointer so that it points to the location right after this argument. The run-time stack after execution of this statement is as shown below:



This is where we retrieve the function pointers passed by the user. Each time we need to copy or destroy a component we will call back to the functions found at the given addresses.
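Here are the steps above in one place. read_ctor_args is a hypothetical argument reader with made-up constructor tags DEFAULT and WITH_CAPACITY. It fetches two function pointers and, for one constructor type, an extra unsigned int. It assumes function-pointer typedefs of the shape shown:

```c
#include <stdarg.h>
#include <stdlib.h>

typedef void* Object;
typedef Object (*COPY_FUNC)(Object);
typedef void (*DESTRUCTION_FUNC)(Object);

/* Hypothetical constructor tags. */
enum { DEFAULT, WITH_CAPACITY };

struct CtorArgs {
    COPY_FUNC copy;
    DESTRUCTION_FUNC destroy;
    unsigned int capacity;
};

/* Every call passes the two callbacks; WITH_CAPACITY adds an unsigned int. */
struct CtorArgs read_ctor_args(int ctor_type, ...) {
    struct CtorArgs result = { NULL, NULL, 0 };
    va_list args;                                 /* cursor over unnamed args */

    va_start(args, ctor_type);                    /* start after ctor_type    */
    result.copy = va_arg(args, COPY_FUNC);        /* fetch, then advance      */
    result.destroy = va_arg(args, DESTRUCTION_FUNC);
    if (ctor_type == WITH_CAPACITY)
        result.capacity = va_arg(args, unsigned int);
    va_end(args);                                 /* clean up the cursor      */
    return result;
}
```

The stored pointers are exactly the addresses the container will later call back through whenever it copies or destroys a component.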

In addition to malloc, we can use calloc to allocate a region of memory from the heap. This function allocates memory large enough to hold n (passed as the first argument) elements, each of size m (passed as the second argument), and returns a pointer to the first element of the array. This can certainly be done with malloc, too. As an extra, however, the region returned by calloc is bitwise zeroed, which malloc does not do.

Memory allocated with calloc can be returned using free or cfree. But keep in mind that, despite its omnipresence, cfree is not a standard C function.
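The following small comparison uses an int array: calloc alone versus malloc followed by memset. The helper names are ours. With malloc we compute n * sizeof(int) ourselves, so we guard against overflow:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* calloc(n, m): room for n elements of m bytes each, all bits zero. */
int* zeroed_ints_calloc(size_t n) {
    return calloc(n, sizeof(int));
}

/* The same effect with malloc plus memset. */
int* zeroed_ints_malloc(size_t n) {
    if (n != 0 && n > SIZE_MAX / sizeof(int))
        return NULL;                         /* n * sizeof(int) would overflow */
    int* p = malloc(n * sizeof(int));
    if (p != NULL)
        memset(p, 0, n * sizeof(int));
    return p;
}
```

For integer types, all bits zero means the value 0. The C standard does not guarantee the same for pointers or floating-point values, so calloc is not a portable way to obtain null pointers or 0.0.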





va_end performs the necessary cleanup operations on the va_list variable. In other words, it serves the purpose of a destructor. It should be called by the programmer after all arguments have been read with va_arg.

The structure of a Vector object is as given below.

Insert figure

The user accesses a Vector object through a handle (this variable in the figure). The representation is hidden, and it is modified only through the functions in the public interface of Vector.

Note that the handle is a pointer to the properties of the Vector, not to its components. This is the typical approach taken to implement collection types: you have a handle on some metadata, which (in addition to defining properties of the collection) has a handle on the container used to hold the components.

A call to the destruction function does not necessarily mean the underlying object will be destroyed. If the object is shared among several handles, the reference count is decremented and the underlying object is left untouched. If the reference count is one, we are about to sever the tie between the object and the last handle on it. In that case, we must also destroy the underlying object!
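Here is a minimal sketch of such a destroy function. It assumes a pared-down, hypothetical representation (struct VectorRep with only the fields destruction needs) and a user-supplied destruction callback. vector_release is our name, not the chapter's:

```c
#include <stdlib.h>

typedef void* Object;
typedef void (*DESTRUCTION_FUNC)(Object);

/* Hypothetical, pared-down representation. */
struct VectorRep {
    DESTRUCTION_FUNC destroy_component;
    unsigned int ref_count;
    unsigned int size;
    Object* container;
};

/* Releases one handle. The object is torn down only when the last
   handle lets go; otherwise only the count changes.
   Returns 1 if the object was actually destroyed, 0 otherwise. */
int vector_release(struct VectorRep* v) {
    if (v->ref_count > 1) {
        v->ref_count--;                        /* others still share it */
        return 0;
    }
    for (unsigned int i = 0; i < v->size; i++)
        v->destroy_component(v->container[i]); /* call back into user code */
    free(v->container);
    free(v);
    return 1;
}
```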

Note that the function applied to each component is reached through a pointer to function. Because the call goes through a pointer, we can change the function being invoked by assigning different pointer values. This is what gives us the ability to deal with different types of data.

Note that we return a copy of the requested component, not a pointer to it. The rationale is to avoid giving any access to the underlying data structure.
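Here is a sketch of such an accessor. It assumes a hypothetical struct VectorRep and a user-supplied copy function. vector_get is our name, not the chapter's:

```c
#include <stdlib.h>

typedef void* Object;
typedef Object (*COPY_FUNC)(Object);

/* Hypothetical, pared-down representation. */
struct VectorRep {
    COPY_FUNC copy_component;
    unsigned int size;
    Object* container;
};

/* Hands back a fresh copy made by the user's copy function, so the
   caller never holds a pointer into the container.
   Returns NULL for an out-of-range index (or if copying fails). */
Object vector_get(const struct VectorRep* v, unsigned int index) {
    if (index >= v->size)
        return NULL;
    return v->copy_component(v->container[index]);
}
```

The caller owns the returned copy and is responsible for eventually destroying it. Changes made through it never affect the component stored in the vector.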

The next function assigns one Vector object to another. Since C has no operator overloading, we have to come up with a function name and use it to perform assignment. We could, of course, still use '=' on the handles, but doing so would lead to an inconsistent state in which the reference count and the number of handles sharing the underlying object no longer agree.
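Here is a minimal sketch of an assignment function that keeps the reference count in step with the number of handles. It assumes a hypothetical struct Shared that carries only a count and a payload; a real Vector representation has more fields. Both function names are ours:

```c
#include <stdlib.h>

/* Hypothetical minimal representation: only the count matters here. */
struct Shared {
    unsigned int ref_count;
    int payload;
};

/* Drops one reference; frees the object when the last one goes. */
static void shared_release(struct Shared* s) {
    if (--s->ref_count == 0)
        free(s);
}

/* Makes *lhs share rhs's object. A plain "*lhs = rhs" would leave
   rhs's count one too low and the old object's count one too high. */
void shared_assign(struct Shared** lhs, struct Shared* rhs) {
    if (*lhs == rhs)
        return;                    /* self-assignment: nothing to do */
    rhs->ref_count++;              /* one more handle on rhs's object */
    if (*lhs != NULL)
        shared_release(*lhs);      /* one fewer on the old object */
    *lhs = rhs;
}
```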





The previous code fragment will produce the picture given below.

Insert figure

Before making the insertion, we have to make sure there is enough room in our structure. If there is not, we adjust the container capacity according to a predefined formula. If we run out of heap memory, we leave the structure unchanged and return a value signaling failure. Otherwise, we extend the memory region used for the container and go ahead with the insertion. If the new component goes at the end, we simply copy it after the last component, at the position given by the field holding the current number of components. If it goes anywhere else, we first shift the components that are to follow the new item down by one and then make the insertion.

The growth function extends the capacity of a Vector according to a given formula. In our implementation, we adopt a formula similar to the one used by Java's java.util.Vector class. If the Vector has a nonzero capacity increment, the capacity is incremented by this amount. If the user has not provided a value for this field, or it is zero, the new capacity is obtained by doubling the previous one. (If the current capacity is zero, the new capacity is taken to be one.) If enough room for the Vector object cannot be allocated, the function returns a value signaling failure.
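Here is a minimal sketch of the growth rule. It assumes a hypothetical struct VectorRep with fields cap_increment, capacity, size and container (the names are ours, not the chapter's). It uses realloc, which leaves the old block intact when it fails, so the vector is unchanged on failure:

```c
#include <stdlib.h>

typedef void* Object;

/* Hypothetical, pared-down representation. */
struct VectorRep {
    unsigned int cap_increment;   /* 0: double the capacity instead */
    unsigned int capacity;
    unsigned int size;
    Object* container;
};

/* Adds cap_increment if it is nonzero, otherwise doubles; 0 becomes 1.
   Returns 1 on success, 0 on allocation failure (vector untouched).
   Overflow checks are omitted for brevity. */
int vector_grow(struct VectorRep* v) {
    unsigned int new_cap;
    if (v->cap_increment != 0)
        new_cap = v->capacity + v->cap_increment;
    else
        new_cap = (v->capacity == 0) ? 1 : 2 * v->capacity;

    /* realloc(NULL, n) behaves like malloc(n). */
    Object* bigger = realloc(v->container, new_cap * sizeof(Object));
    if (bigger == NULL)
        return 0;                 /* old block is still valid */
    v->container = bigger;
    v->capacity = new_cap;
    return 1;
}
```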

Shifting down is required whenever a new item is inserted into any location other than the end of the Vector.

Similarly, shifting up is required whenever an item is removed from any location other than the end of the Vector.
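Because the source and destination ranges overlap, memmove (not memcpy) is the right tool for both shifts. Here is a sketch with hypothetical helper names shift_down and shift_up, operating on an array of void* components. "Down" means toward higher indices, as in the text:

```c
#include <string.h>

typedef void* Object;

/* Opens a gap at index by moving container[index .. size-1] one slot
   toward the end. The caller must have ensured capacity > size. */
void shift_down(Object* container, unsigned int size, unsigned int index) {
    memmove(&container[index + 1], &container[index],
            (size - index) * sizeof(Object));
}

/* Closes the gap left by removing container[index] by moving
   container[index+1 .. size-1] one slot toward the front. */
void shift_up(Object* container, unsigned int size, unsigned int index) {
    memmove(&container[index], &container[index + 1],
            (size - index - 1) * sizeof(Object));
}
```

An insertion calls shift_down before storing the new pointer at index. A removal saves or destroys container[index] first and then calls shift_up.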

Test Programs
Vector_Test.c

Note that we insert Integer objects (which are basically objects of an opaque type that wraps an int), not ints. This is a consequence of the genericness provided by means of void*, which requires insertion of pointers.



Since the returned value (ret_obj) is assigned to pval, pval gives us a handle on the removed object. Note that the object is cloned before being returned, so this new object, once assigned to pval, can be manipulated independently of the original copy.

Running the Test Program
(Using Cygwin)


 * gcc -c -ID:/include Vector.c
 * gcc -o Vector_Test.exe Vector_Test.c Vector.o -lWrappers -ID:/include -LD:/library

The second command will bring into the executable the required object files found in the [static] library named libWrappers.a. The library's file name is obtained by adding the prefix "lib" and the suffix ".a" to the name supplied with the -l switch. The search for this library will make use of the directory passed with the -L switch.


 * Vector_Test < Vector_Test.input > Vector_Test.output

This command redirects both input and output to disk files, yet the program still believes it is receiving its input from the keyboard and sending its output to the screen. This is possible because stdin and stdout are remapped to different physical files. With no remapping, all processes in a system have the following initial file table:

After remapping we have:

For both before and after remapping, we have the following macro definitions in place:





All of this is done by the command-line interpreter. The user doesn't need to modify the source code; (s)he can still program as if input were entered at the keyboard and output were written to the screen. This works because we read from or write to a physical file through a handle (a logical file). If we change the physical file the handle is mapped to, then even though we keep writing to the same logical file (handle), the data ends up at a different destination.
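The same remapping can be done from inside a program. The POSIX-only sketch below (the function names are ours) does roughly what a shell does for "> file": it opens the file and makes descriptor 1 (STDOUT_FILENO) refer to it with dup2. Code that writes to stdout is untouched, but its output lands in the file. A shell does this in the child process before starting the program; here we do it in-process and then restore the original target:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Points descriptor 1 at path. Returns a duplicate of the previous
   stdout descriptor (needed for restoring), or -1 on failure. */
int redirect_stdout(const char* path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    fflush(stdout);                    /* flush output meant for the old target */
    int saved = dup(STDOUT_FILENO);    /* keep a way back to the old target */
    if (saved < 0) {
        close(fd);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                         /* descriptor 1 now refers to the file */
    return saved;
}

/* Makes descriptor 1 refer to its previous target again. */
void restore_stdout(int saved) {
    fflush(stdout);                    /* push buffered data into the file */
    dup2(saved, STDOUT_FILENO);
    close(saved);
}
```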

Vector_Test.input





Vector_Test.output



Object Module Libraries
Whether you use it or not, linking statically with an object module brings in all the functionality within the module. This means code you never use in your program will be there to increase the size of the executable. One can avoid this by splitting the contents of the module into smaller chunks. Using these smaller modules will probably reduce the size of the executable, but you now face another problem: having to write the names of the object modules separately on the command line.

Libraries offer the best of both worlds. Linking with a library means all external entities within the individual modules are visible. However, only the modules that are used are brought into the executable. Switching from one module to another in the library does not force you to change the command line. Everything is taken care of by the linker and the library manager.

Example: Suppose libWrappers.a contains Integer.o, Char.o, Short.o, and so on. Having used some functionality from Integer.o, the following command will cause only Integer.o to be brought in.


 * gcc -o Vector_Test.exe Vector_Test.c Vector.o -lWrappers -ID:/include -LD:/library

If we later decide to use some function from, say, Char.o, we don't need to make any modifications to the command line. The linker and the library manager will sort out the details for us and bring in only Char.o and Integer.o.

Definition: A library is an archive of object modules that, in addition to the modules, generally contains some sort of metadata to speed up locating the individual files.

Library Header File
Wrappers.h includes the forward declarations and prototypes required for boxing primitive-type data and unboxing wrapped data. Note that the same thing could have been done by placing the relevant information in separate header files. There is no rule saying that one must have a single header file for an entire library. As a matter of fact, you can have a separate header file for each and every object module [or even function] in the library. Choosing to have a separate header file for each object module, however, has a disadvantage. As you add a new function call to your code, you have to figure out which header file it comes from and [probably] add the relevant include directive, which is a reincarnation of the problem of having to write multiple object files on the command line.

In some cases, striking a balance between the two extremes might be a better choice. Take, for instance, a math library that offers tens, if not hundreds, of functions. Sifting through a single header file in such a case is admittedly a painstaking process. Having multiple headers corresponding to different aspects of the math library might be a good idea.

Wrappers.h

Since our generic vector expects an opaque pointer as its component, using it with primitive data may turn out to be painful. You must first wrap the data and use the pointer to the wrapped data in your transactions. Later on, when you want to print the data to some medium, you must do some unboxing (that is, retrieve the contents of the region pointed to by the opaque pointer) and then print the data. Similar things can be said about comparing, copying, and destroying the data.

This pattern is so common that a special library (libWrappers.a) is offered for the user's convenience. If (s)he wants to use the Vector module and the like with primitive data types, (s)he doesn't have to reinvent the wheel; all (s)he has to do is use the functions found in libWrappers.a.

The whole thing can be likened to the wrapper classes in Java and the concept of automatic boxing-unboxing in C#. As a matter of fact, this way of using a container is so frequent that Java, as of 1.5, supports automatic boxing-unboxing.

A Sample Module
What follows is an implementation of the wrapper interface for type int. Wrappers for the other primitive types can be provided similarly and will not be included in this handout.

Integer.c
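The chapter's Integer.c is not reproduced here. To give a rough idea of what such a module involves, the sketch below boxes and unboxes an int and provides copy, destroy and compare callbacks with the void* signatures a generic container expects. All names (Integer_Create, Integer_Value, Integer_Copy, Integer_Destroy, Integer_Compare) are hypothetical and need not match the real Wrappers.h:

```c
#include <stdlib.h>

/* Hypothetical wrapper: users see only the opaque handle type. */
typedef struct IntegerRep* Integer;
struct IntegerRep { int value; };

/* Boxing: wrap a plain int in a heap object (NULL if out of memory). */
Integer Integer_Create(int value) {
    Integer i = malloc(sizeof(struct IntegerRep));
    if (i != NULL)
        i->value = value;
    return i;
}

/* Unboxing: read the wrapped int back. */
int Integer_Value(Integer i) {
    return i->value;
}

/* Callbacks with the generic void* signatures. */
void* Integer_Copy(void* obj) {
    return Integer_Create(((Integer)obj)->value);
}

void Integer_Destroy(void* obj) {
    free(obj);
}

/* Negative, zero or positive, like strcmp; avoids x - y overflow. */
int Integer_Compare(void* a, void* b) {
    int x = ((Integer)a)->value, y = ((Integer)b)->value;
    return (x > y) - (x < y);
}
```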

Building the Library
Using Cygwin

Before building the library we need to create the individual object modules that are to go into it.



If all source files are in the same directory and no other files exist in that location, you can do it all in one step:



We can now create the library (or archive, in Unix lingo) and append the object modules to it with the following command. This will put all files ending with .o into libWrappers.a.



If you are not convinced you may want to list the contents of the library.


 * Char.o
 * Double.o
 * ...
 * UShort.o

Now that libWrappers.a has been built, we can place it in some fixed location and use the functions residing in it as we did in our test program.

Using Windows

Before you issue the following commands, make sure the command-line tools are available as external commands. This is easily done by executing [the commands contained in] the batch file named vcvars32.bat, found in the bin subdirectory of the Microsoft Visual C/C++ compiler.



The preceding lines create the object files we need for building the library. Although rather short, they underline the fact that we are using a compiler-driver, not a plain compiler. To bring in the required headers, we use the /I option to tell the preprocessor where to look for them. With the /c option, we tell the compiler-driver to stop before linking. Finally, although it is not really needed in our example, with /TC we tell the compiler proper to treat the files passed on the command line as C source code.


 * Microsoft (R) Library Manager Version 7.10.3077
 * Copyright (C) Microsoft Corporation. All rights reserved.
 * Char.obj
 * ...
 * ULong.obj
 * UShort.obj

lib, the Windows counterpart of ar, is the library-manager command. By passing different options, we can get it to build static libraries, list library contents, and so on.



Here is another convincing proof that cl is actually a compiler-driver! To produce Vector_Test.exe, it preprocesses and compiles Vector_Test.c, then links the resulting object file with Vector.obj and the needed object files from Wrappers.lib. Wrappers.lib is located by searching the directory given with the /LIBPATH option passed to the linker.