Programming Language Concepts Using C and C++/Object-Based Programming

Speech, basically an activity that involves sharing a picture of countless hues with others, is successful only when the parties involved come up with similar, if not identical, depictions of a thought, an experience, or a design. Success is a possibility when the following criteria are met:


 * 1) Parties involved share a common medium,
 * 2) This common medium supports relevant concepts.

Failing to meet these criteria will turn communication into a nightmarish mime performance. Imagine two people, with no common medium between them, trying to communicate with each other. Too much room for ambiguity, isn’t it? As a matter of fact, even when the two parties speak the same language, their life views (read: “paradigms”) may make communication an unbearable exercise.

A concept, abstract or concrete, does not have any corresponding representation in the language if it doesn’t have room in the imagination of its speakers. For instance, Arabic speakers use the same word for ice and snow, while Eskimos have tens of words for snow. This cannot be used as a proof of intellectual incapacity, however: roles are reversed when it comes to depicting qualities of a camel.

This last deficiency is dealt with in two ways: a new word, probably related to an already existing one, is introduced; or an idiom is invented to express the inexpressible. The former seems like a better choice since the latter is open to misinterpretation and therefore leads to ambiguity, which brings us back to square one. An idiom will also blur the new concept's relation with other concepts and turn the vocabulary of a language into a forest of branchless trees.

So, what about programming languages? Programming languages, like natural languages, are meant to be used for communicating with others: machine languages are used to communicate with a specific processor; high-level languages are used to express our solutions to fellow programmers; specification languages relay analysis/design decisions to other analysts/designers.

Had relaying our ideas to her/his majesty, the computer, been our one and only goal, providing a solution to a problem would not have been such a difficult task. However, we have another goal, which is worthier of our intellectual efforts: explaining our solution to other human beings. Achieving this more elusive goal requires adoption of a disciplined approach and a collection of high-level concepts. The former is useful in analyzing the problem at hand and providing a design for its solution. It enables us to more easily spot recurring patterns and apply already-tested solutions to sub-problems. The latter is the vocabulary you use to express yourself. Using idioms instead of language constructs for this purpose invites misunderstanding and erects a barrier between the newcomer and the happy few.

On the other hand, assimilation of idioms will not only let you speak the language but also make you one of the native speakers. Speaking a foreign language then changes from a dull exercise of applying grammar rules into an intellectual journey in the mindscapes of others. This journey, if not cut short, generally reveals more about the concept the idiom is a substitute for; it helps you build [in a bottom-up fashion] a web of interrelated concepts. Next time you take the journey, the signposts you erected before will help you find your way more easily.

So, which programming language(s) should we learn? If it is your 10-year-old cousin who asks this question, it wouldn’t be the end of the world if (s)he started with MS Visual Basic or some Web-based scripting language; if it is a would-be professional who will earn her/his living by writing programs, (s)he had better care more about concepts than about the syntax of particular programming languages. What is crucial in such a case is the ability to build a foundation of concepts, not a collection of random buzzwords. For this reason C/C++, with their idiomatic nature, will be the primary tools for our journey into programming language concepts.

Module
Programming (or software production) can be seen as an activity of translating a problem (expressed in a declarative language) into a solution (expressed in machine code of a computer). Some phases in this translation are carried out by automatic code generating translators such as compilers and assemblers, while some are performed by human agents.

Key to easing this translation process, apart from shifting the burden to code generators, is matching the concepts in the source language with those in the target language. Failing to do so means extra effort is required and leads to using ad hoc techniques and/or idioms. Such an approach, while good enough to relay the solution to the computer, is not ideal for explaining your intentions to fellow programmers. Accepting the validity of this observation won’t help you, however, when the target language simply lacks the concept you need. In such a case, you should strive to adopt an idiomatic approach instead of using an ad hoc technique. And this is exactly what we will try to achieve in this handout: we will establish a semi-formal technique to simulate the notion of objects in C. If successful, such an approach will ease the transition from an object-based initial model (typically, a logical design specification) to a procedural model (a C program).

To achieve our goal, we will adopt a technique familiar to us from the last two sections of the Programming Level Structures chapter: simulation of a concept using lower-level ones. Not having a class or module concept in C, we will use an operating system concept: file. To see it in action, read on!

Interface
Sticking to a widely adopted convention, the contents of a header file, or parts of it, are put inside an #ifndef-#endif pair. This avoids multiple inclusion of the file. The first time the file is processed by the preprocessor, the guard macro is undefined and everything between the #ifndef and #endif directives will be included. The next time the header file is included while preprocessing the same source file, probably by some other included file, the guard macro will have already been defined and all the contents between the #ifndef-#endif pair are skipped.
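A minimal sketch of the guard pattern follows. The macro name DEMO_GUARD_H and the enumerator DEMO_ANSWER are hypothetical and chosen only for illustration. The guarded body is written out twice in a single file to imitate a header being included twice while one source file is preprocessed.

```c
/* First "inclusion": DEMO_GUARD_H (a hypothetical name) is not yet
   defined, so the body is processed and the macro becomes defined. */
#ifndef DEMO_GUARD_H
#define DEMO_GUARD_H
enum { DEMO_ANSWER = 42 };
#endif

/* Second "inclusion": the macro is already defined, so the preprocessor
   skips the body and the enumerator is not declared a second time. */
#ifndef DEMO_GUARD_H
#define DEMO_GUARD_H
enum { DEMO_ANSWER = 42 };
#endif

/* Returns the guarded constant. The file compiles only because the
   second copy above was skipped. */
int demo_answer(void) {
    return DEMO_ANSWER;
}
```

Removing the guard lines from both copies would make the compiler reject the second enum as a redeclaration, which is exactly the problem the convention prevents.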

Following the definition of the guard macro, General.h is included to bring in a macro definition used in the rest of the header.

Complex.h

The following is a so-called forward declaration. We declare our intention of using a structure type but betray no details about its structure. We fill in the details by providing the definition in the implementation file, Complex.c. Users of the complex number data type need not know the details of the implementation. Such an approach gives the implementer the flexibility of changing the underlying data structures without breaking the existing client programs.

The reason why we defer the definition to the implementation file is that C lacks the access specifiers (e.g., public, protected, private) found in object-oriented programming languages. This forces us to keep everything that should not be accessed by the user secret, that is, out of the header file.

Observe that the type exported to the users is defined to be a pointer. This complies with the rule that an interface should include things that don’t change. And regardless of the representation of a complex number, which can be changed at the whim of the implementer, the memory layout of the pointer to this representation will never change. Hence we use the pointer type in the function prototypes rather than the structure type itself.

Note that the distinction between interface and implementation is reinforced by sticking to conventions, not by some language rule checked by the C compiler. We could lump the interface and implementation into a single file and the compiler would not complain a bit.
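The idea of a forward-declared structure hidden behind a pointer type can be sketched as follows. This is not the chapter's Complex.h or Complex.c; every name here (complex_rep, ComplexRef, complex_create, and so on) is hypothetical. The header part and the implementation part are shown in one file only to keep the sketch self-contained.

```c
#include <stdlib.h>

/* --- What a header file would expose (hypothetical names) --- */
struct complex_rep;                      /* forward declaration: no fields shown */
typedef struct complex_rep *ComplexRef;  /* clients only ever hold this pointer  */

ComplexRef complex_create(double re, double im);
double     complex_re(ComplexRef self);
double     complex_im(ComplexRef self);
void       complex_destroy(ComplexRef self);

/* --- What the implementation file would keep to itself --- */
struct complex_rep {
    double re;
    double im;
};

ComplexRef complex_create(double re, double im) {
    ComplexRef c = malloc(sizeof *c);
    if (c != NULL) {          /* malloc may fail; callers must check for NULL */
        c->re = re;
        c->im = im;
    }
    return c;
}

double complex_re(ComplexRef self) { return self->re; }
double complex_im(ComplexRef self) { return self->im; }

void complex_destroy(ComplexRef self) { free(self); }
```

A client that sees only the first part can create, query, and destroy values, but cannot write c->re, because the fields are unknown to it. The implementer could switch to a polar representation without touching the prototypes.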

All of the following prototypes (function declarations) are qualified with the extern keyword. This means that their implementations (function definitions) may appear in a different file, which in this case is the corresponding implementation file. This makes exporting functions possible: all files including the current header file will be able to use these functions without providing any implementations for them. Seen from this perspective, the following list of prototypes can be regarded as an interface to an underlying object claiming to provide an implementation.

Definition: An interface is an abstract protocol for communicating with an object.

All extern functions are exported (that is, they are made visible to other files that are used to build the executable) from the implementing file(s) and are linked with, or “imported” by, their clients. Such imported functions are said to have external linkage. In case they are implemented in another file, the addresses of these functions are unknown to the compiler and are marked as such in the object file produced by the compiler. The linker, in the process of building the executable, will later fill in these values.
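As a small illustration of a declaration with external linkage being used before its definition is seen, consider the sketch below. The function names are hypothetical. In a real project the declaration would sit in a header and the definition in a separate implementation file; here they share one file so the example stays self-contained.

```c
/* Declaration: promises that a function with this signature exists
   somewhere in the program. The name scale_by_two is hypothetical. */
extern int scale_by_two(int value);

/* A caller can be compiled against the declaration alone. */
int scale_twice(int value) {
    return scale_by_two(scale_by_two(value));
}

/* Definition: normally placed in a separate implementation file; the
   linker would then connect the calls above to this code. */
int scale_by_two(int value) {
    return 2 * value;
}
```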

Remember that the exported type is a typedef for a pointer to the structure. That is, it is essentially a pointer. For this reason, when qualified with the const keyword it is the pointer that is guarded against change, not the fields pointed to by the pointer. This type of behavior is similar to that displayed in Java: when an object field is declared to be final, it is the handle, not the underlying object, that is guarded against change.
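The effect of const on a pointer typedef can be seen in the following sketch. The type and function names (point_rep, Point, point_move_to_origin, point_sum) are hypothetical. The lines that the compiler would reject are left as comments.

```c
struct point_rep { int x; int y; };
typedef struct point_rep *Point;   /* hypothetical pointer typedef */

/* 'const Point' means 'struct point_rep *const': the pointer is fixed,
   but the structure it points to is still writable. */
void point_move_to_origin(const Point self) {
    self->x = 0;      /* allowed: modifies the pointed-to object */
    self->y = 0;
    /* self = 0; */   /* would not compile: self itself is const */
}

/* For contrast: here the pointed-to structure is read-only. */
int point_sum(const struct point_rep *p) {
    /* p->x = 1; */   /* would not compile: *p is const */
    return p->x + p->y;
}
```

If the pointed-to fields must also be protected, the const has to be written against the structure type, as in point_sum, since it cannot be pushed "through" the typedef.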

Depending on where it is placed, const may mean different things:

 * A plain (unqualified) variable: the value is mutable.
 * A const-qualified variable: the value is immutable.
 * A pointer to a const value: the pointer is mutable, the value it points to is immutable.
 * A const pointer to a value: the pointer is immutable, the value it points to is mutable.
 * A const pointer to a const value: both the pointer and the value it points to are immutable.
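The five placements of const can be written out in C as in the sketch below. The variable names and the choice of int are assumptions made for illustration. Statements that would be rejected by the compiler are commented out.

```c
/* Exercises the writes each declaration permits and returns the sum of
   the two values finally visible through the pointers. */
int const_placement_demo(void) {
    int i = 1;                      /* mutable value                       */
    const int ci = 2;               /* immutable value                     */
    const int *pci = &ci;           /* mutable pointer, immutable target   */
    int *const cpi = &i;            /* immutable pointer, mutable target   */
    const int *const cpci = &ci;    /* immutable pointer, immutable target */

    i = 3;            /* fine */
    /* ci = 4; */     /* error: ci is const */
    pci = &i;         /* fine: the pointer may be redirected */
    /* *pci = 5; */   /* error: the target is const when seen through pci */
    *cpi = 5;         /* fine: writes i through the fixed pointer */
    /* cpi = 0; */    /* error: cpi is const */
    /* cpci = &i; */  /* error: cpci is const */

    return *pci + *cpci;   /* i (now 5) + ci (2) */
}
```

Reading each declaration from right to left ("cpi is a const pointer to int") is a common way of keeping the cases apart.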

Another point worth mentioning is the first formal parameter common to all functions. This corresponds to the target object (the implicitly passed first argument, usually called this or self) in object-oriented programming languages. The function is applied on the object passed as the first argument, and the parameter is named accordingly. The identity of this object cannot change during the function call although the object's content can vary. That is why we qualify the parameter type with the const keyword.

For obvious reasons, the signatures of the constructor-like and destructor-like functions form exceptions to the above-mentioned pattern. The constructor-like function allocates heap memory for the yet-to-be-created object and initializes it, whereas the destructor-like function frees the heap memory used by the object and makes the object pointer unusable by assigning NULL to it.

Implementation
Complex.c

The following directive may at first seem extraneous. After all, why should we include a list of prototypes (plus some other stuff) when it is we who provide the function bodies for them? By including this list we get the compiler to synchronize the interface and implementation. Say you modified the signature of a function in the implementation file and forgot to make the relevant changes in the interface file; the function with the modified signature will not be usable (because it is not listed in the interface) and a function in the interface file won’t have a corresponding implementation (because the intended implementation now has a different signature). When we include the header file, the compiler will be able to spot the mismatch and let us know about it.

Ironically, this becomes possible due to the lack of function overloading in C. C compilers will take the implementation as the definition of the corresponding declaration and make sure they match. Had we had function overloading, compilers would have taken the definition as an overloading instance of the declaration and carried on with compilation.

Unlike DOS, where ‘\’ is used, UNIX uses ‘/’ as the separator between path name components. C, having been developed mainly in UNIX-based environments, uses ‘/’ for the same purpose. Our previous examples worked because the compilers used were DOS implementations and interpreted ‘\’ correctly. If we want more portable code, we should use ‘/’ instead of ‘\’.

The following prototypes are provided here in the implementation file, because they are not part of the interface. They are used as utility functions to implement other functions. Had they been part of the interface, we would have put them in the corresponding header file, Complex.h.

Notice these two functions are qualified to be static. When global variables and functions are declared static, they are made local to the file they are defined in. That is, they are not accessible from outside the file. Such an object or a function is said to have internal linkage.

In C, functions, file-scope variables, and file-scope constants are extern by default. In other words, unless otherwise stated they are accessible from outside the current file. This means we can omit all occurrences of extern from the function prototypes in the header file. This is not advisable, though. It would make porting your code from C to C++ difficult. For example, constants in C++ are static by default, exactly the opposite of what we have in C!
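The difference between internal and external linkage can be sketched as follows. All names are hypothetical. The static entities are usable only within this file, whereas the functions declared extern could be called from any other file linked into the same program.

```c
/* Internal linkage: visible only inside this file. A function with the
   same name in another file would not clash with this one. */
static int square(int n) {
    return n * n;
}

/* Also file-local: a counter that other files cannot reach by name. */
static int call_count = 0;

/* External linkage, spelled out with extern: callable from other files. */
extern int sum_of_squares(int a, int b);
extern int sum_of_squares_calls(void);

int sum_of_squares(int a, int b) {
    call_count++;
    return square(a) + square(b);
}

int sum_of_squares_calls(void) {
    return call_count;
}
```

Other files can observe call_count only through sum_of_squares_calls, which is the same discipline the chapter applies to the hidden utility functions.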

Definition: An implementation is a concrete data type that supports one or more interfaces by providing precise semantic interpretations of each of the interface’s abstract operations.

We provide the details for the forward declaration made in the header file. Realize that this is the implementation file and the following definition is seen only by the implementer. Normally, the only files seen by the users are header files and the object files.

The following function serves to create and initialize a complex number, similar to the combination of the new operator and a constructor in object-oriented programming languages.

Definition: A constructor is a distinguished, implicitly called function that initializes an object. Following the successful allocation of memory, typically by a new operator, it is invoked by compiler-synthesized code.

Note that the constructor-like function must be explicitly called in our case because the notion of a constructor is not part of the C programming language.

Sometimes we need to have more than one such function. As a matter of fact, there are at least two other ways to construct a complex number: from another complex number and from polar coordinates. Unfortunately, should we like to add another constructor, we have to come up with a function that has a new name or provide the different behaviors through a single variadic function, because C does not support function name overloading.

Definition: Function name overloading allows multiple function instances that provide a common operation on different argument types to share a common name.
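Without overloading, each way of constructing a value needs its own function name. The sketch below shows two such constructor-like functions. The structure layout and the names cplx_create and cplx_create_copy are hypothetical, not the chapter's own.

```c
#include <stdlib.h>

struct cplx { double re; double im; };   /* hypothetical representation */

/* "Constructor" from Cartesian parts; returns NULL if allocation fails. */
struct cplx *cplx_create(double re, double im) {
    struct cplx *c = malloc(sizeof *c);
    if (c != NULL) {
        c->re = re;
        c->im = im;
    }
    return c;
}

/* "Copy constructor": it needs a name of its own because C would reject
   a second function called cplx_create with a different parameter list. */
struct cplx *cplx_create_copy(const struct cplx *other) {
    return cplx_create(other->re, other->im);
}
```

A third function taking a magnitude and an angle would follow the same pattern under yet another name; it would compute the Cartesian parts with cos and sin from math.h before delegating to cplx_create.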

Assuming the widely used convention that the return value is stored in a register, upon completion of the constructor function we have the following partial memory image.



Observe that the lifetime of the memory region allocated on the heap is not limited to that of the local pointer. At the conclusion of the function, the local pointer will have been automatically discarded while the heap memory will still be alive thanks to the pointer copied into the register.

The following function serves to destroy a complex number and reclaim its memory. It is similar to destructors in object-oriented programming languages.

Definition: A destructor is a distinguished, implicitly called function that cleans up any of the resources the object acquired through the execution of its constructors, or through the execution of any of its member functions along the way. It is typically called before the invocation of a memory de-allocation function.

Programming languages with garbage collection introduce the notion of a finalizer function. Since the garbage collector reclaims unused heap memory, the programmer need not bother about it anymore. But what about files, sockets, and so on? These must in some way be returned to the system, which is what a finalizer is meant to do.

Our destructor-like function is rather simple. All we have to do is to return the heap memory allocated to the complex number that is passed as the sole argument of the function. Note that free returns the memory pointed to by its argument, not the argument itself. One other reminder: free is used to de-allocate heap memory; static data and run-time stack memory are de-allocated by compiler-synthesized code. Be it a region in the heap or otherwise, one should not make assumptions about the contents of de-allocated memory. Doing so will give rise to non-portable software with unpredictable behavior.

The following function checks for equality of two complex numbers. Note that an equality check and an identity check are two different things. That’s why we do not compare the pointers. Instead, we check whether the corresponding fields of the two numbers are equal or not.
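The distinction between identity and equality can be sketched as two separate functions. The structure and the function names are hypothetical. The fields are compared exactly here for brevity, which is an assumption; code that compares computed floating-point results often needs a tolerance instead.

```c
struct cplx { double re; double im; };   /* hypothetical representation */

/* Identity: do both pointers refer to the very same object? */
int cplx_same_object(const struct cplx *a, const struct cplx *b) {
    return a == b;
}

/* Equality: do the two objects hold the same value? */
int cplx_equal(const struct cplx *a, const struct cplx *b) {
    return a->re == b->re && a->im == b->im;
}
```

Two distinct objects holding 1 + 2i are equal but not identical, while any object is both equal and identical to itself.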

The following function serves as what is called a get-method (or an accessor). We provide such functions in order to avoid violating the information hiding principle. The user should access the underlying structure members through functions. Sometimes functions are also provided to change values of members. These are called set-methods (or mutators).

Definition: Information hiding is a formal mechanism for preventing the functions of a program from directly accessing the internal representation of an abstract data type.

It should be noted that accessors [and mutators] can also be provided for attributes of an object that are not backed by members of the underlying structure. For example, a complex number has two polar attributes that can be derived from its Cartesian attributes: norm and angle.

The next function is meant to serve a purpose similar to that of toString in Java. This one, however, does not return any value; it just writes the output to the standard output file, which is definitely much less flexible than its Java counterpart, where a String is returned and the user can make use of it in any way she sees fit: she can send it to the standard output/error file, a disk file, or another application listening at the end of a socket. A function with such semantics is given below.

However, users of the above implementation should not forget to return the memory allocated for the buffer holding the string representation.
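A function that returns the string representation instead of printing it might look like the sketch below. It is not the chapter's listing; the structure, the name cplx_to_string, and the output format are all assumptions. It relies on the C99 behavior of snprintf, which reports the required length when given a zero-sized buffer.

```c
#include <stdio.h>
#include <stdlib.h>

struct cplx { double re; double im; };   /* hypothetical representation */

/* Returns a newly allocated string such as "(1.5, -2)", or NULL on
   failure. The caller owns the result and must release it with free(). */
char *cplx_to_string(const struct cplx *c) {
    /* First pass: ask snprintf how many characters are needed. */
    int needed = snprintf(NULL, 0, "(%g, %g)", c->re, c->im);
    if (needed < 0) {
        return NULL;
    }

    char *text = malloc((size_t)needed + 1);   /* +1 for the terminator */
    if (text == NULL) {
        return NULL;
    }

    /* Second pass: actually write the text into the buffer. */
    snprintf(text, (size_t)needed + 1, "(%g, %g)", c->re, c->im);
    return text;
}
```

The caller can pass the result to puts, write it to a file, or send it over a socket, and then calls free on it exactly once.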

The next two functions do not appear in the header file. Users of the complex number data type do not even know about them. For this reason, they do not [and cannot] use them directly. The implementer can at any time choose to make changes to these functions and other hidden entities, such as the representation of the type. This is a flexibility provided to us by applying the information hiding principle.

Test Program
Complex_Test.c

Inclusion of Complex.h brings in the prototypes for functions that can be applied on a complex number object. This enables the C compiler to check whether these functions are used correctly in the appropriate context. One other purpose of header files is to serve as a specification of the interface for human readers.

Notice it is the prototype that is brought in, not the object code that contains the implementation. Object code of external functions is plugged in by the linker.

Normally, a user is not given access to the source files. The rationale behind this is to protect the implementer’s intellectual property. Instead, the object files, which are not intelligible to human readers, are given. The object files are the compiled versions of the corresponding source files and therefore semantically equivalent to the source files.

As soon as the next assignment command is completed we will have the partial memory image given below:



Notice the non-contiguous nature of heap allocation. Although for a program of this size the memory allocated will likely be contiguous, as the program gets larger this can no longer be relied on. The only job of a memory allocator is to satisfy allocation demands; the address of the allocated memory is of no consequence. To do this it may use different algorithms such as first-fit, worst-fit, and best-fit.

In the preceding operation we created a complex number on the heap and returned a pointer to it as the result. Next time we use the same variable to hold the result of another operation, the old location that holds the result of the previous operation will be unreachable. Such unreachable, and therefore unusable, locations in memory are referred to as garbage. In programming languages with automatic garbage collection, such unused heap memory is reclaimed by the runtime system of the language. In object-oriented programming languages without automatic garbage collection, this must be taken care of by the programmer through the invocation of an operator such as delete, which in turn invokes a special function called the destructor. In non-object-oriented programming languages, the destructor function has to be simulated and the programmer must explicitly return such memory regions to the system for reuse. In our case, this is done by the destructor-like function declared in Complex.h.

Upon completion of the next line, we will have the following partial memory image:



Observe that the pointer still points to the same location. That is, we can still use it to manipulate the same region of memory. But no guarantee is given about the contents. So, in order to keep the user away from the temptation of using this value, it would be a good idea to change the value of the pointer to something that cannot be used to refer to an object. This value is NULL. Each time a memory region is de-allocated, the pointer pointing to it should either be made to point to another region, as in this test program, or be assigned NULL by the user. A second, more secure approach gives the responsibility of assigning NULL to the implementer. The problem is that we need to modify the pointer itself, not the region it points to, so the pointer must be passed by address. This can be achieved by making the following changes:

Complex.c

Complex_Test.c
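The listings for the two files named above are not reproduced in this text. A minimal sketch of the kind of change described, with hypothetical names (cplx, CplxRef, cplx_create, cplx_destroy), is given below: the destructor-like function receives the address of the caller's pointer, so it can both free the object and overwrite the pointer with NULL.

```c
#include <stdlib.h>

struct cplx { double re; double im; };   /* hypothetical representation */
typedef struct cplx *CplxRef;

CplxRef cplx_create(double re, double im) {
    CplxRef c = malloc(sizeof *c);
    if (c != NULL) {
        c->re = re;
        c->im = im;
    }
    return c;
}

/* Takes the ADDRESS of the caller's pointer so that the pointer itself,
   not just a copy of it, can be overwritten. */
void cplx_destroy(CplxRef *ref) {
    if (ref == NULL) {
        return;
    }
    free(*ref);      /* free(NULL) is a harmless no-op */
    *ref = NULL;     /* the caller can no longer reach the freed memory */
}
```

On the client side the call changes from passing the pointer to passing its address, as in cplx_destroy(&c). After the call c compares equal to NULL, and a second call on the same variable does nothing.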

Running the Test Program

 * gcc -I ~/include -c Complex.c↵ # ~ stands for the home directory of the current user; note the space between -I and ~/include!

The above command will produce Complex.o. Note the use of the -I and -c options. The former tells the preprocessor where to look for non-system header files, while the latter causes the compiler to stop before linking. If a header file is not found in the given list of directories, it is searched for in the system directories.

As you can see, our code has no main function. That is, it is not runnable on its own. It just provides an implementation for complex numbers. This implementation will later be used by programs like Complex_Test.c, which manipulate complex numbers.


 * gcc -I ~/include -o Complex_Test Complex_Test.c Complex.o -lm↵

The above command will compile Complex_Test.c and link it with the required object files. The output of linking will be written to a file named Complex_Test. The -l option is used for linking to libraries; it is placed after the files that need the library because many linkers resolve symbols in the order the files are listed. In this case we link to a library named libm.a, where m stands for the math library. We need to link to this library to make sure that the object code for the math functions we call is included in the executable. As a result of linking to the (static) math library, only the object code of the library members containing the functions actually used is included in the executable.

Program Development Process
The whole process can be pictured as shown in the following diagram.



The black-colored region in the diagram represents the implementer's side of the process. What goes on inside this box is of no concern to the users; the number of sub-processes involved and the intermediate outputs produced are immaterial to them. As a matter of fact, the module could have been written in a programming language other than C and it would still be OK as long as the client and the implementer use the same binary interface. All they should care about is the output of this black box, Complex.o, and the header file, Complex.h, which is needed to figure out the functionality offered by Complex.o.

Note that Complex.o is semantically equivalent to Complex.c. The difference lies in their intelligibility to human readers and computers: C source code is intelligible to a human being while the corresponding object file is not. This lack of intelligibility serves to protect the intellectual property of the implementer. After spending months on a project, the implementer delivers to the clients the object module, which contains no hints as to how it has been implemented.

Once the user acquires the object module and the related header file(s), she takes the following steps to build an executable using this object module.


 * Write the source code for the program. Since this program will refer to the functionality offered in Complex.o, we must include the relevant header file, which in this case is Complex.h. This will ensure correct use of the functionality supplied in Complex.o.
 * Once the program compiles, provide the code that implements the functionality used. This functionality is delivered in the object module named Complex.o. All we have to do is link it with the object code of our test program.
 * In addition to the object module, we must have access to the libraries and other object modules used in Complex.o and the program. In other words, we may not be able to build and test our program unless we have certain files. In our case these are the Standard C Library and the Math Library. Unless we have these libraries on our disk or the implementer supplies them to us, we will not be able to build the executable.