Programming Language Concepts Using C and C++/Introduction to Programming in C

What follows is an assortment of simple C programs presented to provide an exposure to the basics of the language. To better serve this purpose, the examples are kept simple and short. Unfortunately, for programs of such scale certain criteria become less important than they normally should be. One such criterion is source code portability, which can be ensured by checking programs for standards compliance. This criterion, if not pursued diligently from the start, is very difficult to fulfill.

To make the problem easier to tackle, many compilers offer some assistance. Although the degree of support and its form may vary, it is worth taking a look at the compiler switches. For the compilers we will be using in the context of this class, an incomplete list of applicable options is given below.

C Preprocessor
The C preprocessor is a simple macro processor (a C-to-C translator) that conceptually processes the source text of a C program before the compiler proper reads the source program. Generally, the preprocessor is actually a separate program that reads the original source file and writes out a new "preprocessed" source file that can then be used as input to the C compiler. In other implementations, a single program performs the preprocessing and compilation in a single pass over the source file. The advantage of the former scheme is, apart from its more modular structure, the possibility of translators for other programming languages using the preprocessor.

The preprocessor is controlled by special preprocessor command lines, which are lines of the source file beginning with the # character.

The preprocessor removes all preprocessor command lines from the source file and makes additional transformations on the source file as directed by the commands.

The name of the command must follow the # character. A line whose only non-whitespace character is a # is termed a null directive in ISO C and is treated the same as a blank line.

Defining a Macro

 * The #define preprocessor command causes a name to become defined as a macro to the preprocessor. A sequence of tokens, called the body of the macro, is associated with the name. When the name of the macro is recognized in the program source text or in the arguments of certain other preprocessor commands, it is treated as a call to that macro; the name is effectively replaced by a copy of the body. If the macro is defined to accept arguments, then the actual arguments following the macro name are substituted for the formal parameters in the macro body. The argument passing mechanism used is akin to call-by-name. However, one should not forget that the text replacement is carried out by the preprocessor, not the compiler.

This macro directive can be used in two different ways.


 * Object-like macro definitions: #define name sequence-of-tokens?, where ? stands for zero or one occurrence of the preceding entity. Put another way, the body of the macro may be empty.

Redefinition of a macro is allowed in ISO C provided that the new definition is the same, token for token, as the existing definition. Redefining a macro with a different definition is possible only if an #undef directive is issued before the second definition.


 * Defining Macros with Parameters: #define name(name1, name2, ..., namen) sequence-of-tokens?. The left parenthesis must immediately follow the name of the macro with no intervening whitespace. If whitespace intervenes, the definition is interpreted as an object-like macro whose body starts with a left parenthesis. The names of the formal parameters must be identifiers, no two the same. A function-like macro can have an empty formal parameter list. This is used to simulate a function with no arguments.

When a function-like macro call is encountered, the entire macro call is replaced, after parameter processing, by a processed copy of the body. The entire process of replacing a macro call with the processed copy of its body is called macro expansion; the processed copy of the body is called the expansion of the macro call. For a function-like macro named increment, for instance, every call in the source text is replaced by the corresponding expansion. Since preprocessing takes place prior to compilation, the compiler proper does not even see the identifier increment in the source code. All it sees is the expanded form of the macro.

The problems we faced in the above example may be attributed to the textual nature of the parameter passing mechanism used: call-by-name: Each time the parameter name appears in the text, it is replaced by the exact text of the argument; each time it is replaced, it is evaluated once again.

Undefining a Macro

 * #undef name
 * The #undef command causes the preprocessor to forget any macro definition of name. It is not an error to undefine a name that is currently not defined. Once a name has been undefined, it may then be given a completely new definition without error.

Macros vs. Functions
Choosing macro definition over function makes sense when


 * efficiency is a concern,
 * the macro definition is simple and short, or used infrequently in the source, and
 * the parameters are evaluated only once.

A macro is efficient because macro expansion takes place before runtime; in fact, it takes place even before the compiler proper starts. A function call, on the other hand, is a relatively expensive operation that is carried out at runtime. In this sense, a macro can be used to simulate inlining of a function.

Macro definitions are expected to be simple and short because replacing large bodies of text in the source will give rise to code-bloat, especially when it is used frequently.

Justification of the third requirement was given previously.

On the down side, it should be kept in mind that the preprocessor does not do any type-checking on macro parameters.

Code Inclusion

 * #include <filename>
 * #include "filename"
 * #include preprocessor-tokens
 * The #include preprocessor command causes the entire contents of a specified source file to be processed as if those contents had appeared in place of the #include command.

The general intent is that the "filename" form is used to refer to the header files written by the programmer, whereas the <filename> form is used to refer to standard implementation files. The third form undergoes normal macro expansion and the result must match one of the first two forms.

Conditional Preprocessing

 * #if constant-expression
 * group-of-lines1
 * #elif constant-expression
 * group-of-lines2
 * ...
 * #else
 * group-of-linesn
 * #endif
 * These commands are used together to allow lines of source text to be conditionally included in or excluded from the compilation. This comes in handy when we need to produce code for different architectures (e.g., Vax and Intel) or in different modes (e.g., debugging mode and production mode).


 * #ifdef name
 * #ifndef name
 * These two commands are used to test whether a name is defined as a preprocessor macro. They are equivalent to #if defined(name) and #if !defined(name), respectively.

Notice that #if name and #ifdef name are not equivalent. Although they work exactly the same way when name is undefined, the parity breaks when name is defined to be 0. In that case, the former will be false while the latter will be true.


 * defined name
 * defined(name)
 * The defined operator can be used only in #if and #elif expressions; it is not a preprocessor command but a preprocessor operator.

Emitting Error Messages

 * #error preprocessor-tokens
 * The #error directive produces a compile-time error message that will include the argument tokens, which are subject to macro expansion.

Compile-Time Switches
In addition to the #define command, we can use a compile-time switch to define a macro.

char Data Type
Problem: Write a program that prints out the characters 'a'..'z', 'A'..'Z', and '0'..'9' along with their encoding.

Encoding.c

The following preprocessor directive pulls the contents of the file named stdio.h into the current file. Once this file is included, its contents are interpreted by the preprocessor. Included files generally contain declarations shared among different applications. Constant declarations and function prototypes are examples of these. In this case, we must include stdio.h in order to pull in the prototype of the printf function.

A function prototype consists of the function return type, the name of the function, and the parameter list. It describes the interface of the function: it details the number, order, and types of parameters that must be provided when the function is called and the type of value that the function returns.

A function prototype helps the compiler ensure correct usage of a particular function. For instance, given the declaration in stdio.h, one cannot pass an int as the first argument to the printf function.

Time and again you will see the words prototype and signature used interchangeably. This, however, is not correct. The signature of a function is its list of argument types; it does not include the function return type.

All runnable C programs must contain a function named main. This function serves as the entry point to the program and is called from within the C runtime, which is executed after the executable is loaded, as part of the initialization code.

The system software needed to copy a program from secondary storage into the main memory so that it’s ready to run is called a loader. In addition to bringing the program into the memory, it can also set protection bits, arrange for virtual memory to map virtual addresses to disk pages, and so on.

The formal parameter list of the following main function consists of the single keyword void, which means that it does not take any parameters at all. You will at times see C code with main() or int main() written instead of int main(void). In this context all three serve the same purpose, although the first two should be avoided. A function with no return type in its declaration or definition is assumed to return a value of type int. A function prototype with an empty formal parameter list is an old-style declaration that tells the compiler about the presence of a function that can take any number of arguments of any type. It also has the unwanted side effect of turning off prototype checking from that point on. So, one should avoid such usages.

Although we will manipulate characters, the variable used to hold the characters is declared as an int. The reasoning for this is as follows: the absence of a sign qualifier (signed or unsigned) in the original language design gave compiler implementers the freedom to interpret char as a type comprising values in [0..255], because characters can be indexed by nonnegative values only, or as a type comprising values in [–128..127], because it is an integer type represented in a single byte. These two views basically limit the portable range of char to [0..127].

Due to the lack of exceptions in C, most functions dealing with characters and character strings need to represent exceptional situations, such as end-of-file, as unlikely return values. This means that, in addition to the legitimate characters, we should be able to encode the exceptional situations as different values. Assuming ASCII, this implies a representation that can hold 128 + n values, where n is the number of exceptional conditions to be dealt with. When taken together with the conclusion drawn in the previous paragraph, it can be seen that we need a representation that is larger than type char. Therefore, you'll often see an int variable being used to hold values of type char.

On encountering a character constant, the C compiler replaces it with an integer constant that corresponds to the position of the character in the encoding table. For instance, 'a' in ASCII is replaced with 97, which is an integer constant of type int.

Had we used char instead of int, our program would still work correctly. This is because we did not have to deal with any exceptional conditions in the program and all int constants used are within the limits of the char representation. That is, all implicit narrowing conversions, such as the one that would take place in the initialization statement of the for loop, are value preserving.

The printf function is used to format its arguments and send the formatted output to the standard output file, which by default is the screen. It is a variadic function: that is, it takes a variable-length argument list. The first argument is taken to be a format control string, which is used to figure out the type and number of the remaining arguments. This is accomplished through the use of special character sequences starting with %. When encountered, such a sequence is replaced with the corresponding actual parameter, if the actual parameter can be used in the context implied by the character sequence. For instance, %c in the following line means that the corresponding argument must be interpretable as a character. Likewise, %d is a placeholder for a decimal number.

Note that printf actually returns an int. Not assigning this return value to some variable means that it is ignored.

Compiling and Running a Program in Linux Command Line
Given that this program is saved under the name Encoding.c, it can be compiled (and linked) with a gcc command of the form

 * gcc -o executable-name Encoding.c↵

The gcc command invokes the GNU C/C++ compiler driver, which first has the C preprocessor process Encoding.c and then passes the transformed source file to the compiler proper. The output of the compiler proper, an assembly program, is later passed to the assembler. The object code file produced by the assembler is finally handed to the linker, which links it with the standard C library and stores the executable in a disk file whose name is provided with the -o option. Note that you do not have to tell the driver to link with the standard C library. This is a special case, though. With other libraries you must tell the compiler driver the libraries and object files to be linked with. This whole scheme basically creates a new external command named after the executable. Issuing the name of the executable at the command line will eventually cause the loader to load the program from secondary storage into memory and run it.

Compiling and Running a Program Using Emacs


Starting Emacs from the command line will put you in the Emacs development environment. Click on File→Open... and pick Encoding.c from the file list. This will open up a new C-mode buffer within the current window. Note that the second line from the bottom shows (C Abbrev), which means Emacs has identified your source code as a C program. Next, click on Tools→Compile►→Compile.... This will prompt you to enter the command needed to compile (and link) the program. This prompt will be printed in an area called the minibuffer, which is normally the very last line of the frame. Erase the default selection and write the gcc command described in the previous section.



As you hit the enter key, you will see a *compilation* buffer pop up that lets you know how the compilation process is proceeding. Assuming you don't make any typos and everything goes smoothly, the next thing to do is run the executable. In order to do this, click on Tools→Shell►→Shell. This will open a restricted shell inside a Shell mode buffer, from where you can run your executables. Type in the name of the executable within this buffer and you will see the same output as you saw in the previous section.

In case you end up with compilation errors, clicking on an error line in the *compilation* buffer takes you to the relevant source line.

If you want to go back to the source code and make some changes, click on Buffers→Encoding.c. When you are through with the changes, you can compile the source code once again by clicking on Tools→Compile►→Repeat Compilation. This will recompile Encoding.c using the command entered above. If you want to modify the command, click on Tools→Compile►→Compile... and proceed as you did before.

Non-portable Version
Assuming ASCII, you may be tempted to replace all character constants in the program with the corresponding integer values. This is strongly discouraged since it will make the program non-portable.

ASCII_Encoding.c

The following line contains embedded constants, which cause the code to be non-portable. What if we wanted to move our code to some environment where EBCDIC is used to encode characters? So, one should avoid embedding such implementation-dependent features into programs and let the compiler do the dirty work.

Note also that, since the same replacement is made by the compiler anyway, the probable motivation for replacing character literals with integer constants (speeding up the program) is not well-founded, either.

The exit function causes the program to terminate, returning the value passed to it as the result of executing the program. The same effect can be achieved by returning an integer value from the main function. By convention, a value of 0 signifies successful termination while a nonzero value is used to signify abnormal termination.

Compiling and Running a Program Using MS Visual C/C++
First of all, make sure you execute vcvars32.bat, which you can find in the bin subdirectory of the MS Visual C/C++ directory. This will set some environment variables you need for correct operation of the command line tools.



The program can then be compiled with cl, the Visual C/C++ compiler driver. Similar to gcc, this command will go through the preprocess, compile, assemble, and link phases. Upon successful completion, we can run our program simply by issuing the name of the executable file at the command line. The operating system shell will recognize the resulting executable as an external command and get the loader to load Ascii_Enc.exe into memory and eventually run it.



Compiling and Running a Program Using DJGPP-rhide
Start a new DOS box and launch rhide from the command prompt.



This will start a DOS-based IDE that you can use to develop projects in different programming languages. Choose Compile→Make or Compile→Build All or Compile→Compile followed by Compile→Link. This will compile the source code and link the resulting object module with the C runtime. You can now run the executable by clicking on Run→Run or by exiting to DOS by choosing File→DOS Shell and entering the filename at the prompt. (If you see unexpected behavior from rhide, make sure the file is not too deep inside the directory hierarchy and the names do not contain special characters such as spaces.) If you choose the second option you can return to rhide by typing exit at the command prompt.

Macros
Problem: Write a program that prints a greeting in English or Turkish. The language should be chosen by a compile-time switch. The name(s) of the person(s) to be greeted is passed to the program through a command-line argument.

Greeting.c

The following line checks whether a particular macro has already been defined. The definition of this macro, or any macro for that matter, can be made within a file or at the command line as a compiler switch. In this example, there is no such definition in the file or in any included file. So, in the absence of such a compiler switch, control jumps to the #else part and the statements between #else and #endif are included in the source file. If the definition is made at the command line, the statements between the opening conditional directive and #else are included instead. Whichever section of code is included, one thing is certain: either the part before #else or the part after it is included, never both; there is no danger of duplicate definition.

Notice the peculiar way of naming the variables. This is the so-called Hungarian notation. By prefixing specially interpreted characters to the identifier name, this method aims at providing fellow programmers and maintainers with as much context information as possible. Without any reference to the definition of an identifier, which can be pages apart or even in a different source file that is inaccessible to us, we can now garner the required information simply by interpreting the prefix. For instance, an identifier prefixed with sz is meant to be a string ending with zero (that is, a C-style string).

C does not allow the programmer to overload function names. The main function is an exception to this: it comes in two flavors. The first one, which we have already seen, does not take any arguments. The second one permits the user to pass command-line arguments to the application. These arguments are passed in a vector of pointers to character. If the user wants the arguments to be interpreted as belonging to some other data type, the application code has to do some extra processing.

In C, there is no standard way of telling the size of an array. One should either use a convention or hold the size in another variable. Character strings in C, which may be considered arrays of characters, are an example of the former. Here, a sentinel value (the null character, '\0') is used to signify the end of the string. In many other cases, such a scheme turns out to be impossible or infeasible. In that case we need to hold the size (or length) information in a separate variable. Hence the need for a second parameter holding the argument count.

The program name is the first component of the argument vector. So, if the argument count is one it means the user has not passed any arguments at all.

Execution of a break statement will terminate the innermost enclosing loop (while, do-while, or for) or switch statement. That is, it will basically jump to the statement following the loop or switch statement. In our case, control will be transferred to the statement following the switch statement.

Newcomers to C from a Pascal-based background must be careful about the nature of the switch statement: unlike the Pascal case statement, where the branches are mutually exclusive, switch lets more than one branch be executed. If that's not what you want, you must delimit the branches with break statements, as was done below. Without the break statements in place, an argument count of one would cause all three messages to be printed. Similarly, an argument count of two would print the anonymous message together with the appropriate one.

Using Compiler Switches
Saving this C program as Greeting.c and compiling it with the macro defined through a compile-time switch (with gcc, the -D option) will produce the Turkish version of the program, here an executable named Gunaydin. This executable will not include object code for any of the statements between #else and #endif. Similarly, if we compile the program without defining that macro, we will get the English version, here an executable named GoodMorning, with the statements between the opening conditional directive and #else excluded. Note that with no macros defined at the command line, the English version of the greetings will be included.

One should not mistake compile-time switches for command-line arguments. The former are passed to the preprocessor and used to alter the code to be compiled, while the latter are passed to the running program to alter its behavior. Provided that we build our executables as described above, running Gunaydin with a name on the command line will produce output such as

 * Gunaydin, Tevfik

whereas running GoodMorning will produce output such as

 * Good morning to you all!

Pointer Arithmetic
Problem: Write a program to demonstrate the relationship between pointers and addresses.

Pointer_Arithmetic.c

The third part of the following for loop increments the loop variable, which is a pointer to char. If we fail to realize that an address and a pointer are two different things, we might be tempted to think that what we do is simply increment the address value. But this would be utterly wrong. Although, for pedagogical purposes, we may assume a pointer to be an address, they are not one and the same thing. When we increment a pointer, the address value held in it is incremented by as much as the size of the type of value pointed to by the pointer. However, as far as a pointer to char is concerned, incrementing a pointer and incrementing an address have the same effect. This is due to the fact that a char value is stored in a single byte.

Definition: A pointer is a variable that contains the address of a variable, content of which is interpreted to belong to a certain type.



Note that although a handle on an object may be regarded as a "kind of" pointer, they are two different concepts. Along with other differences such as support for inheritance, unlike pointers and addresses, handles do not take part in arithmetic operations.

The next line is an example showing the difference between a pointer and an address. Here, incrementing the pointer will increase the value of the address held in it by the size of an int.



The next line creates and initializes an array of chars. The compiler computes the size of this array. What the compiler does is basically count the number of characters in between the double quotes and reserve memory accordingly. Note that the compiler automatically appends a null character ('\0') too. However, if we choose to use aggregate initialization, we have to be a bit more cautious.

An aggregate initializer that lists five character constants will create an array of 5 characters, not 6. There won't be any null character at the end of the character array. If that's not what we really want, we should either add the '\0' character explicitly at the end of the initializer list or revert to the former style. In both cases, the array is allocated on the runtime stack.

Note also the use of escape sequences for embedding double quotes in the character string. Since the double quote is used to flag the end of a string literal, it cannot be inserted into the literal directly. The way out is to escape it with a backslash, as in \", which tells the compiler that the following double quote does not end the character string but should be literally embedded in the string.

The third argument of the next printf statement is a call to a function that returns an integer. What's more interesting about this call is the way its argument is passed. Although we seem to be passing an array, what is really passed behind the scenes is the address of the first component of the array. This is done regardless of the array size. The advantages of such a scheme are:


 * 1) All we need to pass is a single pointer, not an entire array. As the array size grows, we save more on memory.
 * 2) Now that we pass a single pointer we avoid copying the entire array. This means saving both on time and memory.
 * 3) The changes we make in the array are permanent; the caller sees the changes made by the callee. This is due to the fact that it is the pointer that is passed by value, not the array. So, although we cannot change the address of the first component of the array, we can modify the contents of the array.

The down side of it is that you need to make a local copy of the array if you want the array to remain the same between calls.

Bit Manipulation
Originally a systems-programming language, C offers assistance for manipulation of data on a bit-by-bit basis. This comprises bitwise operations and a facility to define structures with bit fields.

Not surprisingly, this aspect of the language is utilized in machine-dependent applications, such as real-time systems and system programs (device drivers, for instance), where running speed is a primary concern. Given the sophisticated optimizations done by today's compilers and the non-portable nature of bit manipulation, its use in general-purpose programming should be avoided whenever an alternative exists.

Bitwise Operations
Problem: Write functions to extract the exponent and mantissa of a float argument.

Extract_Fields.c

Specification of C does not standardize (with the exception of type char) the size of integer types. The only guarantee given by the language specification is the following relation:


 * sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long)

On most machines an int takes up four bytes, whereas on some it takes up two bytes. A manifestation of C's system programming orientation, this variance is due to the fact that the size of int was meant to be equal to the word size of the underlying architecture. As a consequence, if we take it for granted that four bytes are used to represent an int value, we may occasionally see our programs producing nonsensical output. This will be due to the fact that an int value representable in four bytes will probably give rise to an overflow when represented in two bytes.

The following #if directive is used to circumvent this problem. UINT_MAX, defined in limits.h, holds the largest possible unsigned int value. On machines where an int value is stored in two bytes, this will be equal to 65535. Otherwise, it will be something else. So, if UINT_MAX happens to be 65535 we can say an int is represented in two bytes. If not, it is represented in four bytes.

The next function pulls out the exponent part of a particular float variable. This is done by isolating the exponent part and shifting the resulting value to the right. We use the bitwise-and operator (binary &) to isolate the exponent and the shift-right operator (>>) to right-adjust it.

Note that the result of right-shifting a negative integer (that is, a signed number with a bit pattern having 1 in the most significant bit) is implementation-defined. While in some implementations the sign bit is preserved and therefore right-shifting is effectively sign-extending the number, in others this bit is replaced with a 0.

A partial image of memory at this point is given in Figure 3. If, say, the float variable has -1.5 as its value, it will be interpreted as -1.5 when seen through the pointer to float. However, if seen through the pointer to the integer type, it will be interpreted to contain 3,217,031,168! The difference is due to different ways of looking at the same thing: while the former sees the variable as a float value compliant with IEEE 754/854, the latter sees the same four bytes of memory as an unsigned integer value (of a type selected according to the machine the program is running on).

(insert figure here)

The following line first uses a bit mask to isolate the exponent part and then shifts it to the right so that the exponent bits occupy the least significant bit positions.

(insert figure here)

The next function extracts the mantissa part of the number. It does this simply by masking out the sign and exponent parts of the number. This is accomplished by the bitwise-and operator in the return expression. Note that we do not need to shift the number since its mantissa is made up of the lowest 23 bits of the representation.

The following function tries to make sense of the command line arguments. Number of such arguments being one means the user has not passed any numbers. We display an appropriate message and exit the program. If the argument count is two the second one is taken to be the number whose components we will extract. Otherwise, the user has passed more arguments than we could deal with; we simply let her know about it and exit the program.

In case of failure it’s a recommended practice to exit with a nonzero value. This may turn out to be of great help when programs are run in coordination by means of a shell script. If a program depends on the successful completion of another, the script needs to have a reliable way of checking the result of the previous programs. A program exiting with non-descriptive values is of no help in such situations. We have to make sure that when we return with zero, it really means successful execution. Otherwise there is something wrong, which may further be elucidated by different exit codes.

The strtod function is used to convert a numeric string, which may also contain -/+ and e/E, to a floating-point number of type double. While it's easy to figure out the function of the first argument, which is a pointer to the string to be converted, one cannot say the same thing about the second argument. Upon return from the function, the second argument will hold a pointer to the character following the converted part of the input string. Thanks to this, we can go on to process the rest of the string.

Parse.c




 * gcc -o Parse.exe Parse.c↵
 * # Double quotes are needed to treat the string as a single argument. They are not part of the argument string and will be stripped before it is passed to main!
 * Parse "Selin Yardimci: 80, 90, 100"↵
 * Name: Selin Yardimci
 * Midterm: 80   Final: 90    Assignment: 100

Observe the simplicity of the main function. We just state what the program does; we do not delve into the details of how it does that. Reading the main function, the maintainer of this code can easily figure out what it claims to do. If she needs more detail, she has to check the function bodies. Depending on the complexity of the program, these functions will provide full details of the implementation or defer this provision to another function. In a complex program this deferment can be extended to multiple levels.

However simple or complex the problem at hand might be and whichever paradigm we use, we start by answering the question "What" and then move (probably in degrees) on to answering the question "How". In other words, we first analyze the problem and come up with a design for the solution and then provide an implementation. Our code should reflect this: it should first expose the answer to "what" (interface) and then (to the interested parties) the answer to "how" (implementation).

Bit Fields
Problem: Using bit fields, implement the previous problem.

Extract_Fields_BF.c

Bit fields are defined similarly to ordinary record fields. The only difference between the two is the width specification following the bit field. According to the definition below, fraction, exponent, and sign occupy twenty-three, eight, and one bit, respectively. However, the rest is pretty much up to the compiler implementation.

To start with, as is manifested by the preprocessor directive, the memory layout of a variable of the bit-field structure type depends on the processor endianness. The manner of bit packing is also not guaranteed to be the same among different hardware. These two factors effectively limit the use of bit fields to machine-dependent programs.

There are two field access operators in C: . and ->. The former is used to access fields of a structure, whereas the latter is used to access fields of a structure through a pointer to the structure. Since SINGLE_FP is defined to be a pointer to the bit-field structure, all variables declared to be of type SINGLE_FP access the bit fields through the use of ->.

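Observe that an expression of the form ptr->field is equivalent to (*ptr).field. For example, for a pointer p to the bit-field structure, p->exponent is equivalent to (*p).exponent.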
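The equivalence of the two access forms can be checked with a small sketch. The structure tag, the typedef name, and the function names below are hypothetical; only the field names and widths follow the description above. The order in which the compiler packs the fields is implementation-defined, so this sketch deliberately does not overlay the structure on a real float.

```c
/* Hypothetical bit-field record with the widths described in the text. */
struct fp_fields {
    unsigned int fraction : 23;
    unsigned int exponent : 8;
    unsigned int sign     : 1;
};

typedef struct fp_fields *FP_FIELDS_PTR;   /* hypothetical pointer type */

/* Access through the pointer with the arrow operator. */
unsigned exponent_via_arrow(FP_FIELDS_PTR p)
{
    return p->exponent;
}

/* The same access spelled out: dereference first, then select the field. */
unsigned exponent_via_star(FP_FIELDS_PTR p)
{
    return (*p).exponent;
}
```

The parentheses in (*p).exponent are required because the . operator binds more tightly than the unary * operator.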

Static Local Variables (Memoization)
Problem: Using memoization write a low cost recursive factorial function.

Reminiscent of caching, memoization can be used to speed up programs by saving the results of computations that have already been made. The difference between the two lies in their scopes. When we speak of caching we speak of a system- or application-wide optimization technique; memoization, on the other hand, is a function-specific technique. When a request is received we first check whether the result has already been computed; if so, we return the stored value and avoid computing it from scratch. Otherwise, the computation is carried out from scratch and the result is added to our store of results. In other words, we prefer computation-in-space over computation-in-time and save some precious computer time.

Applying this technique to the problem at hand, we will store the set of already computed factorials in a static local array. Since any changes made to this array persist between different calls, the base condition for the recursion changes from reaching an argument value of 1 or 0 to reaching an already computed factorial. That is, the recursion stops as soon as the argument does not exceed the largest value whose factorial has already been stored.



Memoize.c

As soon as the program starts running, the following initializations will have taken effect. As a matter of fact, since static local variables are allocated in the static data region, their initial values are present in the disk image of the executable.

Note the initializer of the array used for storing the computations already made. Although the array has more components, only two values are provided in the initializer. The remaining slots are filled with the default initial value 0. In other words, it is equivalent to an initializer that lists the two values followed by an explicit 0 for each remaining component.



Note that we could provide more initial values to avoid more of the initial cost.



If we have already computed the factorial of a number that is equal to or greater than the current argument value, we retrieve this value and return it to the caller.

Note that the receiver of the returned value can be either the main function or another invocation of the factorial function. In the second case, we do part of the computation and use a partial result from the previous computations.

Once a new value is computed, it is stored in our array and the largest argument value for which a factorial has been computed is updated to reflect this fact.
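The scheme described above can be sketched as follows. This is not the original Memoize.c listing; the array size, the variable names, and the use of 0 as an out-of-range signal are assumptions made for the example.

```c
#define MAX_N 20   /* assumption: 20! is the largest factorial that fits in 64 bits */

unsigned long long factorial(int n)
{
    /* Both variables keep their values between calls.
       Only 0! and 1! are given; the remaining slots start out as 0. */
    static unsigned long long computed[MAX_N + 1] = { 1, 1 };
    static int highest = 1;   /* largest argument whose factorial is stored */

    if (n < 0 || n > MAX_N)
        return 0;             /* out of range; 0 is never a valid factorial */

    /* Base condition: the result is already in the table. */
    if (n <= highest)
        return computed[n];

    /* Otherwise compute it from the next smaller factorial and remember it. */
    computed[n] = n * factorial(n - 1);
    highest = n;
    return computed[n];
}
```

After a call with argument 10, a later call with any argument up to 10 is answered by a single table lookup, and a call with 12 only needs two multiplications.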

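A length modifier preceding the conversion letter specifies the size of the expected input. For instance, l before d specifies that the input is expected to be a long; similarly, one can use h to specify a short.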

Like break, continue is used to alter the flow of control inside loops; it ends the current iteration of the innermost enclosing while, do-while, or for statement. In our case, control will be transferred to the next iteration of the enclosing loop.


 * gcc -o MemoizedFactorial.exe Memoize.c↵
 * MemoizedFactorial↵
 * Enter a negative value for termination of the program
 * Enter an integer in [0..20]: 1
 * 1! is 1
 * Enter an integer in [0..20]: 5
 * N:5   N:4    N:3    N:2    5! is 120
 * Enter an integer in [0..20]: 5
 * 5! is 120
 * Enter an integer in [0..20]: 10
 * N:10   N:9    N:8    N:7    N:6    10! is 3628800
 * Enter an integer in [0..20]: -1

File Manipulation
Problem: Write a program that strips the comments from a C program.

Strip_Comments.c

When an identifier definition is qualified with const, it is taken to be immutable. Such an identifier cannot appear as an lvalue. This means we cannot declare an identifier to be constant and subsequently assign a value to it; a constant must be provided with a value at its point of declaration. In other words, a constant must be initialized.

The initializer of a global constant can contain nothing but subexpressions that can be evaluated at compile time. The initializer of a local constant, on the other hand, can contain run-time values. For instance,





will produce the following output:


 * i: 15
 * i: 30
 * i: 18
 * i: 39

This goes to show that constants are created for each function invocation and they can get different values. But they still cannot be modified throughout the function call.
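The point about local constants can be illustrated with a minimal sketch; the function below is a made-up example and is not the listing that produced the output above.

```c
/* A global constant: its initializer must be a compile-time constant expression. */
const int OFFSET = 10;

/* A local constant: a fresh object is created on every call and may be
   initialized from run-time values such as the parameters. */
int shifted_sum(int a, int b)
{
    const int sum = a + b + OFFSET;

    /* sum = 0;   <- would not compile: sum cannot be assigned after initialization */
    return sum;
}
```

Each invocation gets its own sum with its own value, yet within a single call the value stays fixed.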

In C, all identifiers must be declared before they are used! This includes function names, variables, and structure tags. The corresponding definition can be provided after the declaration in the same file or in a different one. While there can be more than one declaration, there can be only one definition.

The following declarations are prototypes of functions whose definitions are provided later in the program. Note that the parameter names do not agree with the parameter names provided in the definition. As a matter of fact, one does not even have to provide the names. But you still have to list the parameter types so as to facilitate type checking: it is an error if the definition of a function or any use of it does not agree with its prototype.

The use of prototypes can be avoided by rearranging the order of definitions. For this example, putting the main function at the end would remove the need for the prototypes.

Comma-separated expressions are considered a single expression whose result is the value of the last expression. Evaluation of a comma-separated expression is strictly from left to right; the compiler cannot change this order.
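Two small, made-up functions illustrate the comma operator: one uses it in the header of a for statement, the other relies on the left-to-right evaluation order and on the value of the last operand.

```c
/* Sum the integers 1..n; both the initialization and the update part
   of the for header are comma-separated expressions. */
int sum_up_to(int n)
{
    int i, total;

    for (i = 1, total = 0; i <= n; total += i, i++)
        ;   /* all the work is done in the header */
    return total;
}

/* The operands are evaluated left to right, so b sees the new value of a;
   the value of the whole expression is that of the last operand, a * b. */
int last_value(void)
{
    int a, b;

    return (a = 2, b = a + 3, a * b);
}
```

Note that the commas separating function arguments or declarators are not comma operators and give no such ordering guarantee.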

A more general version of printf, fprintf performs output formatting and sends the output to the stream specified as the first argument. We can therefore see a call to printf as equivalent to a call to fprintf whose first argument is stdout.



Another friend of printf that you may want to consider in output formatting is the sprintf function. This function, instead of writing the output to some medium, causes it to be stored in a string of characters.
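Formatting into a character array can be sketched as below. The example swaps in snprintf, the bounded C99 relative of sprintf, so that the buffer size is respected; the function name and the output format are assumptions made for the illustration.

```c
#include <stdio.h>

/* Store "<label>: <score>" in buffer, writing at most size bytes
   including the terminating '\0'.
   Returns the number of characters the complete text needs. */
int format_score(char *buffer, size_t size, const char *label, int score)
{
    return snprintf(buffer, size, "%s: %d", label, score);
}
```

With plain sprintf the call would look the same minus the size argument, and it would then be up to the caller to guarantee that the buffer is large enough.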





The following statement tries to open a file in reading mode. It maps the physical file, whose name is held in a variable, to a logical file. If this attempt succeeds, every operation you apply to the logical file will be executed on the physical file. The logical file variable can be thought of as a handle on the real file. The mapping between the handle and the physical file is not one-to-one. Just as more than one handle can refer to the same object, one can have more than one handle on the same physical file. There is no problem as long as all handles are used in read mode. But things get ugly if different handles simultaneously try to modify the same file. [The keywords are operating systems, concurrency, and synchronization.]

If the open operation fails we cannot get a handle on the physical file. That is reflected in the value returned by the fopen function: NULL. A pointer having a NULL value means that we cannot use it for further manipulation; all we can do is test it against NULL. So, we check for this condition first. Unless it is NULL we proceed; otherwise we write something about the nature of the exceptional condition to the standard error file stderr and exit the program.

Just like the standard output, the standard error file is by default mapped to the screen. So, why do we write to stderr instead of stdout? The answer is that we may choose to re-map these standard files to different physical units. In such a case, if we kept writing everything to the same logical file, say stdout, errors would clutter valid output data; we would have difficulty telling which one is which.
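The open-and-check pattern discussed above can be sketched as follows. The function name and the wording of the message are made up for the example; fopen, fprintf, and stderr are the standard library facilities.

```c
#include <stdio.h>

/* Try to open filename for reading.
   On failure, report the problem on the standard error stream and
   return NULL so that the caller can exit with a nonzero status. */
FILE *open_for_reading(const char *filename)
{
    FILE *infile = fopen(filename, "r");

    if (infile == NULL)
        fprintf(stderr, "Unable to open file: %s\n", filename);
    return infile;
}
```

If the standard output of the program is redirected to a file, the message still appears on the screen because it goes to a different stream.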

The solution to the problem can be modeled using the following finite automaton.



Note the ease of transforming the FA-based solution into C code. This is yet another example of how useful a theoretical model can be, however useless and boring it might look at first.
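To give an idea of how directly an automaton maps to code, here is a sketch that works on a string in memory rather than on files. The four states and their names are assumptions and are not necessarily those of the automaton referred to above; the sketch also ignores string and character literals, in which a comment opener should not count.

```c
#include <stddef.h>

enum state { CODE, SLASH, COMMENT, STAR };

/* Copy src to dst, leaving out slash-star comments.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, excluding the '\0'. */
size_t strip_comments(const char *src, char *dst)
{
    enum state current = CODE;
    char *out = dst;

    for (; *src != '\0'; src++) {
        char c = *src;
        switch (current) {
        case CODE:                      /* ordinary program text */
            if (c == '/') current = SLASH;
            else *out++ = c;
            break;
        case SLASH:                     /* seen '/', not yet written */
            if (c == '*') current = COMMENT;
            else if (c == '/') *out++ = '/';   /* write the earlier slash, keep this one pending */
            else { *out++ = '/'; *out++ = c; current = CODE; }
            break;
        case COMMENT:                   /* inside a comment */
            if (c == '*') current = STAR;
            break;
        case STAR:                      /* seen '*' inside a comment */
            if (c == '/') current = CODE;
            else if (c != '*') current = COMMENT;
            break;
        }
    }
    if (current == SLASH)               /* input ended right after a slash */
        *out++ = '/';
    *out = '\0';
    return (size_t)(out - dst);
}
```

Each case label corresponds to one state of the automaton and each if branch to one of its transitions, which is what makes the translation mechanical.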

A problem is solved by transforming its problem domain representation to the corresponding solution domain representation. The more one knows about models to represent a problem the more easily she will come up with the solution of the problem.

fclose disconnects the logical file (handle) from the physical file. If not done by the programmer, code found in the exit sequence of the C runtime guarantees that all opened files are closed at the end of the program. Leaving it to the exit sequence has two disadvantages, though.
 * 1) There is a maximum number of files that an application can have open at the same time. If we defer closing of the files to the exit sequence, we may reach this limit more frequently and quickly.
 * 2) Unless you explicitly flush using fflush, data you write to a file is actually written to a buffer in memory, not to disk. It is flushed automatically when the buffer area becomes full or, for line-buffered streams such as a terminal, whenever you write a newline. It means that if a catastrophic failure, such as a power outage, occurs between the earliest time you could close the file and the time the exit sequence closes it at the end of the program, the data left in the buffer area will not have been committed to the disk. Not a perfect success story!

So you should either flush explicitly or close the file as early as possible.

Heap Memory Allocation
Problem: Write an encryption program that reads from a file and writes the encoded characters to some other file. Use this simple encryption scheme: the encrypted form of a character c is c ^ key[i], where key is a string passed as a command line argument. The program uses the characters in key in a cyclic manner until all the input has been read. Re-encrypting encoded text with the same key produces the original text. If no key or a null string is passed, then no encryption is done.

Cyclic_Encryption.c

malloc is used to allocate storage from the heap, the area of memory that is managed by the programmer herself. This implies that it is the responsibility of the programmer to return every byte of memory allocated through malloc to the pool of available memory.

Such memory is indirectly manipulated through the medium of a pointer. Different pointers can point to the same region of memory. In other words, they can share the same object.

Using pointers to manipulate heap objects does not mean that pointers can point only to heap objects. Nor does it mean that pointers themselves are allocated in the heap. Pointers can point to non-heap objects and they can reside in the static area or the run-time stack. Indeed, such was the case in the previous examples.
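The following sketch combines heap allocation with the cyclic scheme of the problem statement, working on a buffer in memory instead of on files. It assumes the scheme is an exclusive-or with the key characters, which is consistent with re-encryption restoring the original text; the function name and interface are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated buffer holding the encrypted form of text.
   The characters of key are used cyclically; a NULL or empty key means
   the bytes are copied unchanged. The caller must free() the result. */
unsigned char *encrypt_copy(const unsigned char *text, size_t length, const char *key)
{
    size_t key_length = (key != NULL) ? strlen(key) : 0;
    unsigned char *result = malloc(length > 0 ? length : 1);
    size_t i;

    if (result == NULL)
        return NULL;                      /* allocation failed */

    for (i = 0; i < length; i++) {
        if (key_length > 0)
            result[i] = (unsigned char)(text[i] ^ (unsigned char)key[i % key_length]);
        else
            result[i] = text[i];
    }
    return result;
}
```

Because x ^ k ^ k equals x, applying the function to its own output with the same key yields the original bytes. Every buffer returned here lives on the heap until it is passed to free.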

The following line opens a file that can be read byte-by-byte (binary mode), not character-by-character (text mode). In an operating system like UNIX, where each and every character is mapped to a single entry in the code table, there is no difference between the two. But in MS Windows, where newline is mapped to carriage return followed by linefeed, there is a difference, which may escape incautious programmers. The following C program demonstrates this. Run it with a multi-line file and you will see that the number of read operations needed will be different: the number of characters you read in text mode will be less than the number of bytes you read in binary mode. This discrepancy, which is a consequence of the difference in the treatment of the newline character, is likely to cause a great headache when you move your working code from Linux to MS Windows.

Newline.c



Text.txt



If you compile and run the program listed above in a Unix-based environment, it will produce identical outputs for both (text and binary) modes. In MS Windows and DOS environments it will produce the following output.


 * 1.ch: 70 ...
 * 11.ch: 10 12.ch: 83 ...
 * ... 54.ch: 101
 * 1.ch: 70 ...
 * 11.ch: 13 12.ch: 10 13.ch: 83 ...
 * ... 57.ch: 101
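The effect can also be observed with a much smaller sketch that merely counts how many characters can be read from a file in a given mode. The function is a made-up illustration, not the Newline.c listing.

```c
#include <stdio.h>

/* Count the characters that can be read from filename when it is opened
   in the given mode ("r" for text, "rb" for binary).
   Returns -1 if the file cannot be opened. */
long count_chars(const char *filename, const char *mode)
{
    FILE *infile = fopen(filename, mode);
    long count = 0;

    if (infile == NULL)
        return -1;
    while (fgetc(infile) != EOF)
        count++;
    fclose(infile);
    return count;
}
```

On a Unix-like system both modes give the same count. On MS Windows the text-mode count is smaller by one for every carriage return and linefeed pair in the file, since each pair is delivered as a single newline character.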

Pointer to Function (Callback)
Problem: Write a generic bubble sort routine. Test your code by sorting an array of ints and an array of character strings.

General.h

The following typedef statement defines a new type name to be a pointer to a function that takes two arguments of the generic pointer type (that is, void *) and returns an int. After this definition, in this header file or any file that includes this header file, we can use this name just like any other data type. That actually is what we do in Bubble_Sort.h and Bubble_Sort.c. The third argument of the bubble_sort function is defined to be of this type. That is, we can pass the address of any function conforming to the prototype stated below.

void * is used as a generic pointer. In other words, data pointed to by the pointer is not assumed to belong to a particular type. In a sense, it serves the same purpose the Object class at the top of the Java class hierarchy does. Just as any object can be treated as belonging to the Object class, any value, be it something as simple as a single number or something as complex as a database, can be pointed to by this pointer. But such a pointer cannot be dereferenced with the * or subscripting operators. Before using it one must first cast the pointer to an appropriate type.

Bubble_Sort.h

The bubble_sort function sorts an array of generic pointers (first argument), whose size is provided as the second argument. Since there is no universal way of comparing two components and we want our implementation to be generic, we must be able to dynamically determine the function that compares two items of any type. Dynamic determination of the function is made possible by using a pointer to a function. Assigning different values to this pointer enables using different functions, which means different behavior for different types. Coming up with parameter types that are common to all possible data types is the key to making this pointer to function work for all. For this reason, the comparison function takes two arguments of the generic pointer type, that is void *, which can be interpreted to belong to any type.

Bubble_Sort.c

The static qualifier in the following definition limits the visibility of the function being defined to this file. In a sense, it is what the private qualifier is to a class in OOPLs. The difference lies in the fact that a file is an operating system concept (that is, it is managed by the OS), while a class is a programming language one (that is, it is managed by the compiler). The latter is certainly a higher-level abstraction. In the absence of a higher-level abstraction, it can be simulated using lower-level ones. Intervention of an external agent is required to provide this simulation, which gives rise to a more fragile solution. Such is the case in this solution: the programmer, the intervening agent, has to simulate this higher level of abstraction by following some programming conventions.

We could have written the following line as follows:



and it would still do the same thing. So, a pointer-to-function variable can be used just like a function: simply use its name and enclose the arguments in parentheses. This will cause the function pointed to by the variable to be called. The nice thing about calling a function through a pointer is that you can call different functions by simply changing the value of the pointer variable. If you pass the address of compare_ints to the bubble_sort function, it will call compare_ints; if you pass the address of a string comparison function, it will call that function instead; and so on.

Pointer2Function.c

The following line brings in the prototype of the bubble_sort function, not the source code or the object code for it.

The compiler utilizes this prototype in type-checking the uses of the function. This involves checking the number of arguments, their types, and whether the function is used in the right context. Once this is confirmed, the linker takes over and, if the function is defined in a separate source file, brings in the object code for bubble_sort. So,


 * 1) The preprocessor preprocesses the source code by replacing directives with C source code.
 * 2) The compiler, using the provided meta-information (such things as variable declarations/definitions and function prototypes), checks the syntax and semantics of the program. Note that by the time the compiler takes over, all preprocessor directives will have been replaced with C source. That is, the compiler does not know anything about the preprocessor.
 * 3) The linker combines object files into a single executable file. This executable is later loaded into memory by the loader and run under the supervision of the operating system.

The next two functions are needed for making comparisons in the implementation of the sorting algorithm. They compare two values of the same type.

The implementer of the bubble_sort function cannot know in advance the countless object types to be sorted, and there is no universal way of comparing that works for all types. So, the user of the sorting algorithm must implement the comparison function and let the implementer know about this function. The implementer makes use of this function to provide the service. In doing this it calls “back” to the user’s code. This type of call is therefore named a callback.

Looks like a very complicated way of comparing two ints? That’s right! But remember: we have to be able to compare two objects (values) of any type and we must do it with one single function signature. Comparison of two ints is straightforward: just compare the values. But how about character strings? Comparing the pointers will not yield an accurate result; we must compare the character strings pointed to by the pointers. It gets even more complex as we compare two structures that have nested structures in themselves. The solution to this is to pass the ball to the person who knows it best (the user of the algorithm) and while doing so pass a generic pointer to the data, not the data itself. In the process, the user will cast it into an appropriate type and make the comparison accordingly.

The third argument passed to the bubble_sort function is a pointer-to-function. This pointer contains a value that can be interpreted as the entry point of a function. Conceptually, there is not much of a difference between such a pointer and a pointer to some data type. The only difference is in the memory region they point to. The latter points to some address in the data segment while the former points to some address in the code segment. But one thing remains the same: the address value is always interpreted [in a particular way].

Note that you do not have to apply the address-of operator when you send a function as an argument.
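The callback mechanism can be condensed into a single-file sketch. The names bubble_sort and compare_ints come from the text; the typedef names, the exact signatures, the helper swap, and compare_strings are assumptions made for this illustration and may differ from the original General.h, Bubble_Sort.h, and Bubble_Sort.c.

```c
#include <string.h>

typedef void *Object;                             /* assumed name for the generic pointer type */
typedef int (*COMPARISON_FUNC)(Object, Object);   /* assumed name for the callback type */

/* static: visible only inside this file. */
static void swap(Object array[], int i, int j)
{
    Object temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

/* Sort an array of generic pointers; the ordering is decided by the callback. */
void bubble_sort(Object array[], int size, COMPARISON_FUNC compare)
{
    int pass, i;

    for (pass = 0; pass < size - 1; pass++)
        for (i = 0; i < size - 1 - pass; i++)
            if (compare(array[i], array[i + 1]) > 0)   /* same as (*compare)(...) */
                swap(array, i, i + 1);
}

/* The user casts the generic pointers back to the real type before comparing. */
int compare_ints(Object left, Object right)
{
    int a = *(int *)left;
    int b = *(int *)right;

    return (a > b) - (a < b);
}

int compare_strings(Object left, Object right)
{
    return strcmp((const char *)left, (const char *)right);
}
```

A call such as bubble_sort(items, n, compare_ints) passes the function without the address-of operator; writing &compare_ints would have the same effect.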

Linking to an Object File


The -I option adds the named directory to the head of the list of directories to be searched for header files, which initially includes the directory of the source file and the compiler's standard include directories. Analogous to the class path of Java, it is used in organizing header files. Using this information, the preprocessor brings the included header files into the current file (Bubble_Sort.c). Once the preprocessor is through with its job, the compiler will compile the resulting file and output an object file named Bubble_Sort.obj.



The above command compiles Pointer2Function.c as was explained in the previous paragraph. Once Pointer2Function.obj is created it is linked with Bubble_Sort.obj to form the executable named Test_Sort.exe. This linking is required to bring in the object code of bubble_sort function. Remember: inclusion of Bubble_Sort.h brought in the prototype, not the object code!

Modular Programming
Problem: Write a (pseudo) random number generator in C. It is guaranteed that only one generator is used by a particular application and different applications will be using it over and over again.

Since our generator will be used by different applications, we had better put it in a separate file of its own so that, by linking with the client program, we can use it from different sources. This is similar to, if not the same as, the notion of module supported in languages like Modula-2. The difference has to do with their abstraction levels: the module is an entity provided by the programming language and known to all of its users, while the file is an entity provided by the operating system and known to all of its users. Since a programming language compiler (that is, an implementation of the programming language specification) is a user of OS-provided services and concepts, the module concept is a higher abstraction.

Since the module concept (higher abstraction) is not present in C, we need to simulate it using other, probably lower-level, abstractions. In this case, we use a file (lower abstraction). By doing so, we cannot regain all of what comes with a module, though. The notion of module remains unknown to the compiler; certain rules enforced by the compiler in a modular programming language must be checked by the programmer herself or himself. For example, the interface and implementation of a module must be kept synchronized by the programmer, which is certainly an error-prone process.

Since no application is expected to use more than one generator, we can create the data fields related to the generator in the static data region; we do not need to pass a separate, unique handle to identify which generator a function is acting on. This implies we do not need any creation or destruction functions. All we need is a function for initializing the generator and another for returning the next (pseudo) random number.

RNGMod.h

RNGMod.c

Generating a (pseudo) random number is similar to iterating through a list of values: we basically start from some point and move through the values one by one. The difference lies in the fact that the values generated are to be computed using a function rather than retrieved from some memory location. In other words, a generator iterates through a list using computation in time while an iterator does this using computation in space.

Seen from this perspective, initializing a (pseudo) random number generator with a seed is analogous to creating an iterator over a list. By using different values for the seed we can make the generator produce a different sequence. Using the same seed value will give the same sequence of (pseudo) random values. Such a use may be wanted for replaying the same simulation scenarios for different parameters.

The following function computes the next number in the sequence. The number being computed using a well-known formula means that the sequence is actually not random. That is, it can be known beforehand. This is why such a number is usually qualified with the word ‘pseudo’.

What makes the sequence look random is the number of different values it produces in a row. Formula used in the following function will produce all values between 1 and m – 1 and then repeat the sequence. More than two billion values!

In practice, it doesn’t make sense to use these large numbers. We therefore choose to normalize the value into [0..1).
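A file-as-module generator along these lines can be sketched as below. The text does not show its formula, so the sketch assumes the well-known Park-Miller "minimal standard" recurrence with multiplier 16807 and modulus 2^31 - 1, which matches the description of a sequence running through all values between 1 and m - 1. The function and macro names are hypothetical.

```c
#define RNG_A 16807ULL         /* assumed multiplier (Park-Miller) */
#define RNG_M 2147483647ULL    /* assumed modulus: 2^31 - 1 */

/* Module state: static limits its visibility to this file,
   so clients can reach it only through the two functions below. */
static unsigned long long current = 1;

/* Start a new sequence; the state must stay within 1 .. m - 1. */
void rng_init(unsigned long seed)
{
    current = seed % RNG_M;
    if (current == 0)
        current = 1;
}

/* Advance the sequence and return the new value normalized into [0..1). */
double rng_next(void)
{
    current = (RNG_A * current) % RNG_M;   /* the product fits easily in 64 bits */
    return (double)current / (double)RNG_M;
}
```

Re-initializing with the same seed replays exactly the same sequence, which is what makes repeatable simulation runs possible.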

RNGMod_Test.c

Building a Program Using make
As the number of files needed to compile/link a program increases, it becomes more and more difficult to track the interdependencies between the files. One tool for tackling such situations is the make utility found in UNIX environments. This utility automatically determines which pieces of a large program need to be recompiled and issues commands to recompile them.

Input to make is a file consisting of rules telling which files are dependent on which files. These rules generally take the following shape:

The preceding rule is interpreted as follows: if the target is out of date use the following commands to bring it up to date. A target is out of date if it does not exist or if it is older than any of the prerequisites (by comparison of last-modification times).

Makefile

The following rule tells the make utility that RNGMod_Test.exe depends on RNGMod.o and RNGMod_Test.c. RNGMod_Test.exe is out of date if it does not exist or if it is older than either RNGMod.o or RNGMod_Test.c. If either of these two files is modified we have to update RNGMod_Test.exe by means of the command supplied in the next line.

Note the tab preceding the command is not there to make the file more readable; commands used to update the target must be preceded by a tab.

$@ is a special variable used to denote the target filename.

.PHONY is a predefined target used to define fake goals. It tells the make utility that the target is not actually a file to be updated but rather an entry point. In this example, we use clean as an entry point enabling us to delete all relevant object files in the current directory.

As soon as we save this file, all we have to do is to issue the make command. This command will look in the current directory for a file named makefile or Makefile. If you are using GNU version of make, GNUmakefile will also be tried. Once it finds such a file, make tries to update the target file of the first rule from the top.


 * make↵
 * gcc -c -ID:\include RNGMod.c
 * gcc -o RNGMod_Test.exe -ID:\include RNGMod.o RNGMod_Test.c
 * Time: 2.263 seconds
 * RNGMod_Test↵
 * Testing RNGMod...
 * Before initialization: 0.458724
 * After initialization: 0.273923   0.822585    0.186277    0.754617

Note that the time it takes the make utility to complete its task may vary depending on the processor speed and its load. If only RNGMod_Test.c has been modified, we will see the following output.


 * make↵
 * gcc -o RNGMod_Test.exe -ID:\include RNGMod.o RNGMod_Test.c
 * Time: 1.673 seconds

We may at times want the make utility to start from some other target. In such a case we have to provide the name of the target as a command line argument. For instance, if we need to delete the relevant object files found in the current directory, issuing the following command will do the job.


 * make clean↵
 * del RNGMod.o RNGMod_Test.exe
 * Time: 0.171 seconds

The above presentation is a rather limited introduction to the make utility. For more, see the GNU make manual.

Object-Based Programming (Data Abstraction)
Problem: Write a (pseudo) random number generator in C. Your solution should enable a single application to use more than one generator (with no upper bound to this number) at the same time. We should also meet the requirement of the generator being used from different applications.

As was mentioned in the Modular Programming section, the second requirement can be met by providing the generator in a separate file of its own.

In order to enable the use of more than one generator, where the exact number is not known, we must devise a method to create the generators dynamically as needed (similar to what constructors do in OOPL’s). This creation scheme must also give us something to uniquely identify each generator (similar to handles returned by constructors). We should also be able to return generators that are not needed anymore (similar to what destructors do in OOPL’s without automatic garbage collection).

RNGDA.h

As stated in the requirements we have to come up with a scheme that lets users have more than one generator coexisting in the same program. We should in some way be able to identify each and every one of these generators and tell it from the others. This means that we cannot utilize the same method we used in the previous example. We have to get related functions to behave differently depending on the state of the particular generator. This difference in behavior can be achieved by passing an extra argument, a handle on the present state of the generator. This handle should hide the implementation details of the generator from the user; it should have immutable characteristics that we can use to hide the mutable properties of the generator. Sounds like the notion of handle in Java? That’s right! Unfortunately, C does not have direct support for handles. We need some other, probably less abstract, notion to simulate it. This less abstract notion we will use is the notion of pointer: whatever the size of the data it is a handle on, its size does not change.

In the following lines we first make a forward declaration of a struct and then define a new type as a pointer to this not-yet-defined struct. By using forward declarations, we do not betray any of the implementation details; we just let the compiler know that we intend to use such a type. By defining a pointer to this type, we put a barrier between the user and the implementer: the user has something (a pointer, whose size does not change) to access a generator (the underlying object, whose size can be changed by implementation decisions) indirectly. Freedom of change means the generator can be used without any reference to the underlying object, which is why we call this approach data abstraction and any type defined in such a way an abstract data type. Note the pointer argument passed to the function. Function behavior changes depending on the underlying object, which is indirectly accessed by means of this pointer. That’s why this style of programming is called object-based programming.
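The handle-based design can be sketched in a single file as follows; in a real project the first part would go into the header and the second part into the implementation file. All names are hypothetical, and the recurrence is again assumed to be the Park-Miller one, since the text does not show its formula.

```c
#include <stdlib.h>

/* ---- Interface part: all that a user of the generator gets to see ---- */
struct generator;                          /* forward declaration, no details revealed */
typedef struct generator *GENERATOR;       /* the handle */

GENERATOR generator_create(unsigned long seed);
void generator_destroy(GENERATOR *handle);
double generator_next(GENERATOR handle);

/* ---- Implementation part: free to change without affecting users ---- */
struct generator {
    unsigned long long state;
};

/* Plays the role of a constructor: allocate and initialize a generator. */
GENERATOR generator_create(unsigned long seed)
{
    GENERATOR handle = malloc(sizeof *handle);

    if (handle == NULL)
        return NULL;
    handle->state = seed % 2147483647ULL;
    if (handle->state == 0)
        handle->state = 1;
    return handle;
}

/* Plays the role of a destructor: release the object and clear the handle. */
void generator_destroy(GENERATOR *handle)
{
    free(*handle);
    *handle = NULL;
}

/* Behavior depends on the object reached through the handle. */
double generator_next(GENERATOR handle)
{
    handle->state = (16807ULL * handle->state) % 2147483647ULL;
    return (double)handle->state / 2147483647.0;
}
```

Since each generator carries its own state, any number of them can coexist in one program, and advancing one has no effect on the others.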

RNGDA.c

RNGDA_Test.c