Visual Basic/Effective Programming

Programming effectively in any computer language, whether VB or C++, means writing code that is consistent in style, well organized, and as efficient as possible in speed of execution and use of resources (such as memory or network traffic). With established programming techniques, errors can be reduced to a minimum and recognized more easily, which makes the programmer's job much easier and more enjoyable.

There are many different aspects to writing reliable programs. For a short program used interactively and only by the author it can be reasonable to break all the rules in the service of quickly getting the answer. However, if that little program grows into a large program you will end up wishing that you had started on the right path. Each programming language has its strengths and weaknesses, and a technique that assists in writing good programs in one might be unnecessary, impossible or counterproductive in another; what is presented here applies to VB6 in particular but much of it is standard stuff that applies or is enforced in Pascal, C, Java and other similar imperative languages.

General Guidelines
These suggestions are described in greater detail further below. None of them can be called rules and some are controversial; you have to make up your own mind based on the costs and benefits.


 * Write comments that explain why you do what you do. If the code or the problem being solved is especially complex you should also explain why you chose the method you did instead of some other more obvious method.


 * Indent your code. This can make reading the code for others easier and makes it easy to spot where statements have not been closed off properly. This is especially important in multiply nested statements.


 * Declare all variables; enforce this by placing Option Explicit at the top of every code module,


 * Use meaningful variable and subroutine names. The variable FileHandle means a lot more to us humans than X. Also avoid the tendency to abbreviate names, as this can make code hard to read. Don't use FilHan where FileHandle would be clearer,


 * In the argument list of functions and subs declare all arguments as ByRef. This forces the compiler to check the datatypes of the variables you pass in,


 * Declare variables, subs, and functions in the smallest possible scope: prefer Private over Friend and Friend over Public.


 * Have as few variables as possible declared Public in .bas modules; such variables are public to the whole component or program.


 * Group related functions and subs together in a module, create a new module for unrelated routines,


 * If a group of variables and procedures are closely related consider creating a class to encapsulate them together,


 * Include assertions in the code to ensure that routines are given correct data and return correct data,
 * Write and execute tests,


 * Make the program work first and work fast afterwards


 * Where a variable can hold a limited range of discrete values that are known at compile time use an enumerated type,


 * Break large programs into separate components (DLLs or class libraries) so that you can reduce the visibility of data and routines, to just those other pieces of code that need to use them,


 * Use a simple prefix notation to show the type of variables and the scope of routines.

Declaring variables
Earlier in this book, you may have been taught to declare variables with a simple Dim statement, or not at all. Declaring variables at different levels is a crucial skill. Think of your program as having three levels: the module (open to all forms), the individual forms, and the subprograms themselves. If you declare a variable as Public in a module, it keeps its value across all forms. A Dim statement at the top of a module is accepted, but it makes the variable private to that module, so use "Public" when the variable must be shared.
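The original listing is not reproduced here. As a minimal sketch, assuming a standard module called modGlobals and a variable called gUserName (both names are hypothetical), a declaration that every form can see might look like this:

```vb
' modGlobals.bas (hypothetical standard module)
Option Explicit

' Public: visible to every form and module in the project.
Public gUserName As String
```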

Declaring something at the top of your form code makes it private to that form; therefore, if you have X=10 in one form and X=20 in another, they will not interfere. If the variable were declared Public in a module instead, the two forms would share a single value and interfere with each other. To declare something in your form, it is traditional to use "Private".
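A small sketch of a form-level declaration, assuming a form called Form1 and a variable called mCount (hypothetical names chosen for illustration):

```vb
' Form1.frm (hypothetical form): code at the top of the form module
Option Explicit

' Private: visible only to the procedures inside this form.
Private mCount As Long

Private Sub Form_Load()
    mCount = 10   ' does not affect an mCount declared in any other form
End Sub
```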

And finally, there are the subprograms. Declaring variables local to a subprogram is highly effective: you can use default variable names (such as Sum for sums) in all subs without worrying about one value changing because of another section of code. There is, however, a twist. A variable declared with Dim, which you are accustomed to, does not retain its value after the sub is done, so every time the sub runs again its local variables are reset. To get around this, declare the variable with "Static" instead.
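The difference between Dim and Static inside a procedure can be sketched as follows. The routine name CountClicks and the variable names are assumptions made for this illustration:

```vb
Private Sub CountClicks()
    Dim lTemp As Long        ' created afresh, and therefore 0, on every call
    Static lClicks As Long   ' keeps its value between calls

    lTemp = lTemp + 1
    lClicks = lClicks + 1
    Debug.Print lTemp, lClicks
End Sub
```

If CountClicks is called three times, lTemp is 1 each time while lClicks counts 1, 2, 3.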

Some of you may want to take the easy way out, and just use Public on everything. However, it is best to declare something at the smallest level possible. Generally arguments are the best way to send variables from one sub to the other. This is because arguments make it far easier to track exactly where variables are being changed in case of a logic error, and also limit the amount of damage a bad section of code can do. Declaring variables locally is, again, useful when using default variables. Common default variables are:

 * I for loops
 * J for loops
 * Sum (self explanatory)
 * X for anything

So, rather than inventing variables such as I and II or Sum1 and Sum2, you can reuse the same names in every sub, which is why keeping variables local is a useful skill.

Comments
Every programmer I know dislikes writing comments. I don't mean just illiterate script-kiddies writing some brain dead Visual Basic Script to delete all the files from some poor grandmother's PC, I mean them and everyone up to people with multiple Ph.D.s and honorary doctorates.

So if you find it difficult to convince yourself that comments are a good idea you are in good company.

Unfortunately this isn't a case of you and I being as good as them but of them being as bad as us. Good comments can be critical to the longevity of a program; if a maintenance programmer can't understand how your code was supposed to work he might have to rewrite it. If he does that he will also have to write more comments and more tests. He will almost certainly introduce more bugs and it will be your fault because you didn't have the courtesy to explain why your program was written the way it was. Maintenance programmers are often not part of the original team so they don't have any shared background to help them understand, they just have a bug report, the code and a deadline. If you don't write the comments you can be sure that no one will add them later.

Comments need to be written with the same care and attention as the code itself. Sloppily written comments that simply repeat what the code says are a waste of time; it is better to say nothing at all. The same goes for comments that contradict the code; what is the reader meant to believe, the code or the comment? If the comment contradicts the code someone might 'fix' the code only to find that it was actually the comment that was broken.

Here are some example comments:

That comes from one of my own programs that I created by hacking at a template provided by someone else. Why he added the comment is beyond me, it adds nothing to the code which contains the word CommandBar twice anyway!

Here is another from a similar program:

Same degree of pointlessness. Both examples are from programs that I use every day.

Another from the same program which shows both a good comment and a pointless one:

The first comment line simply repeats the information contained in the name, the last comment line tells us only what we can easily glean from the declaration. The middle two lines say something useful, they explain the otherwise puzzling use of the Variant data type. My recommendation is to delete the first and last comment lines, not because they are incorrect but because they are pointless and make it harder to see the comment that actually matters.

To summarize: good comments explain why not what. They tell the reader what the code cannot.

You can see what is happening by reading the code but it is very often hard or impossible to see why the code is written as it is.
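As an illustration of the difference (this is not one of the original examples, and the variable names and the spreadsheet scenario are assumptions), compare a comment that restates the declaration with one that explains a decision:

```vb
' Pointless: repeats what the declaration already says.
Dim lRetries As Long    ' number of retries

' Useful: explains a choice that the code cannot express.
' Declared As Variant because the value is read from a spreadsheet cell
' (an assumed scenario) and may be Empty, a number or a string.
Dim vCellValue As Variant
```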

Comments that paraphrase the code do have a place. If the algorithm is complex or sophisticated you might need to precis it in plain human language. For example the routines that implement an equation solver need to be accompanied by a description of the mathematical method employed, perhaps with references to textbooks. Each individual line of code might be perfectly clear but the overall plan might still be obscure; a precis in simple human language can make it plain.

If you make use of a feature of VB that you know is little used you might need to point out why it works to prevent well-meaning maintenance programmers cleaning it up.

If your code is solving a complex problem or is heavily optimized for speed it will need more and better comments than otherwise but even simple code needs comments that explain why it exists and outlines what it does. Very often it is better to put a narrative at the head of a file instead of comments on individual code lines. The reader can then read the summary instead of the code.

Summary

 * Comments should add clarity and meaning,
 * Keep the comments short unless the complexity of the code warrants a narrative description,
 * Add comments to the head of each file to explain why it exists and how to use it,
 * Comment each function, subroutine and property to explain any oddities such as the use of the Object or Variant data types.
 * If a function has side effects explain what they are,
 * If the routine is only applicable to a certain range of inputs then say so and indicate what happens if the caller supplies something unexpected.

Exercises

 * Take a piece of code written by someone else and try to understand how it works without reading the comments.
 * Try to find some code that doesn't need any comments. Explain why it doesn't need them. Does such a thing exist?
 * Search the web, your own code or a colleague's code to find examples of good comments.
 * Put yourself in the position of a maintenance programmer called in to fix a bug in a difficult piece of your own code. Add comments or rewrite the existing comments to make the job easier.

Avoid Defensive Programming, Fail Fast Instead
By defensive programming I mean the habit of writing code that attempts to compensate for some failure in the data, of writing code that assumes that callers might provide data that doesn't conform to the contract between caller and subroutine and that the subroutine must somehow cope with it.

It is common to see properties written like this:
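The original listing is not reproduced here. A minimal sketch of such a defensive property, assuming a backing variable called mlMaxSlots and an arbitrary default of 10 (both are assumptions), might be:

```vb
' Inside a class module; mlMaxSlots is a hypothetical backing variable.
Private mlMaxSlots As Long

Public Property Get MaxSlots() As Long
    ' Defensive: silently invent a value if the client never set one.
    If mlMaxSlots = 0 Then mlMaxSlots = 10   ' 10 is an arbitrary default
    MaxSlots = mlMaxSlots
End Property

Public Property Let MaxSlots(ByRef lNewValue As Long)
    mlMaxSlots = lNewValue
End Property
```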

The reason it is written like this is probably that the programmer of the class in which it sits is afraid that the author of the code that uses it will forget to initialize the object properly. So he or she provides a default value.

The problem is that the programmer of the MaxSlots property has no way of knowing how many slots that the client code will need. If the client code doesn't set MaxSlots it will probably either fail or, worse, misbehave. It is much better to write the code like this:
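A fail-fast version could be sketched like this. It assumes that zero means "not yet set" and uses a hypothetical error number of vbObjectError + 513; neither detail comes from the original text:

```vb
' Inside a class module; mlMaxSlots is a hypothetical backing variable.
Private mlMaxSlots As Long

Public Property Get MaxSlots() As Long
    ' Fail fast: reading the property before it has been set is an error.
    If mlMaxSlots = 0 Then
        Err.Raise vbObjectError + 513, "MaxSlots", _
                  "MaxSlots must be set before it is read"
    End If
    MaxSlots = mlMaxSlots
End Property

Public Property Let MaxSlots(ByRef lNewValue As Long)
    mlMaxSlots = lNewValue
End Property
```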

Now when client code calls MaxSlots Get before calling MaxSlots Let an error will be raised. It now becomes the responsibility of the client code to fix the problem or pass on the error. In any case the failure will be noticed sooner than if we provide a default value.

Another way of viewing the distinction between Defensive Programming and Fail Fast is to see Fail Fast as strict implementation of a contract and Defensive Programming as forgiving. It can be, and often is, debated whether being forgiving is always a good idea in human society, but within a computer program it simply means that you don't trust all parts of the program to abide by the specification of the program. In this case you need to fix the specification, not forgive transgressions of it.

Exercises

 * Take a working non-trivial program and search the code for defensive programming,
 * Rewrite the code to make it fail fast instead,
 * Run the program again and see if it fails,
 * Fix the client part of the program to eliminate the failure.

Assertions and Design By Contract
An assertion is a statement that asserts that something is true. In VB6 you add assertions like this:
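A minimal sketch of an assertion, placed in a hypothetical routine PrintReciprocal whose argument must be greater than zero:

```vb
Public Sub PrintReciprocal(ByRef x As Long)
    Debug.Assert 0 < x   ' execution pauses on this line in the IDE if x <= 0
    Debug.Print 1 / x
End Sub
```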

If the statement is true then the program continues as if nothing happened but if it is not then the program will stop at that line. Unfortunately VB6 has a particularly weak form of assertions, they are only executed when the code is running in the debugger. This means that they have no effect at all in the compiled program. Don't let this stop you from using them, after all this is a feature that doesn't exist at all in many common programming languages.

If you really need the assertions to be tested in the compiled program you can do something like this:
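One way to do this, sketched under the assumption of a hypothetical routine SetSlotCount and a hypothetical error number, is to pair Debug.Assert False with Err.Raise:

```vb
Public Sub SetSlotCount(ByRef x As Long)
    If 0 < x Then
        ' Contract satisfied, carry on.
    Else
        Debug.Assert False   ' pauses here when running in the IDE
        Err.Raise vbObjectError + 514, "SetSlotCount", _
                  "Assertion failed: 0 < x"
    End If
    ' ... rest of the routine ...
End Sub
```

The Debug.Assert line is left out of the compiled program, so only the Err.Raise takes effect there.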

Now the program will stop at the failure when in the IDE and raise an error when running compiled. If you plan to use this technique in more than a few places it would make sense to declare a subroutine to do it so as to reduce clutter:

then instead of writing Debug.Assert you write:
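Neither listing is reproduced here. As a sketch, assuming a helper called AssertTrue (a hypothetical name, as are the error number and the calling routine), the subroutine and a call to it might look like this:

```vb
' Helper: pauses in the IDE and raises an error in the compiled program.
Public Sub AssertTrue(ByRef bCondition As Boolean, ByRef sMessage As String)
    If Not bCondition Then
        Debug.Assert False
        Err.Raise vbObjectError + 514, "AssertTrue", sMessage
    End If
End Sub

' A caller then states its contract in a single line.
Public Sub SetSlotCount(ByRef x As Long)
    AssertTrue 0 < x, "x must be greater than zero"
    ' ... rest of the routine ...
End Sub
```

One cost of this design is that the IDE pauses inside the helper rather than on the calling line, so you have to look one level up the call stack to find the failed contract.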

Assertions can be used to implement a form of design by contract. Add assertions at the beginning of each routine that assert something about the values of the arguments to the routine and about the values of any relevant module or global variables. For instance a routine that takes a single integer argument that must be greater than zero would have an assertion of the same form as the one shown above. If it is called with a zero argument the program will halt on the line with the assertion. You can also add assertions at the exit of the routine that specify the allowed values of the return value or any side effects.

Assertions differ from explicit validation in that they do not raise errors or allow for the program to take action if the assertion fails. This is not necessarily a weakness of the assertion concept, it is central to the different ways that assertions and validation checks are used.

Assertions are used to do several things:
 * They specify the contract that must be followed by the calling and called code,
 * They assist in debugging by halting execution at the earliest point at which it is known that something is wrong. Correctly written assertions catch errors long before they cause the program to blow up.

Assertions generally assist in finding logic errors during the development of a program, validation, on the other hand, is usually intended to trap poor input either from a human being or other external unreliable source. Programs are generally written so that input that fails validation does not cause the program to fail, instead the validation error is reported to a higher authority and corrective action taken. If an assertion fails it normally means that two internal parts of the program failed to agree on the terms of the contract that both were expected to comply with. If the calling routine sends a negative number where a positive one is expected and that number is not provided by the user no amount of validation will allow the program to recover so raising an error is pointless. In languages such as C failed assertions cause the program to halt and emit a stack trace but VB simply stops when running in the IDE. In VB assertions have no effect in compiled code.

Combined with comprehensive tests, assertions are a great aid to writing correct programs. Assertions can also take the place of certain types of comments. Comments that describe the allowed range of values that an argument is allowed to have are better written as assertions because they are then explicit statements in the program that are actually checked. In VB you must exercise the program in the IDE to get the benefit of the assertions, this is a minor inconvenience.

Assertions also help ensure that a program remains correct as new code is added and bugs are fixed. Imagine a subroutine that calculates the equivalent conductance of a radiator due to convection (don't worry if the physics is unfamiliar):
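The original function is not reproduced here. The sketch below uses an assumed formula purely for illustration; the coefficient K, the constant 100 and the exponent 0.25 are assumptions, not real physics taken from the text:

```vb
' Illustrative only: the formula is an assumption.
' K is taken to be a non-negative coefficient.
Public Function ConvectionConductance(ByRef K As Double, _
                                      ByRef T1 As Double, _
                                      ByRef T2 As Double) As Double
    ConvectionConductance = 100 * K * (T1 - T2) ^ 0.25
End Function
```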

Now it so happens, if you know the physics, that conductance is always a non-negative number regardless of the relationship between the temperatures T1 and T2. This function, however, makes the assumption that T1 is always greater than, or equal to, T2. This assumption might be perfectly reasonable for the program in question but it is a restriction nonetheless so it should be made part of the contract between this routine and its callers:
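Stated as a precondition, the restriction might look like this (the function body uses an assumed, illustrative formula and a coefficient K that is taken to be non-negative):

```vb
Public Function ConvectionConductance(ByRef K As Double, _
                                      ByRef T1 As Double, _
                                      ByRef T2 As Double) As Double
    Debug.Assert T2 <= T1   ' precondition: part of the contract with callers

    ConvectionConductance = 100 * K * (T1 - T2) ^ 0.25
End Function
```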

It is also a good idea to assert something about the results of a routine:
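A sketch with both a precondition and a postcondition, again using an assumed formula and a coefficient K that is taken to be non-negative:

```vb
Public Function ConvectionConductance(ByRef K As Double, _
                                      ByRef T1 As Double, _
                                      ByRef T2 As Double) As Double
    Debug.Assert T2 <= T1                     ' precondition

    ConvectionConductance = 100 * K * (T1 - T2) ^ 0.25

    Debug.Assert 0 <= ConvectionConductance   ' postcondition
End Function
```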

In this particular case it looks as if the assertion on the result is worthless because if the precondition is satisfied it is obvious by inspection that the postcondition must also be satisfied. In real life the code between the precondition and postcondition assertions is usually much more complicated and might include a number of calls to functions that are not under the control of the person who created the function. In such cases the postcondition should be specified even if it appears to be a waste of time because it both guards against the introduction of bugs and informs other programmers of the contract that the function is supposed to comply with.

Tests
Tests vary from writing a program and then running it and looking casually at its behaviour to writing a full suite of automated tests first and then writing the program to comply.

Most of us work somewhere in between, usually nearer the first alternative than the second. Tests are frequently regarded as an extra cost but like quality control systems for physical products the supposed cost of quality is often negative because of the increased quality of the product.

You can use tests to define the specification of a function or program by writing the tests first, this is one of the practices of the Extreme Programming method. Then write the program piece by piece until all of the tests pass. For most people this seems like a counsel of perfection and entirely impractical but a certain amount of it will pay off handsomely by helping to make sure that component parts work properly before integration.

A test is usually built as a separate program that uses some of the same source code as the deliverable program. Simply writing another program that can use component parts of the deliverable will often be enough to expose weaknesses in the design.

The smaller the component that is being tested the easier it will be to write a test, however if you test very small pieces you can waste a lot of time writing tests for things that can be easily checked by eye. Automated tests are probably best applied to those parts of the program that are small enough to be extracted from the real program without disruption yet large enough to have some complex behaviour. It's hard to be precise, better to do some testing than none and experience will show you where the effort is best expended in your particular program.

You can also make the tests part of the program itself. For instance, each class could have a test method that returns True if the test passes and False otherwise. This has the virtue that every time you compile the real program you compile the tests as well, so any changes in the program's interface that will cause the tests to fail are likely to be captured early. Because the tests are inside the program they can also test parts that are inaccessible to external test routines.
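A built-in test method could be sketched as follows. The class cStack and its members are hypothetical and exist only to give the test something to check:

```vb
' cStack.cls (hypothetical class that stores Longs in a Collection)
Option Explicit

Private mItems As New Collection

Public Sub Push(ByRef lValue As Long)
    mItems.Add lValue
End Sub

Public Function Pop() As Long
    Pop = mItems(mItems.Count)
    mItems.Remove mItems.Count
End Function

' Returns True if the class behaves as expected, False otherwise.
Public Function SelfTest() As Boolean
    Dim oStack As New cStack
    Dim bOk As Boolean

    oStack.Push 1
    oStack.Push 2
    bOk = (oStack.Pop = 2)           ' last in, first out
    bOk = bOk And (oStack.Pop = 1)
    SelfTest = bOk
End Function
```

Because SelfTest is compiled with the class, a change to the signature of Push or Pop breaks the build until the test is updated.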

Hungarian Notation
Hungarian notation is the name for the prefixes that many programmers add to variable names to denote scope and type. The reason for doing it is to increase the readability of the code by obviating the need to keep referring to the variable declarations in order to determine the type or scope of a variable or function.

Experienced Basic programmers have been familiar with a form of this notation for a very long time because Microsoft Basics have used suffixes to indicate type (# means Double, & means Long, etc.).

The fine details of the Hungarian Notation used in any given program don't matter very much. The point is to be consistent so that other programmers reading your code will be able to quickly learn the conventions and abide by them. For this reason it is wise to not overdo the notation, if there are too many different prefixes people will forget what the rarely used ones mean and that defeats the purpose. It is better to use a generic prefix that will be remembered than a host of obscure ones that won't.

A recommendation for Hungarian Notation is described in greater detail in the Coding Standards chapter.

Memory and Resource Leaks
You might think that because Visual Basic has no native memory allocation functions that memory leaks would never occur. Unfortunately this is not the case; there are several ways in which a Visual Basic program can leak memory and resources. For small utility programs memory leaks are not a serious problem in Visual Basic because the leak doesn't have a chance to get big enough to threaten other resource users before the program is shut down.

However, it is perfectly reasonable to create servers and daemons in Visual Basic and such programs run for a very long time so even a small leak can eventually bring the operating system to its knees.

In Visual Basic programs the most common cause of memory leaks is circular object references. This problem occurs when two objects have references to each other but no other references to either object exist.

Unfortunately the symptoms of a memory leak are hard to spot in a running program, you might only notice when the operating system starts complaining about a shortage of memory.

Here is an example program that exhibits the problem:
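The original listing is not reproduced here. A sketch consistent with the names Class1, xProblem, oObject1 and oObject2 used in the text follows; the attribute name oOther and the module name Module1 are assumptions. The two parts belong in separate module files:

```vb
' ---- Class1.cls ----
Option Explicit

Public oOther As Class1   ' the single Public attribute

' ---- Module1.bas ----
Option Explicit

Public Sub xProblem()
    Dim oObject1 As Class1
    Dim oObject2 As Class1

    Set oObject1 = New Class1
    Set oObject2 = New Class1

    ' Link the two objects to each other: a circular reference.
    Set oObject1.oOther = oObject2
    Set oObject2.oOther = oObject1
End Sub
```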

Class1 is a simple class with no methods and a single Public attribute. Not good programming practice for real programs but sufficient for the illustration. The xProblem subroutine simply creates two instances of Class1 (objects) and links them together. Notice that the oObject1 and oObject2 variables are local to xProblem. This means that when the subroutine completes the two variables will be discarded. When Visual Basic does this it decrements a counter in each object and if this counter goes to zero it executes the Class_Terminate method (if there is one) and then recovers the memory occupied by the object. Unfortunately, in this case the reference counter can never go to zero because each object refers to the other, so even though no variable in the program refers to either of the objects they will never be discarded. Any language that uses a simple reference counting scheme for cleaning up object memory will suffer from this problem. Traditional C and Pascal don't have the problem because they don't have garbage collectors at all. Lisp and its relatives generally use some variant of mark and sweep garbage collection which relieves the programmer of the problem at the expense of unpredictable changes in resource load.

To demonstrate that there really is a problem add Initialize and Terminate event handlers to Class1 that simply print a message to the Immediate Window.
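The two event handlers could be as simple as this sketch (the message texts are arbitrary):

```vb
' Added to Class1.cls
Private Sub Class_Initialize()
    Debug.Print "Initialize"
End Sub

Private Sub Class_Terminate()
    Debug.Print "Terminate"
End Sub
```

With a circular reference in place, each run of the routine that creates the two objects prints two Initialize messages and no Terminate messages.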

If the xProblem routine were working without a leak you would see an equal number of Initialize and Terminate messages.

Exercises

 * Modify xProblem to ensure that both objects are disposed of when it exits (hint: setting a variable to Nothing reduces the reference count of the object it points to).

Avoiding and Dealing with Circular References
There are a number of techniques that can be used to avoid this problem beginning with the obvious one of simply never allowing circular references:


 * Forbid circular references in your programming style guide,
 * Explicitly clean up all references,
 * Provide the functionality by another idiom.

In real programs forbidding circular references is not usually practical because it means giving up the use of such useful data structures as doubly linked lists.

A classic use of circular references is parent-child relationships. In such relationships the parent is the master object and owns the child or children. The parent and its children share some common information and because the information is common to all of them it is most natural that it be owned and managed by the parent. When the parent goes out of scope the parent and all the children are supposed to be disposed of. Unfortunately this won't happen in Visual Basic unless you help the process along because in order to have access to the shared information the children must have a reference to the parent. This is a circular reference.

     --------        --------
    | parent | ---> | child  |
    |        | <--- |        |
     --------        --------

In this particular case you can usually avoid the child to parent reference completely by introducing a helper object. If you partition the parent's attributes into two sets, one which contains attributes that only the parent accesses and another that is used by both parent and children, you can avoid the circularity by placing all those shared attributes in the helper object. Now both parent and child have a reference to the helper object and no child needs a reference to the parent.

     --------        --------
    | parent | ---> | child  |
    |        |      |        |
     --------        --------
         |               |
         |               |
         |   --------    |
          ->| common |<--
             --------

Notice how all the arrows point away from the parent. This means that when our code releases the last reference to the parent, the reference count will go to zero and the parent will be disposed of. This, in turn, releases the reference to the child. Now with both parent and child gone there are no references left to the common object so it will be disposed of as well. All the reference counting and disposal takes place automatically as part of Visual Basic's internal behaviour; no code needs to be written to make it happen, you just have to set up the structures correctly.

Note that the parent can have as many children as you like, held in a collection or array of object references for instance.

A common use for this sort of structure occurs when the child needs to combine some information about the parent with some of its own. For instance, suppose you are modelling some complicated machine and want each part to have a property showing its position. You would like to avoid making this a simple read-write property because then you have to explicitly update that property on each object when the machine as a whole moves. It is much better to make it a calculated property based on the parent position and some dimensional properties; then when the parent is moved all the calculated properties will be correct without running any extra code. Another application is a property that returns a fully qualified path from root object to the child.

Here is a code example:
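The original listing is not reproduced here. The sketch below shows one possible arrangement, assuming three class modules called cCommon, cParent and cChild; the class names other than cCommon and all member names are assumptions. Each part belongs in its own class file:

```vb
' ---- cCommon.cls: data shared by the parent and its children ----
Option Explicit
Public Path As String

' ---- cParent.cls ----
Option Explicit
Private moCommon As cCommon
Private moChildren As Collection

Private Sub Class_Initialize()
    Set moCommon = New cCommon
    Set moChildren = New Collection
End Sub

Public Property Get Path() As String
    Path = moCommon.Path
End Property

Public Property Let Path(ByRef sPath As String)
    moCommon.Path = sPath
End Property

Public Function NewChild(ByRef sName As String) As cChild
    Dim oChild As cChild
    Set oChild = New cChild
    oChild.Name = sName
    Set oChild.Common = moCommon   ' the child sees the helper, not the parent
    moChildren.Add oChild
    Set NewChild = oChild
End Function

' ---- cChild.cls ----
Option Explicit
Public Name As String
Public Common As cCommon

Public Property Get Path() As String
    ' Calculated from the shared data plus the child's own name.
    Path = Common.Path & "\" & Name
End Property
```

Changing the Path of the parent changes the Path reported by every child, and no child holds a reference to the parent.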

As it stands that really only works for one level of parent child relations, but often we have an indefinite number of levels, for instance in a disk directory structure.

We can generalise this by recognizing that parents and children can actually be the same class and that the child doesn't care how the parent path is determined so long as it comes from the common object.
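One way this generalisation might be sketched is shown below. It assumes that each folder owns a cCommon object and that each cCommon refers to the cCommon of the level above; the member names and this particular arrangement are assumptions rather than the original listing. Each part belongs in its own class file:

```vb
' ---- cCommon.cls ----
Option Explicit
Public Name As String
Public ParentCommon As cCommon   ' Nothing for the root

Public Property Get Path() As String
    If ParentCommon Is Nothing Then
        Path = Name
    Else
        Path = ParentCommon.Path & "\" & Name
    End If
End Property

' ---- cFolder.cls ----
Option Explicit
Private moCommon As cCommon
Private moChildren As Collection

Private Sub Class_Initialize()
    Set moCommon = New cCommon
    Set moChildren = New Collection
End Sub

Public Property Let Name(ByRef sName As String)
    moCommon.Name = sName
End Property

Public Property Get Path() As String
    Path = moCommon.Path
End Property

Friend Property Set ParentCommon(ByRef oCommon As cCommon)
    Set moCommon.ParentCommon = oCommon
End Property

Public Function NewFolder(ByRef sName As String) As cFolder
    Dim oChild As cFolder
    Set oChild = New cFolder
    oChild.Name = sName
    Set oChild.ParentCommon = moCommon   ' link to the helper, not the folder
    moChildren.Add oChild
    Set NewFolder = oChild
End Function
```

In this sketch folders refer only to their own helper and to their children, and helpers refer only to the helper one level up, so no chain of references ever leads back to its starting point.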

Now we can ask any object at any level of the structure for its full path and it will return it without needing a reference to its parent.

Exercises

 * Create a simple program using the cfolder and cCommon classes and show that it works; that is, that it neither leaks memory nor gives the wrong answers for the Path property.

Errors and Exceptions
Before discussing the various kinds of errors we'll show how errors are handled in Visual Basic.

Visual Basic does not have exception classes, instead it has the older system of error codes. While this does make some kinds of programming awkward it really doesn't cause all that much trouble in well written programs. If your program doesn't rely on handling exceptions during normal operations you won't have much use for exception classes anyway.

However if you are part of a team that is creating a large program that consists of a large number of components (COM DLLs) it can be difficult to keep lists of error codes synchronized. One solution is to maintain a master list that is used by everyone in the project, another is to use exception classes after all. See VBCorLib for an implementation in pure VB6 of many of the classes in the mscorlib.dll that provides the basis for programs created for Microsoft's .NET architecture.

There are two statements in Visual Basic that implement the error handling system:


 * On Error Goto
 * Err.Raise

The usual way of dealing with errors is to place an On Error Goto statement at the top of a procedure as follows:
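A minimal skeleton, assuming a hypothetical procedure called ReadSettings:

```vb
Public Sub ReadSettings()
    On Error GoTo EH

    ' ... statements that might fail ...

    Exit Sub
EH:
    ' ... code that deals with the error ...
End Sub
```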

EH is a label at the end of the procedure. Following the label you place code that deals with the error. Here is a typical error handler:
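The original listing is not reproduced here. A handler of this general shape, sketched with a hypothetical procedure SaveReport and a hypothetical recovery routine FixTheProblem, could look like this:

```vb
Public Sub SaveReport()
    On Error GoTo EH

    ' ... statements that might fail ...

    Exit Sub
EH:
    If Err.Number = 5 Then   ' literal error code
        FixTheProblem
        Resume               ' re-executes the statement that failed
    End If
    Err.Raise Err.Number     ' pass on anything not handled above
End Sub

' Hypothetical recovery routine, left empty in this sketch.
Private Sub FixTheProblem()
End Sub
```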

There are several important things to notice about this:


 * There is an Exit Sub statement immediately preceding the error handler label to ensure that when no error occurs the program does not fall into the error handler.
 * The error code is compared to a constant and action taken depending on the result,
 * The last statement re-raises the error in case there was no explicit handler,
 * A resume statement is included to continue execution at the failed statement.

It might be that this is a perfectly useable handler for some procedure but it has some weak points:


 * Use of a literal constant,
 * The catch all Err.Raise statement doesn't provide any useful information, it just passes on the bare error code,
 * Resume re-executes the failed statement.

There is no reason to ever use literal constants in any program. Always declare them either as individual constants or as enumerations. Error codes are best declared as enumerations like this:
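A sketch of such an enumeration follows. The member names are hypothetical, and starting at vbObjectError + 513 is an assumption made here so that the codes fall in the range conventionally used for user-defined errors:

```vb
Public Enum geErrorCodes
    geFirstError = vbObjectError + 513   ' only the first value is set explicitly
    geFileNotOpen                        ' geFirstError + 1
    geInvalidRecord                      ' geFirstError + 2
End Enum
```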

When you want a new error code you just add it to the list. The codes will be assigned in increasing order. In fact you really never need to care what the actual number is.

You can declare the built-in error codes in just the same way except that you must explicitly set the values:
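For example, a few of the standard run-time error numbers could be named like this (the enumeration and member names are hypothetical; the numbers are the standard Visual Basic run-time error codes):

```vb
Public Enum eBuiltInErrors
    errInvalidProcedureCall = 5
    errOverflow = 6
    errSubscriptOutOfRange = 9
    errDivisionByZero = 11
    errTypeMismatch = 13
End Enum
```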

Taxonomy of Errors
There are broadly three kinds of errors:


 * Expected: Expected errors occur when the user or other external entity provides data that is clearly invalid. In such cases the user, whether a human or another program, must be informed about the error in the data, not about where in the program it was discovered.
 * Failure to abide by the contract: A caller fails to supply valid arguments to a subroutine or a subroutine fails to provide a valid return value.
 * Unexpected: An error occurs that was not predicted by the specification of the program.

Expected errors are not really errors in the program but errors or inconsistencies in the data presented to the program. The program must request that the user fix the data and retry the operation. In such cases the user has no use for a stack trace or other internal arcana but has a considerable interest in a clear and full description of the problem in external terms. The error report should point the user directly at the problem in the data and offer suggestions for fixing it. Don't just say "Invalid Data", say which piece of data is invalid, why it is invalid and what range of values can be accepted.

Contract failures usually indicate logic errors in the code. On the assumption that all the invalid data has been weeded out by the front end the program must work unless the program itself is faulty. See the for more information on this topic.

Unexpected errors are the errors that most programmers concentrate on. In fact they are not terribly common, they only appear so because the contract between the various parts of the program is rarely spelled out. Unexpected errors differ from Expected errors in that the user cannot usually be expected to assist in immediate recovery from unexpected errors. The program then needs to present the user with all the gory details of the internal state of the program at the time that the error was discovered in a form that can easily be transmitted back to the maintainers of the program. A log file is ideal for this purpose. Don't just present the user with a standard message box because there is no practical way to capture the description so that it can be emailed to the maintainers.

Expected Errors
These are errors in the input data. From the point of view of the program they are not exceptions. The program should explicitly check for valid input data and explicitly inform the user when the data is invalid. Such checks belong in user interfaces or other parts of the program that are able to interact directly with the user. If the user interface is unable to perform the checks itself then the lower level components must provide methods that validate the data so that the user interface can make use of them. The method used to notify the user should depend on the severity and immediacy of the error as well as the general method of working with the program. For instance a source code editor can flag syntax errors without disturbing the creative flow of the user by highlighting the offending statements, the user can fix them at leisure. In other cases a modal message box might be the correct means of notification.

Contract Errors
These failures are detected by asserting the truth of the preconditions and postconditions of a procedure either by using the assertion statement or by raising an error if the conditions are not met. Such errors normally indicate faulty program logic and when they occur the report must contain sufficient information to enable the problem to be replicated. The report can easily be quite explicit about the immediate cause of the failure. When using Visual Basic it is important to remember that Debug.Assert statements are only executed in the IDE so it can be worthwhile raising errors instead in all but the most time critical routines.

Here is a simple, too simple, example of contract assertions:
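The original listing is not reproduced here. A sketch of what such an example might look like, assuming a hypothetical function Reciprocal whose argument n must not be zero:

```vb
Public Function Reciprocal(ByRef n As Double) As Double
    Debug.Assert 0 <> n                                  ' precondition

    Reciprocal = 1 / n

    Debug.Assert Abs(Reciprocal * n - 1) < 0.000001      ' postcondition
End Function
```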

Visual Basic will halt on the assertion if n = 0, so if you run your tests in the IDE you will be taken directly to the spot where the error is discovered.

Unexpected Errors
These are errors that are caught neither by validation of input data nor by assertion of the truth of preconditions or postconditions. Like contract errors they indicate faulty logic in the program; of course the logic error can be in the validation of the input data or even in one of the preconditions or postconditions.

These errors need the most thorough report because they are obviously unusual and difficult or they would have been foreseen in the validation and contract checks. Like contract failures the report should be logged to a text file so that it can easily be sent to the maintainers. Unlike the contract failures it won't be easy to tell what information is relevant so err on the safe side by including descriptions of all the arguments to the subroutines in the Err.Raise statements.
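A logging routine for this purpose could be sketched as follows. The routine name LogError, its parameters and the file name error.log are all assumptions; the error number and description are passed in as arguments so that they are captured before anything else can disturb the Err object:

```vb
Public Sub LogError(ByRef sRoutine As String, ByRef lNumber As Long, _
                    ByRef sDescription As String, ByRef sArguments As String)
    Dim nFile As Integer

    nFile = FreeFile
    Open App.Path & "\error.log" For Append As #nFile
    Print #nFile, Now; " "; sRoutine; " error "; lNumber; ": "; sDescription
    Print #nFile, "    arguments: "; sArguments
    Close #nFile
End Sub
```

An error handler might call it as LogError "SaveReport", Err.Number, Err.Description, "sFileName=" & sFileName before re-raising the error, where SaveReport and sFileName stand for the failing routine and its argument.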