Foundations of Computer Science/Algorithm Complexity

Algorithm Complexity
"An algorithm is an abstract recipe, prescribing a process that might be carried out by a human, by computer, or by other means. It thus represents a very general concept, with numerous applications."—David Harel, "Algorithmics - the spirit of computing".

We have learned that algorithms are conceptual solutions to problems. Computing algorithms can be described/defined in terms of the common units of work that computers can do so that they are neutral to programming languages and the execution environment (computers). Algorithm writing is such a creative process as Francis Sullivan noted that "for me, great algorithms are the poetry of computing. Just like verse, they can be terse, allusive, dense, and even mysterious. But once unlocked, they cast a brilliant new light on some aspects of computing." You have seen example algorithms documented using abstract notations such as pseudo code and flowchart.

Once algorithms are implemented/coded, i.e. represented by concrete symbols, they become alive in program that embody them. Programs are executable/runnable by computers. The ability to run different programs that implements all kinds algorithms is a unique feature of computers as machines. Usually when we buy a machine, e.g. an appliance, we assume it has a set of well defined functions it can perform. For example a microwave oven is supposed to warm and cook our food and nothing more. We don't expect a microwave oven ever to be able to wash the clothes for us. A computing machine (a computer) is different. We expect it to perform the function whichever program makes it to do - the functionality of a computer is extensible. If we liken a computer to a car, programmers are drivers who can make the car do different things (to a certain extent) and users are like passengers taking advantage of the things the car can do. This is another reason why everyone needs to learn computer programming because it gives you the freedom to make the computer do different thing.

Correctness of Algorithms
Algorithms must be correct to be useful. We must examine our algorithms carefully to remove errors before using them to create programs. If an algorithm has errors, it is impossible to create correct programs. We often remind students that if their algorithms do not work on paper they won't work on a computer. So we must work out our algorithms on paper first.

Even if the design of an algorithm is correct we may introduce errors during the programming process. As any natural language a programming language has its own syntax consisting of grammatical rules for using the language. When we violate such rules we introduce syntax errors in our program. This type of errors are easy to fix, in fact most modern program editors can detect and warn us about such errors. Another type of errors are logic errors, which result from the misuse of the language. In other words, our grammatically correct program doesn't make sense or make the wrong sense to the computer. For example, a recipe can be unclear or misleading even though all sentences are grammatically correct. You may think that computers can also make mistakes when running programs that are logically. This is true but very rarely this is the case especially with modern computers equipped with error detection mechanisms. We generally assume computers don't make mistakes. If the program doesn't generate the correct answer, it is the result of human errors.

We also call logic errors software bugs. The original "bug" is, in fact, a hardware failure - a moth caught in a electromechanical computer. Now bugs are generally used to refer to any error/failure in computer systems, both in hardware and in software. When a computer program is buggy, it will produce erroneous results or crash as the computer may not know what to do next. Another more subtle bug may cause the computer program to never finish, known as the infinite loop, which is obviously not what we wanted.

Bugs are almost inevitable as humans make mistakes. Then, how do we fix bugs to ensure the correctness of our programs? We must test our programs to verify its correctness. A test consists of a sample input to a program and the desired output from the program. We can run a test by subject the program to the sample input and collect the actual output. If the actual output matches the desired output the program passes the test, otherwise their is a bug in the program or in the test (tests can be buggy too). We usually use a set of tests (a test suite) to exercise different parts of the algorithm. This process is called debugging as expressed in the following pseudo code:

for each test in the test suite run and compare the actual output to the desired output if they match move on to the next test otherwise fix the bug and repeat the whole process

Note that it is very difficult to make our tests exhaustive except for very simple programs. When a program becomes larger and more complex the number of tests need to cover all possible cases grows very fast. As Dijkstra said "program testing can be used to show the presence of bugs, but never to show their absence!" There are techniques for proving the correctness of programs. For instance, microcode for computer processors are often proved correct via formal verification.

"Goodness" of algorithms
There are usually always multiple ways to solve the same problem and, therefore, multiple algorithms to make computers solve the problem for us. In addition to solving the problem correctly we want to be able to compare/evaluate algorithms that solve the same problem. We must define the criteria/measure for "goodness" in algorithm design. Such criteria or measures can include simplicity, ease of implementation, speed of execution, or preciseness of answers. In computing we care the most about the solution speed because a fast algorithm can solve larger problem or more problems in the same amount of time. This is also known as efficiency - an economic measure of many processes. Often time he usefulness of many programs depends on the timeliness of the results. For instance, a program that takes 24 hours to predict the next day's weather is almost useless.

Given an algorithm how do we measure its "speed"? One possible approach is to implement the algorithm, run the result program, and measure its execution time - elapsed time between the start and the end of a program run. There are some challenges in this approach. First the algorithm must be implemented, which can a serious undertaking. Secondly, to run two programs to compare their execution time we must subject them to the same input (a.k.a a workload, e.g a list of one million numbers to be sorted) and the size of the input is never ideal. Thirdly, the "speed" of a running program is influenced by the execution environment, such as the machine's hardware configuration. Take a recipe for example. Different cooks will surely spend different time following it. The amount of food needed will surely magnify the difference - as we increase the amount of ingredients the lengthy recipes will take even longer to follow. But there are intrinsic characteristics of the recipes that affect the preparation time. If a recipe involves beating eggs can instruct the cook to break each egg and scramble it individually or break all eggs first and scramble them together. Obviously the first method is slower due to the additional steps involved. Algorithms in computing exhibit similar characteristics. Recall that algorithms must be defined in terms of the units of work (steps) computers can perform so that it is straightforward to implement them in programming languages. The way the steps are ordered and organized in algorithms can significantly affect the execution time of the programs implement the algorithms.

Since an algorithm is a conceptual solution to a problem, we want to study their "speed" in an abstract sense without actually implementing them. This is known as algorithm analysis in computer science. In this approach we take an algorithm described in pseudo code or flow chart, count the number of steps (units of work), which is always a function of the input size. In the aforementioned example recipe the time it takes to follow the first method (break each egg and scramble it individually) is directly proportional to the number of eggs involved. In fact if only one egg is needed, there is no difference between the two methods. Instead of measuring the steps taken for a particular input size, we focus on the relationship function between the number of steps and the input size, which shows the pattern in which the amount of work (cost) grows as the input size increases. Such functions are also known as growth functions. Then, we apply asymptotic analysis, which compare functions as inputs approach infinity, to simplify the functions because as the input size approaches infinity the difference between the units of works disappears (we can assume breaking an egg and scrambling it take the same amount of time) and the cost of most "complex" part of the task will dominate (a part that repeats a step 10 time will dwarf any step that is only done once) the total cost. For example, in a recipe we have a one quick step (takes 0.0001 seconds per serving) that repeats 10 times and a slow step (takes 10 seconds per serving) doesn't repeat, an amount of serving of N (input size) would cost $$0.0001*10*N$$ total seconds on the repeated steps and would always cost 10 seconds on the slow step. When N is bigger than 10000, the repeated part would cost more than the slow part. In asymptotic analysis we can ignore the slow step because its contribution to the total cost is negligible when N approaches infinity. With simplify growth functions we can put them into categories considering algorithms in each category to have similar performance characteristics. This type of analysis is not completely precise but it can be very useful in studying the nature of algorithms and predicting their performances. We learn how to put algorithms into categories and rank their performances according to the categories they are in. We denote each category using big O notation.

In summary we have discussed the following few key concepts regarding algorithm complexity analysis:
 * Algorithms are conceptual and abstract solutions to problems and programs are concrete executable code computers can run. We can not run algorithms on computers; we can only run programs that implement algorithms. Therefore, the performance of an algorithm is by definition abstract (can not be concretely defined or measured).
 * The goodness measure of an algorithm has to be an intrinsic characteristic of the algorithm itself - something that reflects the "cleverness" of the design regardless the implementation details and the future execution environments.
 * From an economic perspective we care about the cost of algorithms both in terms of time and space. We will focus only on the time cost in this course. We can not actually measure the time cost of algorithms, but we can study the relationship between the time cost and the input size, which reflects the internal complexity (or cleverness) of algorithms. We represent such relationships as growth functions with the input size as the variable and the total cost as the value. The growth functions can be simplified by keeping only the dominate term (asymptotic analysis) because other terms won't matter when the input becomes really large (eventually approaching infinity). Simplified growth functions are put into categories and algorithms can be rank by the categories they belong to.
 * Algorithms in the low complexity category will perform better than algorithms in the higher complexity categories when the input size is sufficiently large. We care about large input sizes because any algorithm can solve a small problem fast. With this algorithm analysis technique we can evaluate and compare algorithms to predict the relative performance of the programs implementing such algorithms before actually implementing any of them.
 * Efficiency, as another term often used, is reversely proportional to complexity. A more complex algorithm is less efficient because it makes less efficient uses of the computing resources by being more complex.

Examples
There are at least two ways to calculate the sum of all numbers between 1 and N. The following are two algorithms:
 * Algorithm 1: add all the numbers up manually, one by one
 * Algorithm 2: calculate the result using this formula $$(N+1)*(N/2)$$

Consider the following question: if we carry out the two algorithms manually, which one would run faster if
 * N = 2?
 * N = 100?
 * N = 1,000,000?

Lets see how a algorithms behave (after implemented) on a computer. The following script implements algorithm 1 using a block that iterates through all the numbers in the range and adding them to the sum one at a time. The same problem can be solved using algorithm 2, which use a function to calculate the result as shown in the following script and the report block.

Both scripts (programs) take the same input - two numbers that define a range. The number of numbers in the range is the input size or more properly the problem size. We assume the units of work are arithmetic operations $$+-*/$$, assignment operations, and report operation. Such an assumption is reasonable because in deed those operations takes the same to to perform regardless the operands. If you run the programs and try different input size you will observe that as you increase the input size the execution time of the first program increases steadily whereas the second program shows not change in execution time. Can we predict such behaviors?

Lets apply algorithm analysis to these two algorithms. The first algorithm loops through the numbers to get the sum, therefore the relationship function between cost (the number of steps - additions) and the input size is $$f=a+b*N$$ assuming N is the input size and a and b are the cost for creating the script variable and the cost for an addition. Note that both a and b are constant because they don't change for a given computer. We can simply the function to $$f=N$$ because when N approaches infinity the constants don't matter anymore. We assign algorithms with this type of growth function into the leaner algorithm to the leaner this type of functions linear time category denoted by $$O(N)$$. The second algorithm always takes the same amount of time because it performs the same set of operations regardless of the input size. It belongs to the constant time category denoted by $$O(1)$$.

Lets consider another problem: are numbers in a list distinct? Here is a straight forward algorithm in pseudo code:
 * Step 1: compare the first number with all of the other numbers in the list. If, at any point, the same number is seen, stop and answer NO.
 * Step 2: repeat Step 1 by taking the next number from the list and comparing it with all of the other numbers.
 * Step 3: after using all numbers in the list, stop and answer YES.

The input size is the size of the list. The more numbers in the list the longer it takes to answer the question. According to algorithm it is possible that the first comparison find two identical numbers, which answer the questions right a way. This is good design because at that point it is unnecessary to continue with the rest of the algorithm. When we analyze algorithms we focus on worst cases because the performance of an algorithm is dependent from the actual input and we should not rely on luck! So the relationship (growth) function between the cost and the input size for this algorithm should be $$f=N*(N-1)$$ because in the worst case (all numbers are unique) the algorithm has to compare each number to all the other numbers in the list before it arrive at the answer. To simply the function by keeping only the largest dominating term we get $$f=N^2$$, which puts the algorithm in the quadratic category $$O(N^2)$$. If you are interested you can reproduce and run the following script (an implementation of the algorithm) to verify the predicted performance characteristics of the algorithm.

Lets look at an algorithm from another category: print all n-digit numbers?

For each number in 0, 1, 2, 3, ..., 9 use the number for the first digit enumerate all n-1 digit numbers append each every n-1 digit number to the first digit to form a n digit number

This example is a little contrived but it demonstrate a new performance category and is very useful when we discuss encryption techniques. The algorithm is complex simply because the amount of output (all n digit numbers) it has to generate. So the cost of the output part will dominate the total cost. To generate all n digit numbers we must first generate all n-1 digit numbers. To generate all n-1 digit numbers we must first generate all n-2 digit numbers and so on. It is easier to study the process backward. Assuming outputting a number is the unit of work (it takes the same time output a number) the cost to generate all one digit numbers is 10. To generate all two digit numbers we have to output $$10*10=10^2$$ numbers. For three digit numbers the cost is $$10^3$$. Did you see a pattern? To generate all n digit numbers it costs $$10^n$$. This type of algorithms goes much faster than the quadratic ones because the input size is in the exponent. They belong to the exponential time category denoted as $$O(2^N)$$. The root (in this case 2) doesn't matter. Because all exponential functions are more similar to each other than quadratic functions or linear functions.

Here is a YouTube video illustrating the performance difference between the bubble sort algorithm and the quick sort algorithm: https://www.youtube.com/watch?v=aXXWXz5rF64