User:Kayau/Computer Performance


 * **Note: This page is being temporarily developed in my userspace. I will move it to the mainspace later.**

Clock speed

 * Each operation of the processor begins with a pulse of the system clock – this is a clock cycle, or clock tick.
 * The speed of the processor can be measured by the pulse frequency in cycles per second (Hz) – this is the clock rate.
 * The time between two pulses is the cycle time.

Disadvantages
Clock speed measures how fast a computer can process operations. However, different instructions take different numbers of operations, so clock speed alone is not a reliable measure of performance. Thus we have to look at instructions per second.

Instructions per second
To make a more accurate assessment of the processor's performance, we first need to figure out how many cycles an instruction takes.

Average cycles per instruction $$ CPI = \frac{\sum_{i=1}^n (CPI_i \times I_i)}{I_c}$$, where $$I_c$$ is the total number of instructions executed in a program or in a certain time interval, $$CPI_i$$ is the number of cycles per instruction for instruction type $$i$$, and $$I_i$$ is the instruction count for type $$i$$.
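
The weighted-average CPI formula can be evaluated directly. The following sketch uses a hypothetical instruction mix – every number is made up purely to illustrate the calculation:

```python
# Hypothetical instruction mix (assumed, not from the text):
# instruction type -> (cycles per instruction CPI_i, count I_i)
mix = {
    "ALU":    (1, 40_000),
    "load":   (3, 25_000),
    "store":  (2, 15_000),
    "branch": (2, 20_000),
}

I_c = sum(count for _, count in mix.values())            # total instructions
CPI = sum(c * count for c, count in mix.values()) / I_c  # weighted average

print(I_c)   # 100000
print(CPI)   # 1.85
```

Note that the average is weighted by how often each instruction type occurs, so a rare but expensive instruction barely moves the result.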

Using the CPI, we can figure out the approximate execution time:

The execution time $$T = \tau \times I_c \times CPI $$, where $$f$$ is the constant clock rate, $$\tau = \frac{1}{f}$$ is the constant cycle time, $$I_c$$ is the total number of instructions in a program or in a certain time interval, and $$CPI$$ is the average number of cycles per instruction.

However, moving data to and from the main memory usually takes more time than performing operations in the processor. In other words, the memory cycle time may be longer than the processor cycle time. Let $$p$$ be the number of clock cycles for decoding and execution, $$m$$ be the number of memory references per instruction, and $$\tau_m$$ be the memory cycle time. The execution time then becomes $$T = \tau \times I_c \times p + \tau_m \times I_c \times m$$. Let $$k = \frac{\tau_m}{\tau}$$. Then the equation can be simplified:

Let $$\tau$$ be the processor cycle time, $$I_c$$ be the total number of instructions in a program or in a certain time interval, $$CPI$$ be the average number of cycles per instruction, $$f$$ be the constant clock rate such that the constant cycle time $$\tau = \frac{1}{f}$$, $$p$$ be the number of clock cycles for decoding and execution, $$m$$ be the number of memory references per instruction, and $$k$$ be the ratio between the memory and processor cycle times.

Then the execution time $$T = \tau \times I_c \times [p + (m \times k)]$$
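
As a worked sketch of this formula, here is the execution time for a machine with hypothetical parameters (all values below are assumed for illustration):

```python
# All numbers below are assumed purely for illustration.
f = 1e9            # clock rate: 1 GHz
tau = 1 / f        # processor cycle time: 1 ns
tau_m = 4e-9       # memory cycle time: 4 ns
k = tau_m / tau    # ratio of memory to processor cycle time: 4.0
I_c = 2_000_000    # total instructions executed
p = 2              # cycles for decoding and execution per instruction
m = 1.5            # average memory references per instruction

T = tau * I_c * (p + m * k)
print(T)  # ~0.016 s: 2e6 instructions x 8 effective cycles x 1 ns
```

Because $$k = 4$$, each memory reference costs as much as four processor cycles, so memory traffic dominates the effective cycles per instruction here.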

MIPS and MFLOPS
MIPS rate is one way of expressing the instruction execution rate, in terms of millions of instructions per second, i.e. $$\text{MIPS rate} = \frac{I_c / 10^6}{T} = \frac{I_c}{T \times 10^6}$$. Since $$T = \tau \times I_c \times CPI $$, $$\frac{I_c}{T} = \frac{1}{CPI \times \tau} = \frac{f}{CPI}$$. Substituting this into our original MIPS rate formula gives the following:

$$MIPS = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6}$$

An alternative is MFLOPS, which measures only floating-point instructions:

$$MFLOPS = \frac{\text{Total number of floating-point instructions}}{T \times 10^6}$$
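
Both rates are straightforward to compute. The sketch below uses assumed machine parameters and instruction counts, chosen only to make the arithmetic visible:

```python
# Hypothetical machine parameters (assumed, not from the text)
f = 500e6    # clock rate: 500 MHz
CPI = 2.5    # average cycles per instruction

mips = f / (CPI * 1e6)
print(mips)  # 200.0

# MFLOPS counts only the floating-point instructions
fp_count = 18_000_000   # floating-point instructions executed (assumed)
T = 0.25                # execution time in seconds (assumed)
mflops = fp_count / (T * 1e6)
print(mflops)  # 72.0
```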

Disadvantages
Instruction execution rate measures how fast a computer can handle instructions. However, the number of assembly instructions compiled from the same high-level language statement may differ between systems. Two computers may have the same MIPS rate, yet one may be significantly faster because a high-level statement compiles to three assembly instructions on one and eight on the other. Thus we need to look at benchmarks.

Picking a mean
A benchmark program is a program written in a high-level programming language used to test the performance of a system. A collection of benchmark programs, or a benchmark suite, can be run on a computer to test its speed. These programs are usually written with a specific purpose in mind. For example, the SPECweb99 suite was written to assess the performance of web servers.

After obtaining the instruction execution rate for each program, the results must be combined into a single figure. There are three common ways to take this average:

Arithmetic mean: $$ AM(x_1, \ldots, x_n) = \frac{1}{n}(\sum_{i=1}^n x_i) $$

Geometric mean: $$ GM(x_1, \ldots, x_n) = \sqrt[n]{\prod_{i=1}^n x_i} $$

Harmonic mean: $$ HM(x_1, \ldots, x_n) = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}} $$

The arithmetic mean gives the average execution rate. However, this is not very helpful for users, who care about execution time: when each program executes the same number of instructions, the harmonic mean of the rates is inversely proportional to the total execution time, making it the more useful average for this purpose.
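
A small numerical sketch shows why the means differ. The MIPS rates below are hypothetical:

```python
# MIPS rates of two benchmark programs -- hypothetical numbers
rates = [100.0, 50.0]
n = len(rates)

am = sum(rates) / n                    # arithmetic mean: 75.0
gm = (rates[0] * rates[1]) ** (1 / n)  # geometric mean: ~70.7
hm = n / sum(1 / r for r in rates)     # harmonic mean: ~66.7

# If each program executes the same number of instructions I, the total
# time is I/100 + I/50 = 0.03*I, so the true overall rate is
# 2*I / (0.03*I) ~= 66.7 MIPS -- exactly the harmonic mean.
print(am, round(gm, 1), round(hm, 1))
```

The arithmetic mean of 75 MIPS overstates the machine: the slow program occupies two thirds of the total run time, which only the harmonic mean accounts for.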

SPEC benchmarks
SPEC benchmark suites measure performance in a different way. They measure two different metrics. The speed metric measures the computer's ability to perform a single task, whereas the rate metric measures its ability to perform multiple tasks.

To obtain the speed metric, we first measure the execution time of each benchmark program on the system under test (SUT). We then take the ratio of the reference execution time to the measured execution time for each program, and finally evaluate the geometric mean of these ratios. The larger the final result, the faster the system is relative to the reference machine. Let $$R$$ be the reference execution time and $$A$$ be the execution time on the system under test. Then:

For a finite number $$n$$ of benchmark programs, speed metric = $$\sqrt[n]{\prod_{i=1}^n \frac{R_i}{A_i}}$$

To obtain the rate metric, we run multiple copies of each benchmark program at the same time – usually as many copies as there are CPUs on the computer, though not necessarily. The method is similar to that for the speed metric, except that each ratio is scaled by the number of copies run simultaneously, say $$C$$:

For a finite number $$n$$ of benchmark programs, rate metric = $$\sqrt[n]{\prod_{i=1}^n \frac{C \times R_i}{A_i}}$$
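
The two metrics can be sketched together. The reference and measured times below are illustrative values, not real SPEC data:

```python
import math

# Hypothetical reference times and measured times (seconds) for three
# benchmark programs -- illustrative values, not real SPEC data.
R = [1000.0, 2000.0, 1500.0]   # reference execution times
A = [250.0, 500.0, 300.0]      # execution times on the system under test
n = len(R)

# Speed metric: geometric mean of the ratios R_i / A_i
speed = math.prod(r / a for r, a in zip(R, A)) ** (1 / n)

# Rate metric: each ratio is scaled by the number of copies C run at once
C = 4
rate = math.prod(C * r / a for r, a in zip(R, A)) ** (1 / n)

print(round(speed, 3))  # ~4.309 (ratios 4, 4, 5 -> cube root of 80)
print(round(rate, 3))   # C times the speed metric
```

Because $$C$$ multiplies every term of the product, the rate metric works out to exactly $$C$$ times the speed metric when the same times are used for both.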

Amdahl's Law
Some code must be executed serially, i.e. it can only be executed by one processor at a time. The speed gain from using several processors instead of one is given by Amdahl's Law:

Given:
 * $$n \in \mathbb{N}$$, the number of threads of execution,
 * $$B\in [0, 1]$$, the fraction of the algorithm that is strictly serial,

The time $$T \left(n \right)$$ an algorithm takes to finish when being executed on $$n$$ thread(s) of execution corresponds to:

$$T(n) = T(1) \left(B + \frac{1}{n}\left(1 - B\right)\right)$$

Therefore, the theoretical speedup $$S(n)$$ that can be had by executing a given algorithm on a system capable of executing $$n$$ threads of execution is:

$$S(n) = \frac{ T\left(1\right)}{T\left(n\right)} = \frac{T\left(1\right)}{T\left(1\right)\left(B + \frac{1}{n}\left(1 - B\right)\right) } = \frac{1}{B + \frac{1}{n}\left(1-B\right)}$$

A consequence of this is that, for fixed $$B > 0$$, there is a cap on the speed gain from adding more processors, namely $$\lim_{n \to \infty} \frac{1}{B + \frac{1}{n}\left(1-B\right)} = \frac{1}{B}$$
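
The speedup formula and its cap are easy to evaluate. The serial fraction below is an assumed example value:

```python
def speedup(n, B):
    """Theoretical speedup on n threads when fraction B is strictly serial."""
    return 1 / (B + (1 - B) / n)

# With a quarter of the algorithm serial (B = 0.25, assumed):
print(speedup(4, 0.25))     # ~2.29 on 4 threads
print(speedup(1000, 0.25))  # ~3.99 -- already close to the 1/B = 4 cap
```

Even a thousand threads cannot push the speedup past 4 here: the serial quarter of the work sets the ceiling.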

We can even generalise Amdahl's Law to the speedup from any enhancement, if we let $$B$$ be the fraction of the code not affected by that enhancement.