A-level Computing/AQA/Paper 1/Fundamentals of algorithms/Searching algorithms

A searching algorithm looks for a given item in a given data structure. The algorithm used depends on how the data is structured.

Linear Search
If you have a list (or array) that is not sorted, then the simplest searching algorithm is linear search: go through the list item by item and compare to the searched item. If a comparison succeeds, the algorithm has found the item. If all comparisons fail, the item doesn't exist in the array or list.

In the simplest variant, the algorithm returns a boolean to indicate success or failure. Here is the pseudo-code:

which can be directly translated to Python:

A second variant returns the position of the item in the list, if it exists. If it doesn't, the algorithm returns an impossible position, like -1. Here's the pseudo-code:

Here is the Python code:

The following complete VB program asks the user for a letter and searches it in an array.

Try the code above searching for letter "w" and then for letter "z":

What are you searching for: w Found item w at position 4

What are you searching for: z -1

Consider the list of strings "Cat","Mouse","Frog","Lion","Panda","Llama","Bee", in this order. How many comparisons would it take to find "Panda"?

5

And how many when searching for "Camel"?

It would take 7 to reach the conclusion the string is not in the list.

Make a trace table for the code above, where  = "d".

For a list with $$n$$ items, what is the maximum number of comparisons it would take to see if an item is there or not?

It would take $$n$$ comparisons if the item is in the last position or not at all in the list.

Binary Search
As the last question points out, a linear search may take as many comparisons as there are items in the list, so searching for a name among a list of several million names (e.g. the electoral register of a country) could take a very long time.

If your list was put in ascending order by a sorting algorithm then you can perform a binary search. This involves splitting the data into half at each comparison, thereby 'zooming in' more quickly into the part of the list where the item must be, if it exists in the list. This results in much better performance, as the side box shows (the Big-O notation is explained in Unit 4).


 * Let's search for the name Miles in the following sorted list:
 * Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne


 * We compare Miles to the middle name, Mohammed:
 * Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne


 * Miles comes alphabetically before Mohammed, so we know that Miles won't be to the right of Mohammed. We can thus 'throw away' the right half of the list:
 * Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne


 * We now compare Miles to the middle name in the remaining list, Bernie:
 * Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne


 * Miles comes alphabetically after Bernie, so we can throw the left hand side away:
 * Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne


 * Finally we compare Miles to the middle name of this single item list, Claire:
 * Ali, Bernie,  Claire , Mohammed, Peter, Simon, Yvonne


 * Miles isn't the same as Claire, there are no more items to compare so we know that Miles isn't in the list.

This only took 3 comparisons using binary search, it would have taken 7 using linear search. It gets even better when the list is large. For example, a 1,000,000,000 item list would only take a maximum of 30 comparisons using binary search! It's very useful to have sorted data.

Sorted data is also useful for linear search. How could a modified linear search algorithm make fewer than 7 comparisons when searching Miles?

The modified linear search would know after 4 comparisons (against Ali, Bernie, Claire and Mohammed) that Miles is not in the sorted list, because Miles should have appeared before Mohammed.

For the list of 7 names shown above, can you think of a case where linear search is faster than binary search?

If we search for the first item in the list, Ali, binary search still takes 3 comparisons (against Mohammed, Bernie and Ali) but linear search only needs 1 comparison.

For linear search of a large list, the best case is if the sought item is in the first position. What is the best case for binary search of a large list?

Binary search only requires 1 comparison If the sought item is in the middle of the list.

After each unsuccessful comparison, binary search reduces the search space by half. The sublist that is being searched can be represented by two integers, with the start and end positions of the sublist. The Python code is: