When Is Linear Search Faster Than Binary Search?

There are two primary ways to search a sorted array for a given key: binary search and linear search. You probably know that binary search is an O(log n) algorithm and linear search is an O(n) algorithm, and are therefore aware that for large enough values of n, binary search will always be faster. However, for surprisingly large values of n, linear search can be faster than binary search. The goal of this post is to answer when and why that happens.

The full code for this post can be found here.

The Code

In this post we analyze the following implementations of binary search and linear search (compiled with -O3 and -march=native):


#include <cstddef>
#include <vector>

// Returns the index of the first element in the sorted vector that is >= target.
size_t binary_search(const std::vector<long>& vec, long target) {
    size_t low = 0, high = vec.size();

    while (low < high) {
        size_t mid = low + (high - low)/2;
        if (vec[mid] < target) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    return low;
}

// Same contract as binary_search above, but scanning from the front.
size_t linear_search(const std::vector<long>& vec, long target) {
    size_t idx = 0;
    for (; idx < vec.size(); ++idx) {
        if (vec[idx] >= target) {
            break;
        }
    }
    return idx;
}
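
Both functions return the index of the first element that is greater than or equal to target (the same position std::lower_bound would report), or vec.size() if every element is smaller. As a quick sanity check (not from the original post), the following illustrates the contract:

#include <algorithm>
#include <cassert>

int main() {
    std::vector<long> v = {1, 3, 5, 7, 9};
    assert(binary_search(v, 5) == 2);          // exact match at index 2
    assert(linear_search(v, 6) == 3);          // 6 is absent; first element >= 6 is 7
    assert(linear_search(v, 10) == v.size());  // every element is smaller than 10
    assert(binary_search(v, 6) ==
           static_cast<size_t>(std::lower_bound(v.begin(), v.end(), 6) - v.begin()));
    return 0;
}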

Measurements

Using Google Benchmark, I measured the performance of linear search and binary search on different array sizes. The full harness is in the linked code, but a minimal sketch along these lines (the function name, key distribution, and size range below are illustrative, not the post's exact code) measures one search per benchmark iteration:
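
#include <benchmark/benchmark.h>
#include <random>
#include <vector>

// Illustrative harness: the array size comes from state.range(0), and each
// iteration searches for a randomly chosen key so that the binary-search
// branch is unpredictable.
static void BM_LinearSearch(benchmark::State& state) {
    const size_t n = static_cast<size_t>(state.range(0));
    std::vector<long> vec(n);
    for (size_t i = 0; i < n; ++i) vec[i] = static_cast<long>(i);  // sorted input

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<long> dist(0, static_cast<long>(n) - 1);

    for (auto _ : state) {
        benchmark::DoNotOptimize(linear_search(vec, dist(rng)));
    }
}
// Sweep array sizes (the exact range and step here are arbitrary).
BENCHMARK(BM_LinearSearch)->DenseRange(8, 800, 8);
// An analogous BM_BinarySearch is registered the same way.

BENCHMARK_MAIN();

The following graph plots cycle counts on an AMD Ryzen 9 9900X (a Zen 5 CPU):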

[Chart: cycles vs. number of elements, binary search vs. linear search]
Cycle counts on different array sizes on an AMD Zen 5 CPU.

As expected, the runtime of binary search grows logarithmically and the runtime of linear search grows linearly. However, for small arrays (fewer than 136 elements), linear search is actually faster than binary search! This is despite the fact that binary search always executes fewer CPU instructions, as the following graph demonstrates.

[Chart: instructions vs. number of elements, binary search vs. linear search]
Instruction counts on different array sizes on an AMD Zen 5 CPU.

So what is going on here? If linear search is faster than binary search for small n but always executes more instructions, then linear search must achieve a higher Instructions Per Cycle (IPC) count, which the following chart plots:

[Chart: IPC vs. number of elements, binary search vs. linear search]
Instructions per cycle on different array sizes on an AMD Zen 5 CPU.

So the question is, why does linear search achieve so much higher IPC than binary search? The answer is branch prediction. In order to give themselves work to do (rather than stalling the pipeline on every branch), modern CPUs execute instructions speculatively. In particular, they attempt to predict whether or not a particular branch will be taken (on x86 CPUs, branch instructions are the various jump instructions). In a linear search, the branch condition is false on every iteration until the very last one, when it becomes true, so the CPU can predict the branch almost perfectly. In a binary search, assuming the search keys are randomly selected, each branch is taken with probability 1/2, so the CPU cannot do better than guessing whether the branch will be taken.

We can measure this behavior very precisely. Using PMU events we can count branch mispredictions. In linear search there is always exactly 1 misprediction. In binary search, there are about (1/2)·log₂(n) mispredictions. That's because binary search takes log₂(n) iterations to find the search key, and each branch is mispredicted with probability 1/2. The chart below illustrates this.

[Chart: branch prediction misses vs. number of elements, binary search vs. linear search]
Branch prediction misses on different array sizes on an AMD Zen 5 CPU.
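
If you want to reproduce these counts on Linux, one way is to read the branch-miss counter directly via perf_event_open (a sketch under my own naming; the post's measurements are collected through its benchmark harness rather than this exact helper):

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Thin wrapper around the perf_event_open syscall (it has no glibc wrapper).
static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Count branch mispredictions incurred by a single binary_search call.
// Error handling is minimal; this is illustrative only.
uint64_t count_branch_misses(const std::vector<long>& vec, long target) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
    if (fd < 0) { std::perror("perf_event_open"); return 0; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    volatile size_t result = binary_search(vec, target);  // code under test
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    (void)result;

    uint64_t misses = 0;
    read(fd, &misses, sizeof(misses));
    close(fd);
    return misses;
}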

Is This CPU-Specific?

A natural question now is: does this hold for other CPU models? The following chart plots cycles vs. array size for a Zen 2 CPU, where we observe very similar behavior.

[Chart: cycles vs. number of elements, binary search vs. linear search]
Cycle counts on different array sizes on an AMD Zen 2 CPU.

But how about on a MacBook Pro? On my M2 MacBook Pro, binary search was always faster than linear search! The following chart illustrates this (here we plot real time instead of cycles because the perf events API is Linux-specific).

[Chart: real time (ns) vs. number of elements, binary search vs. linear search]
Real time on different array sizes on a MacBook Pro.

Likewise, on my Raspberry Pi 4, binary search is always faster, as the chart below shows. So the takeaway is that you need to measure on whatever platform you care about!

[Chart: cycles vs. number of elements, binary search vs. linear search]
Cycle counts on different array sizes on a Raspberry Pi 4.

Is This Workload-Dependent?

In the previous measurements, I used randomly selected search keys. However, it's very possible that in a real-life workload the search keys would not be randomly selected, and instead we would repeatedly be searching the same array for values in the same range.

In that case, it's very possible that the branch predictor could do a much better job predicting the branch in the binary search.

The plot below compares cycles for random binary search, predictable binary search, and linear search. The predictable binary search case is taken to the extreme: we repeatedly search for the same value. (Also note that for array sizes this small, the entire array fits in the L1 cache, so there are no cache misses in any of the three cases.)
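
For concreteness, here is a rough sketch of the predictable variant, in the same illustrative Google Benchmark style as above (the fixed-key choice is mine, not necessarily the post's exact setup):

static void BM_BinarySearchPredictable(benchmark::State& state) {
    const size_t n = static_cast<size_t>(state.range(0));
    std::vector<long> vec(n);
    for (size_t i = 0; i < n; ++i) vec[i] = static_cast<long>(i);

    const long key = static_cast<long>(n / 2);  // the same key on every iteration
    for (auto _ : state) {
        // The branch outcomes inside binary_search repeat identically on every
        // call, so the branch predictor quickly learns the entire pattern.
        benchmark::DoNotOptimize(binary_search(vec, key));
    }
}
BENCHMARK(BM_BinarySearchPredictable)->DenseRange(8, 800, 8);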

[Chart: cycles vs. number of elements, random binary search vs. predictable binary search vs. linear search]
Cycles vs. input size on an AMD Zen 5 CPU.

The predictable case is by far the fastest! That is because the CPU branch predictor is now able to predict whether or not the branch will be taken.

So the picture we have is complicated. Whether or not linear search is faster than binary search on small arrays depends on your CPU and also on your access pattern. The takeaway is that you need to measure the workload you care about on the CPU you care about!