onestopjs

by Martin Koparanov



When is binary search worth it?

Published: Feb 11, 2023

Disclaimer: don't consider this article even remotely accurate.
You may find my findings interesting, but I am far from a computer scientist, so take everything with a grain of salt.
Everything I have done here is purely for fun. I guess I hate my time.

Explanation

Everyone knows binary search is the fastest search algorithm for a sorted array. So, how can it not be worth it?

Binary search has a worst-case time complexity of O(log N). Linear search has a worst-case time complexity of O(N).

Looking at this only, you would say that binary search is always faster. However, the Big O notation only shows how an algorithm scales relative to its input. Each algorithm also performs a number of constant-time operations, which don't affect the Big O, but still take time.
For a bit more in-depth explanation, check out Why Big O doesn't tell the whole story.

Binary search also does a few more operations on every iteration than linear search:

  • Linear search does just one check on every iteration - checking if we have found our target element.
  • Binary search does the same check. However, if the current element is not our target, it has to check if the current element is smaller or larger than our target element. It also has to find the midpoint between our lower and upper bound. Those operations take negligible time, but never zero.

Imagine an airplane versus a car. The airplane is obviously faster, but you wouldn't use it to travel 25km. That's because there is some overhead to traveling by plane - most of it related to going to/from and waiting at the airport. In this case, traveling by car is obviously much faster. But after a certain distance, the benefits start to outweigh the overhead. According to MythBusters, that distance is around 600km: MythBusters Episode 221: Traffic Tricks.

Knowing this, we can now ask the question: how many elements do we need so that the benefits of binary search outweigh the overhead?

The answer is not much. But more than I expected.

Setup

Here is the code for my linear search function:

const linearSearch = (items, target) => {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) {
      return i;
    }
  }

  return -1;
};

And here is the code for my binary search function:

const binarySearch = (items, target) => {
  let start = 0;
  let end = items.length - 1;

  while (start <= end) {
    let mid = Math.floor((start + end) / 2);

    if (items[mid] === target) {
      return mid;
    }

    if (target < items[mid]) {
      end = mid - 1;
    } else {
      start = mid + 1;
    }
  }

  return -1;
};

I tested the simplest case - arrays of numbers [0...n], for a large range of n's, something like this:

for (let i = 0; i < n; i++) {
  const items = generateItems(i); // generate array [0...i]
  linear(items, target);
  binary(items, target);
}
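The generateItems helper isn't shown in the post; a minimal version, assuming it just builds the sorted array of the first i integers, might look like this:

const generateItems = (n) => {
  // Hypothetical helper (not shown in the original post):
  // builds the sorted array [0, 1, ..., n - 1].
  const items = new Array(n);
  for (let i = 0; i < n; i++) {
    items[i] = i;
  }
  return items;
};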

Environment

Language: JavaScript
Engine: NodeJS (V8)
OS: Windows 10 64-bit

Choosing the target element

To be as fair as possible, I wanted to test the worst case for both algorithms.

For linear search, the worst case is of course the last element.
For binary search it is less obvious, but the last element is also a worst case: with this implementation, the midpoint only lands on the last element once the range has shrunk to a single item, so finding it takes the maximum number of iterations.
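If you want to double-check that, here is a quick sketch (my own addition, not part of the original test code) that counts how many iterations binary search needs for every possible target, reusing the binary search logic and the generateItems helper from above:

// Copy of binarySearch that returns the number of iterations instead of the index.
const binarySearchIterations = (items, target) => {
  let iterations = 0;
  let start = 0;
  let end = items.length - 1;

  while (start <= end) {
    iterations++;
    const mid = Math.floor((start + end) / 2);

    if (items[mid] === target) {
      return iterations;
    }

    if (target < items[mid]) {
      end = mid - 1;
    } else {
      start = mid + 1;
    }
  }

  return iterations;
};

const items = generateItems(200);
const counts = items.map((target) => binarySearchIterations(items, target));

// Maximum iterations over all targets vs. iterations for the last element - they should match.
console.log(Math.max(...counts), counts[counts.length - 1]);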

Timing

const { performance } = require('perf_hooks');
global.gc();
const startTimeLinear = performance.now();
linear(items, targetLinear);
const endTimeLinear = performance.now();
global.gc();
const startTimeBinary = performance.now();
binary(items, targetBinary);
const endTimeBinary = performance.now();
const timeLinear = endTimeLinear - startTimeLinear;
const timeBinary = endTimeBinary - startTimeBinary;

Memory management

Garbage collector

JavaScript has a garbage collector. To make sure that it doesn't decide to run at the most inconvenient time and affect the results, it would be good to manually run it right before we measure the execution time of either algorithm.

(Un?)fortunately, JavaScript doesn't expose any methods for manual memory management.
Fortunately, NodeJS has an option to expose the garbage collector: calling NodeJS with the --expose-gc flag makes it accessible from our code, like so:

global.gc();
linear(items, target);
global.gc();
binary(items, target);

Increasing memory

Despite our best efforts, we may run out of memory mid-test, and the engine will be forced to run the garbage collector. To ensure this doesn't happen, I wanted to increase Node's memory usage limits.

NodeJS has an option to increase the size of its "old space" memory section: --max-old-space-size=SIZE

I set it to what I think is a sufficiently large size - 5GB.
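For reference, --max-old-space-size takes its value in megabytes, so 5GB corresponds to 5120. Combined with the garbage collector flag from earlier, the invocation looks roughly like this (index.js is just a placeholder for the test script):

node --expose-gc --max-old-space-size=5120 index.js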

Averaging

When measuring performance, especially on such a small scale, results vary quite wildly - it largely depends on your computer's mood. To mitigate that, the test is run multiple times, and the results are averaged in the end. This way I could get a bit more meaningful results.

Interval

Instead of measuring every single array size, I measured at a fixed interval. The tests take a lot of time to run, so I needed to cut some significant time, and often this much granularity is simply not needed.
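Putting the timing, the garbage collection, the averaging, and the interval together, the measurement harness is roughly shaped like this. This is only a sketch of my understanding, not the code from the repository; RUNS, INTERVAL, MAX_ITEMS, and the totals map are placeholder names, and it reuses linearSearch, binarySearch, and generateItems from above:

// Run with: node --expose-gc ... so that global.gc() is available.
const { performance } = require('perf_hooks');

const INTERVAL = 25;
const MAX_ITEMS = 600;
const RUNS = 15000;

const totals = new Map(); // size -> accumulated linear/binary times in ms

for (let run = 0; run < RUNS; run++) {
  for (let size = 0; size <= MAX_ITEMS; size += INTERVAL) {
    const items = generateItems(size);
    const target = items[items.length - 1]; // worst case for both algorithms

    global.gc();
    const startLinear = performance.now();
    linearSearch(items, target);
    const timeLinear = performance.now() - startLinear;

    global.gc();
    const startBinary = performance.now();
    binarySearch(items, target);
    const timeBinary = performance.now() - startBinary;

    const entry = totals.get(size) || { linear: 0, binary: 0 };
    entry.linear += timeLinear;
    entry.binary += timeBinary;
    totals.set(size, entry);
  }
}

// Average the accumulated times over all runs.
for (const [size, entry] of totals) {
  console.log(size, entry.linear / RUNS, entry.binary / RUNS);
}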

Results

Inputs:

  • items: 0 through 600
  • interval: 25
  • total runs: 15000

x is the number of items.
y is the average time it took.

Took about 30 minutes.

Graph showing binary vs linear search for items 0 through 600. Intersection point is somewhere in the range 125-150.

Looks like the answer is somewhere in the range of 125 - 150 items. The interval is 25, so I can't say for sure.

Before I started this, I expected the answer to be somewhere around 5-10 items. Definitely surprised.

To summarize: according to those results, if you have an array of fewer than 125 items, linear search may be faster than binary search.

If you want to have the best of both worlds, here it is:

const linearSearch = (items, target) => {
  // ...
};

const binarySearch = (items, target) => {
  // ...
};

const ultimateSearch = (items, target) => {
  if (items.length > 100) {
    return binarySearch(items, target);
  }

  return linearSearch(items, target);
};

Please, don't actually do this.

Compiler optimizations

Even though JavaScript is not usually considered a compiled language, compilation to machine code still happens. After all, your CPU doesn't speak JavaScript.

When a function is called many times, V8 finds a way to optimize it. Now, I won't pretend I understand everything that is going on under the hood, but I know for sure that something happens. When I ran node with the --print-opt-code flag, it showed that the search functions do get optimized somehow. To be honest, I didn't bother trying to understand exactly what optimizations were happening.
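If you are curious, you can dump that optimized code for yourself with something along these lines (index.js is again just a placeholder for the test script, and the output is a very long machine-code listing):

node --print-opt-code index.js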

To some, optimizations may be cheating! Well, not to me. This is the result you would expect in a real world scenario (as if you'd do that in the real world).

Disabling optimizations

But to satisfy my curiosity, I ran the tests again, this time disabling optimizations for the two search functions.

Conveniently, node has an --allow-natives-syntax flag, which allows us to call V8's native functions from our JavaScript code! Those native functions are mostly undocumented, besides some comments in the source code. Fortunately, a lot of them are already well known, so I didn't have to dig through the source myself. If you want to learn more, you should Google something like "v8 native functions".

The one we need is %NeverOptimizeFunction(functionName);. As you can probably guess, that tells V8 not to optimize our function, ever.

This is what our code now looks like:

const linearSearch = (items, target) => {
// ...
};
const binarySearch = (items, target) => {
// ...
};
%NeverOptimizeFunction(linearSearch);
%NeverOptimizeFunction(binarySearch);

Yes, this is not valid JavaScript code, and VSCode complains about it. However, I don't care.

Results with no optimizations

This time, I ran the test for items 0 through 200, because I am not really interested in how it scales beyond the intersection point, but I decreased the interval from 25 to 10 to get a bit more granularity.

Inputs:

  • items: 0 through 200
  • interval: 10
  • total runs: 15000

Took about 23 minutes.

Graph showing binary vs linear search for items 0 through 200. Intersection point is somewhere around 50.

So, without optimizations, the intersection point seems to be exactly at 50 items. In fact, at that point the difference between the linear and binary time is only 0.0000013ms.

To summarize: according to those results, if you have an array of fewer than 50 items, linear search may be faster than binary search when optimizations are disabled.

Optimization improvement

Notice the difference between the optimized and unoptimized versions for 50 items:

  • Linear optimized: 0.0006374467094739278 ms
  • Linear unoptimized: 0.002813186426957448 ms

That's a 4.41 times difference!

  • Binary optimized: 0.0006878733237584432 ms
  • Binary unoptimized: 0.0028145732084910073 ms

That's a 4.09 times difference!

Also, if you look at the graphs again, you will notice that for the optimized version, there is an extra zero in the times!

Graph showing binary vs linear search for items 0 through 600 (optimized). Intersection point is somewhere in the range 125-150.

Graph showing binary vs linear search for items 0 through 200 (unoptimized). Intersection point is somewhere around 50.

Looks like those optimizations are doing something after all. Maybe they are a bit too underappreciated.

Trying with different targets

So far, all testing has been done with only the last element as a target for the search.
Maybe it would be interesting to see how everything changes depending on the target?

As we know, for linear search, the farther the target appears in the array, the longer it will take to find it. And the opposite is true - the closer the element is to the beginning, the faster it will find it. The first element is the best case.

The same is not true for binary search. The first element it checks is the middle one, so the middle element is its best case; the more halving steps it takes before the target becomes a midpoint, the longer it takes.

Now, I can't run the tests with every single possible target. For arrays of up to 200 items, testing every target at every size already means on the order of 200² runs (which is 40000) - and all of that would then have to be repeated 15000 times in order to average it.

That's why I chose to test with 10 equally distributed targets for every amount of items, including the first, last, and middle element. Basically this:

// 0 through 10 instead of 0 through 1 with fractions,
// because I didn't want to deal with floating point precision issues
for (let j = 0; j <= 10; j++) {
  for (let i = 0; i < AMOUNT_OF_ITEMS_PER_RUN; i += INTERVAL) {
    const target = Math.floor(i * (j / 10));
    // ...run and time both searches with this target
  }
}

I also ran the test for items 0 through 200 instead of the original 0 through 600 in order to save some work, and it still took quite a bit of time to run.

Results

Inputs:

  • items: 0 through 200
  • interval: 25
  • total runs: 15000
  • targets: 10 (equally distributed)

x is the target element's position in the array, relative to the array's length. 0 is start, 1 is end, 0.5 is middle, and so on.
y is the number of items.
z is the average time it took.

Took about 104 minutes.

Linear search results for multiple targets

Linear

Binary search results for multiple targets

Binary

The linear search is pretty much what you would expect. The farther the target element is, the longer it takes. Still interesting to see.

The binary search, however, is not that obvious. The middle target is the fastest, as expected. The others fluctuate due to the nature of the test - we test with only 10 targets, selected based on their relative position in the array. Unfortunately, once that relative position is rounded to an actual index, the resulting target may be quite favorable for some array sizes and not so much for others. I just don't have the computational power to test every possible target without taking shortcuts.

If you want to run the test for yourself, the code and data are published on GitHub.

Code & data

All the code used to test this has been published on GitHub. All the data from the tests is on there as well!

If you want to run the tests yourself, you can do that! For more instructions, check out the README on there!

binary-vs-linear-search

Conclusion

This experiment was born out of having too much free time, not out of necessity. I wanted to satisfy my curiosity, and why not share my findings? But this is not meant as advice that you should bother optimizing your search functions.

In real life, you will never have to think about this. Binary search is simply faster. Even in the cases where linear search is faster, the difference is negligible. JavaScript, for what it is, is impressively fast, so whatever time you might save is simply not worth the added complexity and effort - and considering that effort isn't even much, I guess that says a lot.