January 16, 2018 · Computer Science Data Structures

Binary Search Trees as a game of "Guess the Number"

FRIEND: I'm thinking of a number between 1 and 10. Guess what number it is!

YOU: 5.

FRIEND: Too high!

YOU: 3.

FRIEND: Too low this time.

YOU: It's 4!

FRIEND: Yup, you got it!

Does the above script sound familiar? I'm sure it does. If you've played any version of the game "Guess the Number" before, you've already implemented the algorithm behind binary search trees! That may be news to you, but that's only because you were doing it unconsciously. So let's put that algorithm into explicit computational terms.

For instance, when your friend said she was thinking of a number between 1 and 10, you intuitively picked 5 since that's the midpoint. Your friend then told you that 5 is too high. So what did you do? Was your next guess 7? Absolutely not. Since 5 is too high, you instantly disregarded numbers 5-10. So you chose the next midpoint between 1-5, which is 3. Now your friend tells you that 3 is too low. Now you mentally threw away numbers 1-3 as promising guesses. Finally, you chose the midpoint between 3-5, which is 4. And 4 turns out to be the number your friend was thinking of, since you eliminated all other options.

Notice that with each guess, you were able to throw away at least half of the potential solution space. Starting out, you knew that 1-10 are all the numbers that could potentially be the number your friend is thinking of. After guessing 5, that range whittled down to 1-4. And after guessing 3, you effectively eliminated all other options but 4.

Binary Search Tree Defined

The name of the algorithm you intuitively used is binary search. And binary search trees are not an algorithm, but a data structure. As you may have guessed though, these two things are intrinsically related. In fact, the binary search tree data structure was created solely to enable binary searching through different nodes.

How exactly is this supposed to work? Before we get to that, let's first define what a binary search tree is. Each node in a binary search tree has a left property and a right property. The left property points to a node that has a lesser value than the current node's value. The right property points to a node that has a greater value than the current node's value.

binary_tree

In the diagram above, 5's left property points to a node with a value of 3. And 5's right property points to a node with a value of 8. This fulfills the binary search tree requirement, since 3 < 5 < 8. And if you perform a similar check for each node in the tree, you'll find that this is indeed a valid binary search tree.

Searching a Binary Search Tree

Now let's get into how to search a binary search tree. After all, that's what the data structure was made for right?? Right, so let's pretend we're looking for the value 4. The question that we're asking is, "Does this binary search tree contain a node that has a value of 4?".

binary_tree

As human beings we can see right away that it does, but the computer doesn't know that. It has to start at the root, since that's the only node it has access to at first. So the computer starts at the root and sees a value of 5 there. 5 is not the value we're looking for, so what should the computer do next? There are only two options: (1) check the left node; or (2) check the right node. Well 4 < 5, so we go down the left path since that's where any values smaller than 5 must be.

binary_tree_search_first

Now we're at the 3 node. 3 is not the value we're looking for. Again, we only have two options: (1) check the left node; or (2) check the right node. 4 > 3, so we go down the right path since that's where any values greater than 3 must be.

binary_tree_search_last

Now we're at the 4 node. 4 is the value we're looking for, so at last we've found our value! Notice again that at each step we've eliminated about half of the potential solution space. For instance, in the beginning, we started out with a total of 7 nodes to check against. In the next step, we reduced that number to 3. And in the very last step, that number becomes 1. This means that searching in a binary search tree is a O(log N) operation.

Other Binary Search Tree Operations

What's awesome about a binary search tree is that pretty much all operations take O(log N) time! Why is this? Let's tackle insertion first.

Inserting an item

Imagine we want to insert 1 into our binary search tree.

binary_tree

The steps to insert are largely the same as the steps to search. The only difference is that now we're searching for the correct place to insert 1. So we traverse down each level of the tree until we finally hit a leaf node, or a node that has no children. In our case, the leaf node that we will hit is the node with a value of 2.

binary_tree_insertion

Finally, we simply set 1 as 2's left property, since 1 < 2. The actual insertion is a constant number of steps, so the overall time complexity is O(log N).

Deleting an item

We won't get into it here, but the steps to delete an item are also largely the same as the steps to search for an item. Granted, if we're deleting a non leaf node, there are a number of extra steps to perform to maintain the binary search tree structure. But these extra steps still take constant time. So the overall time complexity to delete is still O(log N).

An Important Caveat

O(log N) time is great right? Binary trees for the win! But wait... not so fast. Notice that the binary search tree we've been looking at is perfectly balanced, meaning that each left subtree and its corresponding right subtree both have similar heights.

binary_tree_balanced

Binary search tree operations are only O(log N) operations if the tree is balanced. In order to see why, let's take a look at a tree that is unbalanced.

binary_tree_unbalanced

If we inserted nodes in the following order: 2,3,4,5,6, you can imagine that we would end up with the binary tree above. In fact, the data structure doesn't even look like a tree anymore. Instead, it looks just like a linked list. And that's actually what it effectively becomes. For instance, to search for 6 in the unbalanced binary tree, we would now have to visit each and every node in order to find it. That's an O(N) operation, and it's exactly what we would have to do to search a linked list.

That's why you should never insert sorted data into a binary search tree. And if the data you're working with is sorted, make sure to shuffle it up before inserting into a binary search tree. There are definitely ways to guarantee that a binary search tree will always be balanced, but that's a topic that's outside the scope of this article. For now, just know that binary search trees provide O(log N) time complexity for all operations in most cases, but they don't guarantee it.

That's it!

I hope this article helps you understand binary search trees better. Linked lists and binary trees are really the fundamental node-based data structures. Once you get a hang of them, a whole world of possibilities will open up to you. And we'll get a glimpse at just a few of these possibilities in the upcoming posts. Stay tuned!