January 25, 2018 · Computer Science Data Structures

Tries as Folders in a File System

I bet you've never really thought about the file system on your computer. Sure, you use it all the time. But have you ever considered what it would be like if your computer's data wasn't organized in a hierarchical folder structure? For instance, my photos live under this path: /Users/tiffany/Pictures. By visiting Users and then tiffany and then Pictures, I can navigate to my photos pretty intuitively. But what if there were no folders whatsoever on my computer? My photos would be stored alongside other users' photos, not to mention all the other files on my entire computer!

So let's not take for granted the sheer usefulness of the file system. Another awesome thing about the file system is that it serves as a great metaphor for the trie data structure! Working with a trie is in many ways just like working with the file system. The only difference is that the file system helps you find files easily, while a trie helps you find words easily.

alphabet folders

To make the analogy even more clear, imagine that I have a special computer where each folder's name can only be 1 character long. Let's say that my root path is still named /. When I go to /, I see 26 folders here, each corresponding to a single letter of the alphabet. And when I click on one of those folders, I see another set of 26 folders, each corresponding to a single letter of the alphabet. And this goes on until eventually I run out of folders to click on. So in order to get to the word "photos", I would navigate to this path: /p/h/o/t/o/s.

trie_as_folders

The above diagram is a visual representation of this full path. What's eerie about it is that it basically looks like a trie!

Trie Defined

Like a binary tree, a trie is a node-based data structure that is always accessed from its root node. Each node has a property, usually called children, that is essentially a set of pointers that point to other nodes. While exact implementation details may vary, a node's children is usually implemented as a hash where each key is a character. A node may also store additional metadata, such as an endOfWord property that indicates whether the currrent node represents the end of a word or not.

trie_nodes

So in order to get from node to node, we would simply access the appropriate key off of children. For instance, to get from the above node to c, we can do something like currentNode = children["c"].

Searching for a Word in a Trie

trie_search

Imagine we have the trie above. How would we determine whether the word "car" is in the trie? Well, first we always start from the root. At root, we check to see whether children["c"] points to anything. It does, so we traverse to the c node. At c, we check to see whether children["a"] points to anything. It does, so we traverse to the a node. At a, we check to see whether children["r"] points to anything. It does, so we traverse to the r node. And at r, we check the endOfWord property to see whether the node represents the end of a word. It does (represented by *), so we return true because "car" exists in the trie!

Let's try searching for a word that doesn't exist in the trie. For instance, let's search for "cat". Again, we always start from the root. At root, we check to see whether children["c"] points to anything. It does, so we traverse to the c node. At c, we check to see whether children["a"] points to anything. It does, so we traverse to the a node. At a, we check to see whether children["t"] points to anything. Uh oh! children["t"] doesn't point to anything. Now we know that "cat" doesn't exist in the trie, so we return false.

Inserting a Word

Since "cat" doesn't exist yet in our trie, let's go ahead and insert it. To insert, we still need to start from root. At root, we check to see whether children["c"] points to anything. It does, so we traverse to the c node. At c, we check to see whether children["a"] points to anything. It does, so we traverse to the a node. At a, we check to see whether children["t"] points to anything. It doesn't, so we create a brand new node to represent t. We then set this new node as a child of a, underneath a's children["t"] property.

trie_insert

And that's it! We've inserted "cat" into our trie. Notice that we didn't need to create new nodes for "c" or "a". In the diagram above, we can see that "cat" simply reuses the c and a nodes that were already there. This is a major space advantage of the trie, since many words will be able to share common ancestors.

Now let's go over an example of inserting a word that doesn't have any shared common ancestors. For instance, how can we insert "dog" into our trie? As always, we start from our root. At root, we check to see whether children["d"] points to anything. It doesn't, so we create a brand new node and set it as root's children["d"] property. Now we can traverse to the newly created d node. At d, we check to see whether children["o"] points to anything. It doesn't, so we create a brand new node and set it as d's children["o"] property. Now we can traverse to the newly created o node. At o, we check to see whether children["g"] points to anything. It doesn't, so we create a brand new node and set it as o's children["g"] property.

trie_insert_dog

And that's it! We've inserted "dog" into our trie.

Deleting a Word

While I won't go into every detail here, deleting a word is pretty straightforward. We simply delete every character in the word, unless the character is a shared common ancestor.

trie_deletion

For instance, to delete "cat" in our trie above, we delete the t node. However, we leave the c and a nodes, since those are still being used by "car".

Summary

The cool thing about the trie is that all operations are O(L) operations, where L is the number of characters in the word you want to search, insert, or delete. For instance, when we wanted to search for "car", we performed the same steps for "c", "a", and "r". And when we wanted to insert "cat", we performed the same steps for "c", "a", and "t". This is great, because inserting "cat" will always take 3 steps, no matter how huge the trie gets!

That's it!

Tries can be confusingly unfamiliar at first, but I hope this article helps make them more intuitive. As an exercise, I strongly encourage you to create a new folder called root somewhere on your system. From there, create the /p/h/o/t/o/s path that we talked about. Navigate to the word "photos". Now how would you add the word "photosynthesis" to the faux-trie? Try adding other words. And now delete "photosynthesis" but keep "photos". Play around with it, and try to relate what you're doing to the content of this article. It sounds elementary, but working with a trie is really no different than working with folders!