Why This Matters
Hash tables are the default data structure behind dictionaries, maps, sets, caches, frequency counters, symbol tables, deduplication, joins, and many indexing tasks.
They are powerful because they trade ordering for fast lookup. They are dangerous when you forget that the O(1) claim depends on hashing, collision handling, resizing, and input assumptions.
Learning Objectives
By the end, you should be able to:
- Explain how a hash function maps keys to buckets.
- Compare chaining, open addressing, resizing, and load-factor control.
- Explain why lookup is expected O(1) but not guaranteed under every input pattern.
- Choose hash tables for maps, sets, counters, caches, joins, and deduplication tasks.
- Identify when a tree map or sorted index is a better fit than a hash table.
Core Idea
A hash table uses a hash function to map a key to a bucket. Collisions are normal: different keys can map to the same bucket. The table must handle collisions and resize when the load factor gets too high.
| Design | How collisions work | Strength | Cost or risk |
|---|---|---|---|
| Chaining | bucket stores a list or small structure | simple deletion | pointer-heavy, cache misses |
| Open addressing | probe for another slot | compact memory | deletion and clustering are trickier |
| Resizing | allocate larger table and reinsert | keeps load factor controlled | occasional expensive operation |
| Hash randomization | vary hash behavior per process | helps against chosen inputs | still needs good table policy |
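The chaining row of the table can be made concrete with a minimal sketch. `TinyMap` is a hypothetical teaching structure with a fixed bucket count and no resizing, not a production design:

```typescript
// Minimal separate-chaining sketch: fixed bucket count, no resizing.
class TinyMap<V> {
  private buckets: Array<Array<[string, V]>>;

  constructor(bucketCount = 8) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }

  private index(key: string): number {
    let h = 0;
    for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
    return h % this.buckets.length;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.index(key)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;    // key already present: update in place
    else bucket.push([key, value]); // collision: entries share the bucket
  }

  get(key: string): V | undefined {
    // Only one bucket is scanned, so the chain length bounds the cost.
    return this.buckets[this.index(key)].find(([k]) => k === key)?.[1];
  }
}
```

With a well-distributing hash and a controlled load factor, chains stay short, which is exactly where the expected O(1) lookup comes from.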
Non-Example or Failure Mode
A hash table is a poor fit when the main query is sorted order, nearest predecessor, nearest successor, or range scan. For those queries, a balanced tree, B-tree, or sorted array may expose the structure the problem actually needs.
The failure mode is assuming "fast lookup" means "fast for every access pattern." A hash table can answer "is this exact key present?" well, but it does not preserve order unless a separate structure is added.
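A quick sketch of that limitation: to answer a range query against a plain `Map`, you end up collecting and sorting the keys yourself, an O(n log n) detour that a tree map or sorted index would avoid (function name is illustrative):

```typescript
// Range query over a hash map: there is no order to exploit,
// so every key must be scanned and the matches sorted afterward.
function keysInRange(m: Map<number, string>, lo: number, hi: number): number[] {
  return [...m.keys()]
    .filter((k) => k >= lo && k <= hi) // full O(n) scan of all keys
    .sort((a, b) => a - b);            // plus an O(n log n) sort
}
```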
Worked Example
Count word frequencies in a document.
```typescript
function frequencies(words: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}
```
Without a hash table, each word might require scanning the list of words already seen. That can turn counting into O(n^2) work. With a hash table under ordinary assumptions, each lookup and update is expected O(1), so the pass is expected O(n).
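For contrast, here is a sketch of the scan-based version the paragraph above describes: each word triggers a linear search through the entries seen so far, which is what drives the O(n^2) total.

```typescript
// O(n^2) frequency counting without a hash table: for each word,
// linearly search the entries collected so far.
function frequenciesNaive(words: string[]): Array<[string, number]> {
  const entries: Array<[string, number]> = [];
  for (const word of words) {
    const entry = entries.find(([w]) => w === word); // O(n) scan per word
    if (entry) entry[1] += 1;
    else entries.push([word, 1]);
  }
  return entries;
}
```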
Builder Connections
- Caches use hash tables to find entries quickly.
- Tokenizers and compilers use maps for vocabularies and symbol tables.
- Data pipelines use hash tables for joins, grouping, and deduplication.
- Feature hashing maps high-cardinality sparse features into fixed-size vectors.
- Security-sensitive services must consider hash-flooding and adversarial keys.
Common Mistakes
| Mistake | Correction |
|---|---|
| Saying hash tables are always O(1). | The usual claim is expected or amortized under hashing and resizing assumptions. |
| Confusing hashing with encryption. | Hashing for tables distributes keys; it does not hide the key or provide confidentiality. |
| Ignoring load factor. | High load factors increase collisions or probe lengths and can collapse performance. |
| Expecting sorted iteration. | Hash maps usually do not preserve sorted order; use a tree or sorted structure for range queries. |
| Using mutable keys. | If equality or hash behavior changes while a key is stored, lookup can become incorrect. |
Diagnostic Questions
| Question type | Question | Answer signal |
|---|---|---|
| Definition | What does the hash function output? | A deterministic bucket index or integer hash that is reduced to a bucket index. |
| Example / non-example | Is a sorted map the same abstraction as a hash map? | No. Both map keys to values, but sorted maps preserve order and support range-style queries. |
| Computation | What is the load factor after 48 occupied slots in 64 buckets? | 48 / 64 = 0.75. |
| Proof | Why does resizing support amortized insertion cost? | Expensive rehashes are spread across many cheap inserts between resizes. |
| Transfer | Why might a cache combine a hash map with a linked list? | The map gives fast key lookup; the list maintains recency order for eviction. |
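The last row of the table can be sketched compactly. JavaScript's `Map` happens to remember insertion order, so deleting and re-inserting a key moves it to the most-recent end; that ordering stands in for the linked list in this teaching sketch (the challenge exercise's explicit doubly linked list is the more general design):

```typescript
// LRU cache sketch: Map preserves insertion order, so re-inserting
// a key on access moves it to the most-recently-used position.
class LRUCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // move to most-recent position
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value!;
      this.map.delete(oldest);
    }
  }
}
```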
Exercises
Beginner:
- Insert keys 10, 17, 24, 3 into seven buckets using key mod 7.
- Compute the load factor for 40 stored keys in 64 buckets.
- Explain why collisions are not table failure.
Intermediate:
- Compare chaining and open addressing for a delete-heavy workload.
- Explain why a hash table is a good fit for two-sum lookup.
Challenge:
- Implement an LRU cache using a hash map plus a doubly linked list.
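The two-sum exercise above hinges on the same expected-O(1) lookup: store each number's index as you go and ask whether the complement has already been seen. A standard one-pass sketch:

```typescript
// Two-sum in one pass: the map answers "have we seen target - x yet?"
// in expected O(1), turning an O(n^2) pair search into expected O(n).
function twoSum(nums: number[], target: number): [number, number] | null {
  const seen = new Map<number, number>(); // value -> index
  for (let i = 0; i < nums.length; i++) {
    const j = seen.get(target - nums[i]);
    if (j !== undefined) return [j, i];
    seen.set(nums[i], i);
  }
  return null;
}
```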
Diagram Recommendation
Type: bucket collision diagram.
Caption: "Keys pass through a hash function; collisions share or probe buckets."
Purpose: Make collision handling visible so learners do not treat hashing as magic.
Implementation Task
Build a small separate-chaining hash map for string keys. Include resizing when size / bucketCount > 0.75.
```typescript
type Entry<V> = { key: string; value: V };

function bucketIndex(key: string, bucketCount: number): number {
  let h = 0;
  for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
  return h % bucketCount;
}
```
After it works, measure lookup time before and after forcing many keys into one bucket.
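To force many keys into one bucket for that measurement, one approach is brute force: generate candidate keys and keep only those that land in a chosen bucket. This sketch restates the `bucketIndex` function from the task so it is self-contained; `collidingKeys` is an illustrative helper, not part of the map itself.

```typescript
// Same hash as the implementation task above, repeated for self-containment.
function bucketIndex(key: string, bucketCount: number): number {
  let h = 0;
  for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
  return h % bucketCount;
}

// Brute-force collision generator: keep keys that hash to `target`.
function collidingKeys(bucketCount: number, target: number, want: number): string[] {
  const keys: string[] = [];
  for (let i = 0; keys.length < want; i++) {
    const key = `k${i}`;
    if (bucketIndex(key, bucketCount) === target) keys.push(key);
  }
  return keys;
}
```

Insert all of the returned keys into your map without resizing and every lookup in that bucket scans the whole chain, so lookup cost degrades toward O(n).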
Next Topics
References
- Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms (4th ed., 2022). Ch. 11.
- MIT OpenCourseWare. 6.006 Introduction to Algorithms, Spring 2020.
- Sedgewick and Wayne. Algorithms (4th ed., 2011). Ch. 3.
- Skiena. The Algorithm Design Manual (3rd ed., 2020). Ch. 3.