Why This Matters
Hash tables are the default data structure behind dictionaries, maps, sets, caches, frequency counters, symbol tables, deduplication, joins, and many indexing tasks.
They are powerful because they trade ordering for fast lookup. They are dangerous when you forget that the O(1) claim depends on hashing, collision handling, resizing, and input assumptions.
Learning Objectives
By the end, you should be able to:
- Explain how a hash function maps keys to buckets.
- Compare chaining, open addressing, resizing, and load-factor control.
- Explain why lookup is expected O(1) but not guaranteed under every input pattern.
- Choose hash tables for maps, sets, counters, caches, joins, and deduplication tasks.
- Identify when a tree map or sorted index is a better fit than a hash table.
Core Idea
A hash table uses a hash function to map a key to a bucket. Collisions are normal: different keys can map to the same bucket. The table must handle collisions and resize when the load factor gets too high.
| Design | How collisions work | Strength | Cost or risk |
|---|---|---|---|
| Chaining | bucket stores a list or small structure | simple deletion | pointer-heavy, cache misses |
| Open addressing | probe for another slot | compact memory | deletion and clustering are trickier |
| Resizing | allocate larger table and reinsert | keeps load factor controlled | occasional expensive operation |
| Hash randomization | vary hash behavior per process | helps against chosen inputs | still needs good table policy |
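The chaining row of the table can be made concrete with a minimal sketch. `TinyMap` is a hypothetical teaching structure with a fixed bucket count and no resizing, not a production design:

```typescript
// Minimal separate-chaining sketch: fixed bucket count, no resizing.
class TinyMap<V> {
  private buckets: Array<Array<[string, V]>>;

  constructor(bucketCount = 8) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }

  private index(key: string): number {
    let h = 0;
    for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
    return h % this.buckets.length;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.index(key)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;    // key already present: update in place
    else bucket.push([key, value]); // collision: entries share the bucket
  }

  get(key: string): V | undefined {
    // Only one bucket is scanned, so the chain length bounds the cost.
    return this.buckets[this.index(key)].find(([k]) => k === key)?.[1];
  }
}
```

With a well-distributing hash and a controlled load factor, chains stay short, which is exactly where the expected O(1) lookup comes from.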
Non-Example or Failure Mode
A hash table is a poor fit when the main query is sorted order, nearest predecessor, nearest successor, or range scan. For those queries, a balanced tree, B-tree, or sorted array may expose the structure the problem actually needs.
The failure mode is assuming "fast lookup" means "fast for every access pattern." A hash table can answer "is this exact key present?" well, but it does not preserve order unless a separate structure is added.
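A quick sketch of that limitation: to answer a range query against a plain `Map`, you end up collecting and sorting the keys yourself, an O(n log n) detour that a tree map or sorted index would avoid (function name is illustrative):

```typescript
// Range query over a hash map: there is no order to exploit,
// so every key must be scanned and the matches sorted afterward.
function keysInRange(m: Map<number, string>, lo: number, hi: number): number[] {
  return [...m.keys()]
    .filter((k) => k >= lo && k <= hi) // full O(n) scan of all keys
    .sort((a, b) => a - b);            // plus an O(n log n) sort
}
```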
Worked Example
Count word frequencies in a document.
```typescript
function frequencies(words: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}
```
Without a hash table, each word might require scanning the list of words already seen. That can turn counting into O(n^2) work. With a hash table under ordinary assumptions, each lookup and update is expected O(1), so the pass is expected O(n).
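For contrast, here is a sketch of the scan-based version the paragraph above describes: each word triggers a linear search through the entries seen so far, which is what drives the O(n^2) total.

```typescript
// O(n^2) frequency counting without a hash table: for each word,
// linearly search the entries collected so far.
function frequenciesNaive(words: string[]): Array<[string, number]> {
  const entries: Array<[string, number]> = [];
  for (const word of words) {
    const entry = entries.find(([w]) => w === word); // O(n) scan per word
    if (entry) entry[1] += 1;
    else entries.push([word, 1]);
  }
  return entries;
}
```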
Builder Connections
- Caches use hash tables to find entries quickly.
- Tokenizers and compilers use maps for vocabularies and symbol tables.
- Data pipelines use hash tables for joins, grouping, and deduplication.
- Feature hashing maps high-cardinality sparse features into fixed-size vectors.
- Security-sensitive services must consider hash-flooding and adversarial keys.
Common Mistakes
| Mistake | Correction |
|---|---|
| Saying hash tables are always O(1). | The usual claim is expected or amortized under hashing and resizing assumptions. |
| Confusing hashing with encryption. | Hashing for tables distributes keys; it does not hide the key or provide confidentiality. |
| Ignoring load factor. | High load factors increase collisions or probe lengths and can collapse performance. |
| Expecting sorted iteration. | Hash maps usually do not preserve sorted order; use a tree or sorted structure for range queries. |
| Using mutable keys. | If equality or hash behavior changes while a key is stored, lookup can become incorrect. |
Diagnostic Questions
| Question type | Question | Answer signal |
|---|---|---|
| Definition | What does the hash function output? | A deterministic bucket index or integer hash that is reduced to a bucket index. |
| Example / non-example | Is a sorted map the same abstraction as a hash map? | No. Both map keys to values, but sorted maps preserve order and support range-style queries. |
| Computation | What is the load factor after 48 occupied slots in 64 buckets? | 48 / 64 = 0.75. |
| Proof | Why does resizing support amortized insertion cost? | Expensive rehashes are spread across many cheap inserts between resizes. |
| Transfer | Why might a cache combine a hash map with a linked list? | The map gives fast key lookup; the list maintains recency order for eviction. |
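The last row of the table can be sketched compactly. JavaScript's `Map` happens to remember insertion order, so deleting and re-inserting a key moves it to the most-recent end; that ordering stands in for the linked list in this teaching sketch (the challenge exercise's explicit doubly linked list is the more general design):

```typescript
// LRU cache sketch: Map preserves insertion order, so re-inserting
// a key on access moves it to the most-recently-used position.
class LRUCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // move to most-recent position
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value!;
      this.map.delete(oldest);
    }
  }
}
```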
Exercises
Beginner:
- Insert keys 10, 17, 24, 3 into seven buckets using key mod 7.
- Compute the load factor for 40 stored keys in 64 buckets.
- Explain why collisions are not table failure.
Intermediate:
- Compare chaining and open addressing for a delete-heavy workload.
- Explain why a hash table is a good fit for two-sum lookup.
Challenge:
- Implement an LRU cache using a hash map plus a doubly linked list.
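The two-sum exercise above hinges on the same expected-O(1) lookup: store each number's index as you go and ask whether the complement has already been seen. A standard one-pass sketch:

```typescript
// Two-sum in one pass: the map answers "have we seen target - x yet?"
// in expected O(1), turning an O(n^2) pair search into expected O(n).
function twoSum(nums: number[], target: number): [number, number] | null {
  const seen = new Map<number, number>(); // value -> index
  for (let i = 0; i < nums.length; i++) {
    const j = seen.get(target - nums[i]);
    if (j !== undefined) return [j, i];
    seen.set(nums[i], i);
  }
  return null;
}
```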
Diagram Recommendation
Type: bucket collision diagram.
Caption: "Keys pass through a hash function; collisions share or probe buckets."
Purpose: Make collision handling visible so learners do not treat hashing as magic.
Implementation Task
Build a small separate-chaining hash map for string keys. Include resizing when size / bucketCount > 0.75.
```typescript
type Entry<V> = { key: string; value: V };

function bucketIndex(key: string, bucketCount: number): number {
  let h = 0;
  for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
  return h % bucketCount;
}
```
After it works, measure lookup time before and after forcing many keys into one bucket.
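To force many keys into one bucket for that measurement, one approach is brute force: generate candidate keys and keep only those that land in a chosen bucket. This sketch restates the `bucketIndex` function from the task so it is self-contained; `collidingKeys` is an illustrative helper, not part of the map itself.

```typescript
// Same hash as the implementation task above, repeated for self-containment.
function bucketIndex(key: string, bucketCount: number): number {
  let h = 0;
  for (const ch of key) h = (31 * h + ch.charCodeAt(0)) >>> 0;
  return h % bucketCount;
}

// Brute-force collision generator: keep keys that hash to `target`.
function collidingKeys(bucketCount: number, target: number, want: number): string[] {
  const keys: string[] = [];
  for (let i = 0; keys.length < want; i++) {
    const key = `k${i}`;
    if (bucketIndex(key, bucketCount) === target) keys.push(key);
  }
  return keys;
}
```

Insert all of the returned keys into your map without resizing and every lookup in that bucket scans the whole chain, so lookup cost degrades toward O(n).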
Next Topics
References
- Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms (4th ed., 2022). Ch. 11.
- MIT OpenCourseWare. 6.006 Introduction to Algorithms, Spring 2020.
- Sedgewick and Wayne. Algorithms (4th ed., 2011). Ch. 3.
- Skiena. The Algorithm Design Manual (3rd ed., 2020). Ch. 3.