Lecture 12: Join Algorithms (Nested Loop, Sort-Merge, Hash)

0. Cost Model and Mental Model

The lecture studies binary inner equi-joins between tables R and S.

We use:

  • M: pages in R
  • N: pages in S
  • m: tuples in R
  • n: tuples in S
  • B: buffer pages available
  • C: average cost of one index probe

The primary cost metric is I/O count.

We ignore:

  • result output cost
  • CPU cost
  • network cost

For CPU-style reasoning, the useful baseline is:

  • naive comparison work is usually measured in tuple pairs
  • hashing is expected linear in tuples when the hash table behaves well
  • sort-merge is dominated by sorting plus one merge scan, except in duplicate-heavy worst cases

1. Join Operator Output

For matching tuples r ∈ R and s ∈ S, the operator must decide what to emit upward in the plan tree.

Early Materialization

  • Copy needed attributes into a new output tuple immediately.
  • Simpler for later operators.
  • More copying.

Late Materialization

  • Emit join keys plus record ids / tuple references.
  • Better for column stores and wide tuples.
  • Delays tuple reconstruction until later.

2. Why Join Choice Matters

R ⨝ S is far cheaper than building R × S and filtering afterward.

There is no universally best join algorithm. The correct choice depends on:

  • memory size
  • existing indexes
  • whether inputs are already sorted
  • data distribution / skew
  • which relation is smaller

In general, the smaller relation should be the outer or build side when the algorithm allows that choice.

3. Nested Loop Join Family

3.1 Naive Nested Loop Join

Algorithm:

for each tuple r in R:
    for each tuple s in S:
        if r and s match on the join key:
            emit (r, s)

Cost:

\[ M + (m \cdot N) \]

Tuple-comparison complexity:

\[ \Theta(mn) \]

Why it is bad:

  • for every tuple in R, scan all pages of S
  • good only as the conceptual baseline

3.2 Block Nested Loop Join

Improve on the naive algorithm by loading a whole block of the outer relation into memory at once.

With B buffers:

  • use B-2 buffers for the outer block
  • use 1 buffer for the inner relation scan
  • use 1 output buffer

Cost:

\[ M + \left\lceil \frac{M}{B-2} \right\rceil \cdot N \]

Tuple-comparison work is still:

\[ \Theta(mn) \]

The gain is not fewer comparisons; the gain is far fewer rescans of the inner relation on disk.

Key rules:

  • choose the smaller relation by pages, not tuples, as outer
  • if the outer fits in memory, cost approaches M + N

This is the first nested-loop variant that is often reasonable.

Feasibility shortcut:

\[ M \leq B-2 \implies \text{outer fits in memory} \]

and the I/O cost then becomes about:

\[ M + N \]
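A quick worked comparison makes the gap concrete. The sizes below are assumed for illustration, not taken from the lecture:

```python
import math

# Assumed illustrative sizes, not taken from the lecture:
M, m = 1000, 100_000   # pages and tuples in the outer relation R
N = 500                # pages in the inner relation S
B = 102                # available buffer pages

naive_io = M + m * N                       # M + (m * N)
block_io = M + math.ceil(M / (B - 2)) * N  # M + ceil(M / (B-2)) * N

print(naive_io)  # 50,001,000 I/Os
print(block_io)  # 6,000 I/Os
```

Buffering the outer relation turns roughly fifty million I/Os into six thousand, without changing the Θ(mn) comparison count at all.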

3.3 Index Nested Loop Join

If the inner table has a useful index on the join key:

for each tuple r in R:
    probe index on S using join key of r
    emit matches

Cost:

\[ M + (m \cdot C) \]

CPU-style view:

  • one outer scan
  • one index probe per outer tuple
  • so the dominant work is roughly m probes rather than mn tuple comparisons

This can be excellent when:

  • the index already exists
  • probes are selective / cheap

It can be terrible if the index probes are random and expensive.
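A minimal sketch of the probe pattern, using an in-memory dict as a stand-in for a pre-existing index on S (the names `key_r` and `key_s` are illustrative, not from the lecture):

```python
from collections import defaultdict

def index_nested_loop_join(R, S, key_r, key_s):
    # Stand-in for a pre-existing index on S's join key
    # (a real system would probe a B+-tree or hash index instead).
    index = defaultdict(list)
    for s in S:
        index[key_s(s)].append(s)
    for r in R:                            # one scan of the outer relation
        for s in index.get(key_r(r), []):  # one probe per outer tuple
            yield (r, s)
```

With the dict as the "index", each probe is O(1) expected, which mirrors the m-probes-instead-of-mn-comparisons argument above.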

4. Sort-Merge Join

Phase 1: Sort

  • Sort both inputs on the join key.
  • Any suitable sorting algorithm can be used, including external merge sort.

Phase 2: Merge

  • Scan the sorted inputs together.
  • Advance the cursor on the side with the smaller key.
  • When keys match, emit all matching combinations.
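The two phases can be sketched as an in-memory function. Note the nested loop over equal-key runs in the merge phase, which is exactly where duplicate-heavy inputs blow up:

```python
def sort_merge_join(R, S, key):
    R, S = sorted(R, key=key), sorted(S, key=key)  # Phase 1: sort
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):               # Phase 2: merge
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            k = key(R[i])
            # Find the run of equal keys on each side.
            i2 = i
            while i2 < len(R) and key(R[i2]) == k:
                i2 += 1
            j2 = j
            while j2 < len(S) and key(S[j2]) == k:
                j2 += 1
            # Emit all matching combinations (the duplicate blow-up lives here).
            for r in R[i:i2]:
                for s in S[j:j2]:
                    out.append((r, s))
            i, j = i2, j2
    return out
```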

Cost

If both sides must be sorted first:

\[ \text{Sort}(R) = 2M \cdot \left(1 + \left\lceil \log_{B-1}\left\lceil \frac{M}{B} \right\rceil \right\rceil \right) \]

\[ \text{Sort}(S) = 2N \cdot \left(1 + \left\lceil \log_{B-1}\left\lceil \frac{N}{B} \right\rceil \right\rceil \right) \]

\[ \text{Merge Cost} = M + N \]

Total cost:

\[ \text{Sort}(R) + \text{Sort}(S) + (M+N) \]

Expected CPU complexity, ignoring duplicate blow-up, is roughly:

\[ O(m \log m + n \log n + m + n) \]
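Plugging assumed sizes into the cost formulas above (illustrative numbers, not from the lecture):

```python
import math

def sort_cost(pages, B):
    # 2 * pages * (1 + ceil(log_{B-1}(ceil(pages / B))))
    runs = math.ceil(pages / B)              # initial sorted runs
    passes = 1 + math.ceil(math.log(runs, B - 1))
    return 2 * pages * passes

M, N, B = 1000, 500, 102                     # assumed page counts and buffers
total = sort_cost(M, B) + sort_cost(N, B) + (M + N)
print(total)  # 4000 + 2000 + 1500 = 7500 I/Os
```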

When Sort-Merge Is Attractive

  • one or both inputs are already sorted on the join key
  • the output must also be sorted on that key
  • sorting can be shared with another operator

Important Caveat

Worst case happens when many tuples share the same join key on both sides.

Then the merge phase can degrade toward:

\[ (M \cdot N) + \text{sort cost} \]

So duplicate-heavy joins can make sort-merge much less attractive than its clean scan pattern suggests.

5. Hash Join

Hash join relies on one observation:

if r and s join on the same key, then hashing that key sends them to the same bucket / partition.

5.1 Simple Hash Join

Build Phase

  • Scan the smaller relation.
  • Insert tuples into an in-memory hash table using the join key.
  • In practice, simple open-addressing schemes like linear probing work well.

Probe Phase

  • Scan the other relation.
  • Hash its join key.
  • Probe the hash table for matches.

This is excellent when the build side fits in memory.

Memory rule of thumb:

\[ \min(M, N) \lesssim B-2 \]

ignoring hash-table overhead and output buffers.

Expected CPU complexity:

\[ O(m + n) \]

Worst case can still degrade badly with pathological collisions or extreme skew.
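A minimal in-memory sketch of the two phases, assuming the build side R fits in memory (`key_r` and `key_s` are illustrative key extractors):

```python
from collections import defaultdict

def simple_hash_join(R, S, key_r, key_s):
    # Build phase: hash the (smaller) relation R into an in-memory table.
    table = defaultdict(list)
    for r in R:
        table[key_r(r)].append(r)
    # Probe phase: one hash lookup per tuple of S.
    for s in S:
        for r in table.get(key_s(s), []):
            yield (r, s)
```

Each tuple of each relation is touched a constant number of times, which is where the expected O(m + n) bound comes from.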

Probe Filter Optimization

While building the hash table, also build a Bloom filter:

  • check the Bloom filter before probing the real hash table
  • avoid many useless hash-table probes
  • classic example of sideways information passing
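A toy Bloom filter showing the probe-filter idea. Sizes and the hash construction are illustrative, not tuned; real systems use cheaper, non-cryptographic hash functions:

```python
import hashlib

class BloomFilter:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

During the build phase, `add` each join key alongside the hash-table insert; during the probe phase, check `might_contain` first and skip the hash-table probe on a definite miss.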

5.2 Partitioned Hash Join (GRACE Hash Join)

When the build side does not fit in memory, partition both relations first.

Partition Phase

  • Hash R into k buckets on disk.
  • Hash S into k buckets with the same function.
  • Spill buckets as they fill.

Probe Phase

For each pair of corresponding partitions:

  1. build an in-memory hash table on the smaller partition
  2. scan the other partition and probe it

If no recursive repartitioning is needed, cost is:

\[ 3(M + N) \]

Why:

  • partition phase reads and writes both relations: 2(M+N)
  • probe phase reads both partitioned relations once: M+N

A useful fit condition for the build side is:

\[ \left\lceil \frac{\min(M, N)}{B-1} \right\rceil \leq B-2 \]

which says that after partitioning, each build partition should fit in memory.

Expected tuple work remains close to linear:

\[ O(m + n) \]

as long as partitioning is balanced and recursion is not excessive.
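The two phases can be sketched in memory; partitions are Python lists here, whereas a real system would spill each bucket to disk as it fills:

```python
from collections import defaultdict

def grace_hash_join(R, S, key_r, key_s, k=8):
    # Partition phase: hash both relations into k buckets
    # using the same hash function.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for r in R:
        r_parts[hash(key_r(r)) % k].append(r)
    for s in S:
        s_parts[hash(key_s(s)) % k].append(s)
    # Probe phase: join each pair of corresponding partitions.
    for p in range(k):
        table = defaultdict(list)
        for r in r_parts[p]:                   # build on the R partition
            table[key_r(r)].append(r)
        for s in s_parts[p]:                   # probe with the S partition
            for r in table.get(key_s(s), []):
                yield (r, s)
```

Because matching keys always hash to the same bucket, no cross-partition comparisons are ever needed.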

5.3 Edge Cases and Optimizations

Recursive Partitioning

If a partition is still too large to fit in memory:

  • repartition it with another hash function
  • repeat until the build partition fits

Skew / Hot Keys

If one join key is so frequent that its partition still will not fit:

  • use a fallback like block nested loop join just for that heavy key
  • trade random I/O for sequential I/O

Hybrid Hash Join

If memory is larger than the minimum needed for partitioning:

  • keep one partition in memory during the first phase
  • join it immediately
  • spill only the remaining partitions

This saves some I/O compared with plain partitioned hash join.

Practical Observation

The probe relation can be any size. What must fit is the build hash table, or at least one build partition at a time.

6. Selection Guide

  Situation                                        Good first choice       Why
  No index, little memory, baseline fallback       Block nested loop       simple and predictable
  Inner side has a useful index                    Index nested loop       avoids full scans of the inner relation
  Inputs already sorted, or sorted output needed   Sort-merge join         reuses ordering
  Smaller side fits in memory                      Simple hash join        usually very fast
  Both sides large, no ordering needed             Partitioned hash join   strong I/O behavior

7. Complexity Summary

  Algorithm               CPU / tuple complexity                  I/O cost
  Naive nested loop       Θ(mn)                                   M + (m · N)
  Block nested loop       Θ(mn)                                   M + ⌈M/(B-2)⌉ · N
  Index nested loop       about m index probes                    M + (m · C)
  Sort-merge join         expected O(m log m + n log n + m + n)   Sort(R) + Sort(S) + (M + N)
  Sort-merge worst case   up to O(mn) from duplicate handling     (M · N) + sort cost
  Simple hash join        expected O(m + n)                       M + N; requires the build side in memory
  Partitioned hash join   expected O(m + n)                       3(M + N) if no recursive repartitioning

8. Quick Recap

  1. Naive nested loop join is mostly a teaching baseline.
  2. Block nested loop join becomes practical once the outer relation can be buffered.
  3. Index nested loop join is only as good as the index probes.
  4. Sort-merge join is attractive when sorting is already useful elsewhere.
  5. Hash join is the default workhorse for large equi-joins, especially when ordering is unnecessary.