Parameters (MySQL 8.4 hash join: employees ⋈ departments)
Example query this animation executes
-- Non-indexed equi-join executed by MySQL 8.4 hash join
SELECT e.name, d.name AS department
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.active = 1;
-- d.id has no usable index → hash join
MySQL picks the smaller input (departments) as the build side. The larger input (employees) is streamed through in one pass.
What you'll see in the animation — with real data
- Phase 1 — build: 6 department rows fly from the left into hash buckets. Each pill is labelled with the actual row (e.g. 'id=3 Eng'). The hash function decides which bucket: hash(dept.id) % 6. After the build, each bucket shows the department rows it holds (e.g. bucket [3] holds 'Eng, HR').
- Phase 2 — probe: 8 employee rows stream from the right. Each pill is labelled (e.g. 'Alice dept=3'). MySQL computes hash(3) % 6 = bucket [3], and looks inside that bucket for a department row with id=3.
- When a match is found — e.g. Alice's dept_id=3 matches 'Eng' (id=3) in bucket [3] — the bucket flashes green with 'MATCH ✓'. That joined pair (Alice + Eng) is sent to the client.
- Two rows land in the same bucket when hash(key) produces the same index. But same bucket ≠ same key — MySQL still checks actual values. The hash table just narrows the search from 'scan all 6 departments' to 'check 1 or 2 rows in one bucket'.
- Total work: one pass through departments (build) + one pass through employees (probe) = O(n + m). If the hash table exceeds join_buffer_size, a red spill banner appears.
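The two phases above can be sketched in a few lines of Python. This mirrors the animation's mechanics (build into 6 buckets, probe with a bucket lookup plus a real key check), not MySQL's actual implementation; the table contents are illustrative stand-ins for the animation's labels.

```python
# Sketch of the in-memory hash join the animation shows.
# Row data is illustrative, not from a real schema.
NUM_BUCKETS = 6

departments = [  # build side: the smaller input
    (1, "Sales"), (2, "Ops"), (3, "Eng"), (4, "HR"), (5, "Legal"), (6, "IT"),
]
employees = [  # probe side: streamed through in one pass
    ("Alice", 3), ("Bob", 1), ("Carol", 3), ("Dave", 6),
    ("Erin", 2), ("Frank", 4), ("Grace", 5), ("Heidi", 1),
]

# Phase 1 - build: hash every department row into a bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]
for dept_id, dept_name in departments:
    buckets[hash(dept_id) % NUM_BUCKETS].append((dept_id, dept_name))

# Phase 2 - probe: for each employee, look only inside one bucket,
# then verify the actual key (same bucket does not imply same key).
result = []
for emp_name, emp_dept in employees:
    for dept_id, dept_name in buckets[hash(emp_dept) % NUM_BUCKETS]:
        if dept_id == emp_dept:  # the real key comparison
            result.append((emp_name, dept_name))

print(result)
```

Note the inner `if`: the bucket only narrows the search; the join still compares actual key values before emitting a pair.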
Cost readout (MySQL 8.4 hash join)
Build-side memory ?How much RAM the hash table needs. MySQL picks the smaller input as the build side and builds its hash table within join_buffer_size. Includes ~40% overhead for bucket chains.
—
Fits in join_buffer_size? ?If Yes, everything runs in memory — fast single-pass. If No, MySQL spills both inputs to disk and re-reads them, roughly doubling the I/O.
—
Spilled to disk? ?When the build side is too big, MySQL partitions both inputs to disk files, then re-reads each partition for build + probe. This is called grace-hash.
—
Partitions ?Number of on-disk chunks when spilling. Each partition's build side must fit in join_buffer_size. More partitions = more disk I/O passes.
—
Phases ?2 phases when in-memory (build + probe). 4 when spilling (partition-write + partition-read + build + probe).
—
Complexity ?O(departments + employees) = O(n + m) when in-memory. Still O(n + m) when spilling, but with roughly double the I/O (each input is written to disk and read back).
O(depts + emps) = O(n + m)
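The readout's numbers can be estimated with back-of-envelope arithmetic, assuming the ~40% bucket-chain overhead stated above. The row count and row width here are hypothetical, and MySQL chooses its real partition count internally; this only illustrates the constraint that each partition's build side must fit in join_buffer_size.

```python
import math

# Hypothetical build-side stats; join_buffer_size uses MySQL's default.
join_buffer_size = 256 * 1024           # 256 KiB, in bytes
build_rows, bytes_per_row = 50_000, 32  # illustrative values

# +40% hash-table overhead (integer math keeps the result exact).
build_memory = build_rows * bytes_per_row * 14 // 10

fits = build_memory <= join_buffer_size
partitions = 1 if fits else math.ceil(build_memory / join_buffer_size)
print(fits, partitions)
```

With these made-up numbers the build side needs ~2.2 MB, far over the 256 KiB buffer, so the join would spill and need several partitions.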
Row comparisons vs probe size (log–log, build side fixed)
Learn more — what happens when the build side doesn't fit?
MySQL 8.4 allocates the hash table out of join_buffer_size. If the whole build side fits, you get a single-pass hash join: phase 1 builds the hash table, phase 2 streams the probe rows through and emits matches, for O(build + probe) total work.
If it doesn't fit, MySQL falls back to grace hash: both inputs are partitioned by a hash of the join key onto disk, one file per partition. Each partition is then built + probed independently. Asymptotically the join is still O(build + probe), but total I/O roughly doubles: both inputs are written out once and then read back again.
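The grace-hash fallback can be sketched as two passes, with lists standing in for the on-disk partition files. This is a simplified model under the assumptions above (same-keyed rows from both inputs always land in the same partition), not MySQL's actual spill code; the data is illustrative.

```python
# Grace hash sketch: partition both inputs by join-key hash
# ("disk files" simulated by lists), then hash-join each partition
# pair independently.
NUM_PARTITIONS = 2

departments = [(1, "Sales"), (2, "Ops"), (3, "Eng"), (4, "HR")]
employees = [("Alice", 3), ("Bob", 1), ("Carol", 4), ("Dave", 2)]

# Pass 1 - write both inputs out, one "file" per partition.
dept_parts = [[] for _ in range(NUM_PARTITIONS)]
emp_parts = [[] for _ in range(NUM_PARTITIONS)]
for row in departments:
    dept_parts[hash(row[0]) % NUM_PARTITIONS].append(row)
for row in employees:
    emp_parts[hash(row[1]) % NUM_PARTITIONS].append(row)

# Pass 2 - re-read each partition: build a small hash table, then probe.
result = []
for d_part, e_part in zip(dept_parts, emp_parts):
    table = {dept_id: name for dept_id, name in d_part}
    for emp_name, emp_dept in e_part:
        if emp_dept in table:
            result.append((emp_name, table[emp_dept]))

print(sorted(result))
```

Because matching keys hash to the same partition, joining each partition pair separately produces exactly the same result as one big in-memory join; the cost is writing and re-reading both inputs.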
Why a hash table? Without one, matching Alice (dept_id=3) against departments would require scanning all 6 rows — O(n). With the hash table, you compute hash(3) % 6 → bucket [3] and check only the 1–2 rows in that bucket. That's O(1) per probe row, which is what makes the whole join O(n + m).
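The comparison-count argument above can be checked directly. A small sketch, assuming a 6-row departments table with hypothetical names and hash(k) = k % 6 as in the text:

```python
# How many key comparisons one probe row costs, with and without
# the hash table. Table contents are hypothetical.
departments = [(i, f"dept{i}") for i in range(1, 7)]

buckets = [[] for _ in range(6)]
for row in departments:
    buckets[row[0] % 6].append(row)

key = 3  # Alice's dept_id
scan_cost = len(departments)        # worst case: scan all rows
probe_cost = len(buckets[key % 6])  # rows actually checked in bucket [3]
print(scan_cost, probe_cost)
```

Here the bucket holds a single row, so the probe does one comparison instead of six; with collisions it might do two or three, but on average it stays constant per probe row.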
Source: MySQL 8.4 Reference Manual §10.2.1.4 "Hash Join Optimization".