B+tree lookup — how InnoDB finds a rowMySQL 8.4 • MariaDB 11.4

Clustered primary key, covering vs non-covering secondary index, and why one extra tree walk doubles your I/O.

Lesson familyIndex Access

Parameters

Logarithmic: 10, 100, 1K, 10K, 100K, 1M, 10M, 100M, 1B
BIGINT PK = 8 B. INT = 4 B. 64-char VARCHAR prefix ≈ 64 B.
Set at innodb_page_size; default 16 KiB.
Non-covering is MySQL's single biggest "secret" I/O tax.

Tree descent animation

Example query this animation executes

-- Non-covering secondary index lookup
SELECT id, first_name, last_name, country_code
FROM   users
WHERE  email = 'alice@example.com';   -- idx_users_email contains only (email, id)

Switch the lookup-type dropdown to see PK-clustered vs covering-index vs non-covering variants of this query.

What you'll see in the animation

  • The grey rectangles are B+tree pages. Top row is the root page, bottom row is the leaf.
  • A red diamond 'query token' appears above the root and descends level by level — each step is one page read.
  • When the token arrives at a page, the page pulses yellow. That's the 'page is in the buffer pool now'.
  • For non-covering secondary lookups, the secondary-tree LEAF stores the primary key (labelled 'leaf: PK = 42'). A dashed orange PK-hop arrow then links that leaf directly to the clustered-tree leaf which holds the actual row (labelled 'leaf: full row'). The token rides that arrow across — no re-descent magic, the link is a real index pointer.
  • That dashed arrow is the extra I/O a covering index would eliminate.
Adjust parameters, then press Play
0.0s 0.0s

Cost readout (updates live)

Fan-out (non-leaf) ?How many child pointers fit in one 16 KiB page. Higher fan-out means a shallower tree — fewer page reads per lookup.

Tree height ?Number of page levels from root to leaf. Each level is one disk page that InnoDB must read. A 1-billion-row table with BIGINT PK is only 4 levels deep.

Tree traversals ?A clustered-PK or covering-index lookup walks one tree. A non-covering secondary lookup walks two: the secondary index tree, then the clustered tree using the PK it found.

Pages touched ?Total page reads for this single-row lookup. Equals tree height times the number of traversals. Each page is 16 KiB by default.

Cold-cache I/O ?Worst-case: none of these pages are in the buffer pool yet, so every page is a disk read. In practice, the upper levels of the tree are almost always cached.

Complexity ?B+tree lookups are O(log n) — doubling the table size adds roughly one extra page read. Non-covering secondary adds a second tree walk: O(log users) + O(log users) = O(2 log n).

O(log users) = O(log n)

Pages touched vs table size (log–log)

Learn more — why is a non-covering secondary index slower?

InnoDB stores every table as a clustered B+tree on the primary key. The leaf pages of that tree hold the full row. A secondary index is a separate B+tree whose leaves hold (secondary_cols, primary_key), not the row itself.

So a lookup by a non-covering secondary index has to:

  1. Walk the secondary B+tree down to a leaf, yielding PK.
  2. Walk the clustered B+tree using that PK to fetch the row.

Two traversals, each the full height of its tree. That is why covering indexes — where the leaf already has every column the query asked for — are such an important tuning tool.

The fan-out formula here is (page_size − 120) / (key_size + 9). It's an approximation: real InnoDB records have variable-length headers and leaf pages split at a 15/16 fill factor, so real fan-out is a bit lower. The height is stable under that noise because it's a logarithm — halving the real fan-out only adds one level of the tree for every factor of the original fan-out.

Sources: MySQL 8.4 Reference Manual §17.6.2 (InnoDB Indexes), §17.11.2 (File Space Management). Default innodb_page_size = 16 KiB.