This Chapter is focused on efficient scanning a large table using pagination
with offset on the primary key. This is also known as keyset pagination.
Efficient pagination of a table with 100M recordsThe key insight is to use the primary key as the offset to avoid missing records (if they’re deleted between invocations)
and to increase performance by using a RANGE join type,
which is much faster (constant time.)
Simplified algorithm:
- We get
PAGE_SIZE
number of records from the table. Starting offset value is 0. - Use the max returned value for the primary key (i.e.,
user_id
) in the batch as the offset for the next page. - Get the next batch from the records which have the primary key (i.e.,
user_id
) value higher than current offset.
For example:
SELECT user_id, external_id, name, metadata, date_created
FROM users
WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record
ORDER BY user_id ASC
LIMIT 10 000;