🔗 Efficient pagination of a table with 100M records
Efficient pagination of a table with 100M recordsThis Chapter is focused on efficient scanning a large table using pagination with offset on the primary key. This is also known as keyset pagination.
The key insight is to use the primary key as the offset to avoid missing records (if they’re deleted between invocations) and to increase performance by using a RANGE join type, which is much faster (constant time.)
Simplified algorithm:
- We get
PAGE_SIZE
number of records from the table. Starting offset value is 0. - Use the max returned value for the primary key (i.e.,
user_id
) in the batch as the offset for the next page. - Get the next batch from the records which have the primary key (i.e.,
user_id
) value higher than current offset.
For example:
SELECT user_id, external_id, name, metadata, date_created
FROM users
WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record
ORDER BY user_id ASC
LIMIT 10 000;