Cursor-Based Pagination: Handling Deletions, Collisions & Secure Cursors

Pagination needs careful implementation to make infinite scroll feel smooth. In this post I’ll share what I learned while reading about cursor‑based pagination for a real‑time feed – how it differs from the classic offset/limit approach, how it deals with deletions and timestamp collisions, what goes into a cursor (spoiler: it’s not a random token), and why you should (or shouldn’t) protect cursors from tampering.

Why ditch offset/limit?

The traditional approach to pagination is to supply a page and limit (or offset and limit) and let the database do the counting:

-- Page 1: first 10 entries
SELECT * FROM entries ORDER BY id ASC LIMIT 10 OFFSET 0;

-- Page 100: entries 991–1000
SELECT * FROM entries ORDER BY id ASC LIMIT 10 OFFSET 990;

Offset pagination is easy to understand and works well for small, mostly static datasets (e.g., a list of 30 countries, whose query result doesn’t change from day to day).

As your table grows, performance degrades: each new page requires the database to read and discard all rows before the offset. Even worse, if new items are inserted or deleted between page requests, users will see duplicates or skipped rows.
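
To make the drift concrete, here’s a toy TypeScript sketch (my own illustration, not from any library) of a newest‑first feed where an insert between two page requests duplicates an item:

// A newest-first feed paged with offset/limit, page size 2.
let feed = ["e", "d", "c", "b", "a"];
const page1 = feed.slice(0, 2);  // ["e", "d"]
feed = ["f", ...feed];           // a new item arrives between requests
const page2 = feed.slice(2, 4);  // ["d", "c"] – the user sees "d" twice
console.log(page1, page2);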

Cursor‑based pagination avoids these problems by using a pointer rather than an offset.

Cursor‑based pagination in a nutshell

A cursor represents the last item of the previous page. When the client asks for the next page, it sends that cursor back, and the server uses it in the WHERE clause to continue where the previous page left off instead of calculating an offset. The result is consistent performance and stable ordering even when data changes. Here’s a basic example using a composite cursor (created_at, id):

-- First page
SELECT * FROM entries
ORDER BY created_at, id
LIMIT 20;

-- Next page (client sends cursor values from the last item)
SELECT * FROM entries
WHERE (created_at, id) > ('2025-01-01T10:00:00Z', 12345) -- values taken from the cursor: match rows that sort after it (later created_at, or the same created_at with a higher id)
ORDER BY created_at, id
LIMIT 20;

This technique is sometimes called keyset pagination. Because the database seeks directly to the start point instead of scanning thousands of rows, it scales much better than offset pagination. Platforms such as GitHub, Twitter and Facebook use variants of this approach for their APIs.

Doesn’t table size affect WHERE (created_at, id) > ('2025-01-01T10:00:00Z', 12345)?

It does, but not linearly. The comparison is served by a composite index on (created_at, id), and indexes are implemented as B-trees; finding the start row in a B-tree takes O(log n) time. Even with a billion rows, that’s only around 30 comparisons to locate the starting point.

What happens if the cursor’s row is deleted?

One of the first questions I had was: what happens if the item the cursor refers to gets deleted between page requests? With offset pagination a deletion shifts the position of every subsequent row, so pages drift. With cursor pagination the answer is simple: nothing breaks. When the server builds the next page it uses the sort‑key values contained in the cursor, not the original row. If the row has been deleted, the WHERE clause still compares against those values and starts from the next available record. Uptrace’s guide on cursor pagination notes that if the cursor points to a deleted record, “the pagination continues normally – the query WHERE id > deleted_id will start from the next available record”.
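
A toy in‑memory sketch of why nothing breaks (ids as the sort key, mirroring the quoted WHERE id > deleted_id; the data is made up):

// The previous page ended at id 11, so the cursor carries 11.
const rows = [{ id: 10 }, { id: 11 }, { id: 12 }, { id: 13 }];
const cursorId = 11;
// The cursor's own row gets deleted between requests...
const remaining = rows.filter(r => r.id !== cursorId);
// ...but the predicate compares values, not row identity,
// so the next page still starts at the first id greater than 11:
const nextPage = remaining.filter(r => r.id > cursorId); // [{ id: 12 }, { id: 13 }]
console.log(nextPage);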

Handling timestamp collisions

Real‑world feeds often sort by a timestamp plus a tie‑breaker. Multiple items can share the same created_at down to the microsecond, especially at Facebook scale. If you sort on created_at alone, the database has no deterministic way to order rows with identical timestamps, and you may miss or duplicate items between pages. The fix is to add a unique tie‑breaker, typically the primary key. Uptrace’s FAQ recommends including a unique column to break ties: instead of WHERE created_at > …, use WHERE (created_at, id) > ('2023-01-01', 12345). Because the timestamp isn’t unique, a second field handles ties so that “we don’t miss or duplicate items when paginating”. In practice the cursor stores both the timestamp and the ID, and the ORDER BY clause uses both fields. This guarantees a strict total ordering even when timestamps collide.
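
To see the tie‑breaker at work, here’s a minimal TypeScript sketch (names are mine) of the row‑value comparison (created_at, id) > (cursor_ts, cursor_id) that the query evaluates. ISO‑8601 UTC timestamps of equal precision sort lexicographically in chronological order, so string comparison is safe here:

type Cursor = { createdAt: string; id: number };

// True when the row sorts strictly after the cursor position.
function isAfter(row: Cursor, cursor: Cursor): boolean {
  if (row.createdAt !== cursor.createdAt) return row.createdAt > cursor.createdAt;
  return row.id > cursor.id; // identical timestamps fall back to the id
}

// Two rows sharing a timestamp still order deterministically:
isAfter({ createdAt: "2025-01-01T10:00:00Z", id: 12346 },
        { createdAt: "2025-01-01T10:00:00Z", id: 12345 }); // true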

What goes into a cursor?

A common misconception is that a cursor is just an opaque random token. It’s not random. In reality, a cursor usually encodes the sort values of the last item on the previous page. Stainless’s API design guide states that

a cursor is typically a “base64‑encoded value representing the sort key and unique ID of the last item on the previous page”.

For example, a cursor could contain { "created_at": "2024-01-01T12:00:00Z", "id": 123 } encoded in Base64. When the client requests the next page it sends back this encoded string; the server decodes it to obtain the original sort values and uses them in the WHERE clause.
Because the cursor contains meaningful data, you must be careful to maintain the same sort order and include a tie‑breaker. If your primary sort key isn’t unique (e.g., a timestamp), add the unique secondary key. The Uptrace post emphasises this by labelling created_at alone as “BAD” and demonstrating the correct solution of including the ID.
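
For illustration, a minimal Node/TypeScript sketch of encoding and decoding such a cursor (function names are my own; the base64url encoding assumes Node ≥ 15.7):

import { Buffer } from "node:buffer";

type Cursor = { created_at: string; id: number };

// JSON -> bytes -> URL-safe Base64: one opaque string for the client.
function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify(c)).toString("base64url");
}

function decodeCursor(token: string): Cursor {
  return JSON.parse(Buffer.from(token, "base64url").toString("utf8"));
}

const token = encodeCursor({ created_at: "2024-01-01T12:00:00Z", id: 123 });
// The client echoes the token back; the server recovers the sort values:
const { created_at, id } = decodeCursor(token);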

Benefits of encoding and signing cursors

  • Most APIs treat the cursor as opaque: they Base64‑encode the JSON and sometimes add a signature to prevent tampering.
  • Keeps the API flexible (you can change the cursor’s internal structure without breaking clients).
  • Avoids exposing implementation details. The client simply stores the cursor string and sends it back verbatim.
  • Makes the cursor URL‑safe.
  • Optionally, sign or encrypt the encoded cursor for extra security. Signing the cursor (for example, using an HMAC with a secret key) lets the server detect whether a client has modified the cursor’s content. If the signature doesn’t match, the server can reject the request.

A signed cursor helps prevent a determined client from jumping arbitrarily forward or backward in your feed. Without a signature, a client could decode the cursor, tweak the id or created_at, re‑encode it and request any slice of your data. This defeats rate‑limiting and makes scraping much easier. Signing forces clients to use only cursors signed by your server. If you don’t care about protecting your data or the cost of random access, you might skip signing – but be aware that cursor pagination then offers no more security than offset pagination.
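
As a sketch of what signing could look like with Node’s built‑in crypto (the payload.signature format and the names are my own assumptions, not a standard):

import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET = process.env.CURSOR_SECRET ?? "dev-only-secret"; // assumption: set a real secret in production

const hmac = (payload: string) =>
  createHmac("sha256", SECRET).update(payload).digest("base64url");

// Issue "<base64url payload>.<base64url signature>".
function issueCursor(cursor: object): string {
  const payload = Buffer.from(JSON.stringify(cursor)).toString("base64url");
  return `${payload}.${hmac(payload)}`;
}

// Recompute the HMAC and reject on any mismatch or malformed token.
function verifyCursor(token: string): object | null {
  const [payload, sig] = token.split(".");
  if (!payload || !sig) return null;
  const expected = Buffer.from(hmac(payload));
  const given = Buffer.from(sig);
  // timingSafeEqual throws on unequal lengths, so check lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

Without the secret, a client can’t forge a valid signature, so any edited cursor fails verification.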

Why protect your cursors?

There are several reasons to validate or sign cursors instead of letting clients freely modify them:

  • Prevent deep scraping: Without protection, a client can iteratively adjust the cursor values and scrape your entire dataset quickly. By signing the cursor, the server ensures the client can only follow the legitimate path through the feed.
  • Enforce rate limits: A signed cursor forces users to page sequentially. If a user tries to jump to page 5,000 by guessing a cursor value, the signature will be invalid.
  • Hide internal implementation: Encoding the cursor hides internal fields (such as ranking scores or internal user IDs) so clients can’t reverse‑engineer your ranking algorithm or infer sensitive data.
  • Ensure data integrity: When the cursor includes a signature, the server can detect if a client has tampered with the sort values and reject the request.

You may decide not to sign cursors if you operate in a trusted environment or the data isn’t sensitive. In that case, cursor pagination behaves similarly to offset pagination – clients can jump around freely by modifying the cursor values. The main benefit you still get is performance.

What does Base64 encoding mean?

You serialize a small object → turn it into bytes → Base64‑encode those bytes → one opaque string. Just as a demonstration:

{"t":"2025-11-06T17:00:00.000Z","id":902}

can get converted to:

eyJ0IjoiMjAyNS0xMS0wNlQxNzowMDowMC4wMDBaIiwiaWQiOjkwMn0

The client should treat it as opaque, because when you give a client:

after=1730929182

they’ll try to guess other values, poke around, and hack the pagination. But if you give them:

after=eyJpZCI6OTAyLCJjcmVhdGVkQXQiOiIyMDI1LTExLTA2In0=

Clients naturally treat it as a black box.

Putting it all together

To build a robust cursor‑based pagination API:

  1. Define a stable sort order. Use immutable columns for ordering. If the primary column isn’t unique, include a secondary key like id.
  2. Construct the cursor from the last item’s sort values. Serialize them (e.g., as JSON) and Base64‑encode to produce an opaque string. Include a signature if you need tamper protection.
  3. Use the cursor in your next query. Decode the cursor, extract the sort values, and build the WHERE clause using (sort_key1, sort_key2) > (cursor_key1, cursor_key2) semantics.
  4. Handle deletions gracefully. If the row referenced by the cursor is deleted, the WHERE clause still works because it uses only the sort values; the next page starts from the first record greater than the deleted one.
  5. Validate incoming cursors. Reject malformed or expired cursors and return an appropriate error, or restart from the beginning. (A minimal end‑to‑end sketch follows.)
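
Tying the steps together, here’s a condensed TypeScript sketch of a feed endpoint. The dbQuery helper and the table and column names are placeholders for whatever database layer you use:

import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

type Cursor = { created_at: string; id: number };
declare function dbQuery(sql: string, params: unknown[]): Promise<any[]>; // placeholder

const SECRET = process.env.CURSOR_SECRET ?? "dev-only-secret";
const sign = (p: string) => createHmac("sha256", SECRET).update(p).digest("base64url");

// Step 2: the cursor is the last item's sort values, Base64-encoded and signed.
function encode(c: Cursor): string {
  const payload = Buffer.from(JSON.stringify(c)).toString("base64url");
  return `${payload}.${sign(payload)}`;
}

// Step 5: reject anything malformed or tampered with.
function decode(token: string): Cursor | null {
  const [payload, sig] = token.split(".");
  if (!payload || !sig) return null;
  const expected = Buffer.from(sign(payload));
  const given = Buffer.from(sig);
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  try { return JSON.parse(Buffer.from(payload, "base64url").toString("utf8")); }
  catch { return null; }
}

async function getPage(after?: string) {
  const cursor = after ? decode(after) : null;
  if (after && !cursor) throw new Error("invalid cursor");
  // Steps 1, 3, 4: stable (created_at, id) order; the value-based
  // predicate keeps working even if the cursor's row was deleted.
  const rows = cursor
    ? await dbQuery(
        `SELECT * FROM entries
         WHERE (created_at, id) > ($1, $2)
         ORDER BY created_at, id LIMIT 20`,
        [cursor.created_at, cursor.id])
    : await dbQuery(`SELECT * FROM entries ORDER BY created_at, id LIMIT 20`, []);
  const last = rows[rows.length - 1];
  return { rows, next: last ? encode({ created_at: last.created_at, id: last.id }) : null };
}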

Conclusion

Cursor‑based pagination isn’t magic, but it can make your APIs faster and more consistent when dealing with large, mutable datasets. The key is understanding what the cursor actually represents. A good cursor encodes the exact position in your ordered data (including a tie‑breaker) and optionally carries a signature so only the server can issue valid cursors. By handling deletions gracefully, using composite keys to avoid timestamp collisions, and protecting your cursors when necessary, you can build pagination that scales nicely.

Have you implemented cursor‑based pagination in your projects? What challenges did you face? Feel free to share your experience or questions – I’d love to hear from you.

References