Cursor-Based Pagination: Handling Deletions, Collisions & Secure Cursors
Pagination requires subtle implementation to deliver a smooth infinite scroll. In this post I’ll share what I learned while reading about cursor‑based pagination for a real‑time feed – how it differs from the classic offset/limit approach, how it deals with deletions and timestamp collisions, what goes into a cursor (spoiler: it’s not a random token), why you should (or shouldn’t) protect cursors from tampering, what to cache, and how to manage stale data.
Why ditch offset/limit?
The traditional approach to pagination is to supply a page and limit (or offset and limit) and let the database do the counting:
-- Page 1: first 10 entries
SELECT * FROM entries ORDER BY id ASC LIMIT 10 OFFSET 0;
-- Page 100: entries 991–1000
SELECT * FROM entries ORDER BY id ASC LIMIT 10 OFFSET 990;
Offset pagination is easy to understand and works well for small, mostly static datasets (for example, a list of 30 countries whose contents rarely change).
As your table grows, performance degrades: each new page requires the database to read and discard all rows before the offset. Even worse, if new items are inserted or deleted between page requests, users will see duplicates or skipped rows.
Cursor‑based pagination avoids these problems by using a pointer rather than an offset.
Cursor‑based pagination in a nutshell
A cursor represents the last item of the previous page. When the client asks for the next page, it sends back that cursor. The server uses the cursor to continue the WHERE clause instead of calculating an offset. The result is consistent performance and stable ordering even when data changes. Here’s a basic example using a composite cursor (created_at, id):
-- First page
SELECT * FROM entries
ORDER BY created_at, id
LIMIT 20;
-- Next page (client sends cursor values from the last item)
SELECT * FROM entries
WHERE (created_at, id) > ('2025-01-01T10:00:00Z', 12345) -- 👈 rows whose (created_at, id) comes after the values taken from the cursor, i.e. the last item of the previous page
ORDER BY created_at, id
LIMIT 20;
This technique is sometimes called keyset pagination. Because the database seeks directly to the start point instead of scanning thousands of rows, it scales much better than offset pagination. Platforms such as GitHub, Twitter and Facebook use variants of this approach for their APIs.
Doesn’t table size affect WHERE (created_at, id) > ('2025-01-01T10:00:00Z', 12345)?
It does, but not linearly, because the query uses an index. Indexes are implemented as B-trees, and finding the start point in a B-tree takes O(log n) time – for example, seeking into a table of a million rows costs on the order of log₂(1,000,000) ≈ 20 comparisons instead of scanning and discarding everything before the cursor.
What happens if the cursor’s row is deleted?
One of the first questions I had was what happens if the item the cursor refers to gets deleted between page requests. With offset pagination this may break if the server calculates the next page based on the row’s position. With cursor pagination the answer is simple: nothing breaks. When the server builds the next page it uses the sort key values contained in the cursor, not the original row. If the row has been deleted, the WHERE clause still compares those values and starts from the next available record.
If the cursor points to a deleted record, “the pagination continues normally – the query WHERE id > deleted_id will start from the next available record” – Uptrace
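To make that concrete, here is a minimal in-memory TypeScript sketch (the data and names are made up, not this article’s API) showing why the comparison keeps working after the cursor row disappears:

type Entry = { id: number; createdAt: string };

const entries: Entry[] = [
  { id: 1, createdAt: "2025-01-01T10:00:00Z" },
  { id: 2, createdAt: "2025-01-01T10:05:00Z" },
  { id: 3, createdAt: "2025-01-01T10:10:00Z" },
];

// Cursor built from the last item of the previous page (the row with id 1).
const cursor = { createdAt: "2025-01-01T10:00:00Z", id: 1 };

// Composite comparison, equivalent to WHERE (created_at, id) > (?, ?).
// ISO-8601 timestamps in the same format compare correctly as strings.
const afterCursor = (e: Entry) =>
  e.createdAt > cursor.createdAt ||
  (e.createdAt === cursor.createdAt && e.id > cursor.id);

console.log(entries.filter(afterCursor));                           // ids 2 and 3
console.log(entries.filter((e) => e.id !== 1).filter(afterCursor)); // still ids 2 and 3, even with the cursor row deleted

The filter only ever compares against the cursor’s values, never the original row, so deleting that row changes nothing.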
Handling timestamp collisions
Real‑world feeds often sort by a timestamp plus a tie‑breaker. Multiple items can share the same created_at down to the microsecond, especially at Facebook‑scale. If you only sort on created_at, the database has no deterministic way to order rows with identical timestamps and you may miss or duplicate items between pages. The fix is to add a unique tie‑breaker, typically the primary key.
Instead of WHERE created_at > …, use WHERE (created_at, id) > ('2023-01-01', 12345). Because the date isn’t unique, we use another field to break ties so that “we don’t miss or duplicate items when paginating” – Uptrace. In practice the cursor stores both the timestamp and the ID, and the ORDER BY clause uses both fields. This ensures a strict total ordering even when timestamps collide.
What goes into a cursor?
A common misconception is that a cursor is just an opaque random token. It’s not random. In reality, a cursor usually encodes the sort values of the last item on the previous page. Stainless’s API design guide states that
a cursor is typically a “base64‑encoded value representing the sort key and unique ID of the last item on the previous page”.
For example, a cursor could contain { "created_at": "2024-01-01T12:00:00Z", "id": 123 } encoded in Base64. When the client requests the next page it sends back this encoded string; the server decodes it to obtain the original sort values and uses them in the WHERE clause.
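As a rough sketch (Node.js Buffer APIs; the field names are just an assumption), encoding and decoding such a cursor could look like this:

// Build an opaque, URL-safe cursor from the last item's sort values.
function encodeCursor(createdAt: string, id: number): string {
  const json = JSON.stringify({ created_at: createdAt, id });
  return Buffer.from(json, "utf8").toString("base64url");
}

// Decode the cursor the client sent back to recover the sort values
// that go into the WHERE clause.
function decodeCursor(cursor: string): { created_at: string; id: number } {
  const json = Buffer.from(cursor, "base64url").toString("utf8");
  return JSON.parse(json);
}

const next = encodeCursor("2024-01-01T12:00:00Z", 123);
const { created_at, id } = decodeCursor(next); // → "2024-01-01T12:00:00Z", 123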
Benefits of encoding and signing cursors
Most APIs treat the cursor as opaque by Base64‑encoding the JSON and sometimes adding a signature to prevent tampering. Doing so:
- Keeps the API flexible (you can change the internal structure without breaking clients)
- Avoids exposing implementation details. The client simply stores the cursor string and sends it back verbatim.
- Makes it URL‑safe (when using the URL‑safe Base64 variant, or after URL‑encoding)
- You can optionally sign or encrypt the encoded cursor for extra security. Signing the cursor (for example, using an HMAC with a secret key) allows the server to detect if a client has modified the cursor’s content. If the signature doesn’t match, the server can reject the request.
A signed cursor helps prevent a determined client from jumping arbitrarily forward or backward in your feed. Without a signature, a client could decode the cursor, tweak the id or created_at, re‑encode it and request any slice of your data. This defeats rate‑limiting and makes scraping much easier. Signing forces clients to use only cursors signed by your server. If you don’t care about protecting your data or the cost of random access, you might skip signing – but be aware that cursor pagination then offers no more security than offset pagination.
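A minimal sketch of such a signature using Node’s crypto module (the secret handling and the payload.signature format are assumptions, not a standard):

import { createHmac, timingSafeEqual } from "node:crypto";

const CURSOR_SECRET = process.env.CURSOR_SECRET ?? "dev-only-secret"; // keep this server-side

// Append an HMAC so the server can later verify it issued this cursor.
function signCursor(encodedCursor: string): string {
  const sig = createHmac("sha256", CURSOR_SECRET).update(encodedCursor).digest("base64url");
  return `${encodedCursor}.${sig}`;
}

// Returns the original encoded cursor if the signature checks out, otherwise null.
function verifyCursor(signed: string): string | null {
  const [payload, sig] = signed.split(".");
  if (!payload || !sig) return null;
  const expected = createHmac("sha256", CURSOR_SECRET).update(payload).digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null; // tampered or forged
  return payload;
}

A request carrying a cursor that fails verification can then be rejected with a 400 before it ever reaches the database.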
Why protect your cursors?
There are several reasons to validate or sign cursors instead of letting clients freely modify them:
- Prevent deep scraping: Without protection, a client can iteratively adjust the cursor values and scrape your entire dataset quickly. By signing the cursor, the server ensures the client can only follow the legitimate path through the feed.
- Enforce rate limits: A signed cursor forces users to page sequentially. If a user tries to jump to page 5,000 by guessing a cursor value, the signature will be invalid.
- Hide internal implementation: Encoding the cursor hides internal fields (such as ranking scores or internal user IDs) so clients can’t reverse‑engineer your ranking algorithm or infer sensitive data.
- Ensure data integrity: When the cursor includes a signature, the server can detect if a client has tampered with the sort values and reject the request.
You may decide not to sign cursors if you operate in a trusted environment or the data isn’t sensitive. In such cases, cursor pagination behaves similarly to offset pagination – clients can jump around freely by modifying the cursor values. The main benefit you still get is performance.
What does Base64 encoding mean?
You serialize a small object → turn it into bytes → Base64-encode those bytes → one opaque string. Just as a demonstration:
{"t":"2025-11-06T17:00:00.000Z","id":902}
can get converted to:
eyJ0IjoiMjAyNS0xMS0wNlQxNzowMDowMC4wMDBaIiwiaWQiOjkwMn0
The client is supposed to treat it as opaque. When you give a client something like:
after=1730929182
They’ll try to guess other values, poke around, hack pagination. But if you give them:
after=eyJpZCI6OTAyLCJjcmVhdGVkQXQiOiIyMDI1LTExLTA2In0=
Clients naturally treat it as a black box. This discourages clients from mucking with the cursor.
For a Facebook feed, should I cache the page information? How about each item in the feed? I’m asking because text, images, like count may all change.
Tbh I’m not sure, but I’d say you should cache the page response for any cursor. The only response you shouldn’t cache is the one returned when you don’t pass a cursor (the query for the latest feed right now). Each item should also be cached, and then refreshed whenever your product team decides it’s time to refresh.
Caching can serve either of two purposes:
- To avoid querying the data again until the cache expires. Example: you won’t refresh until the cache expires in 20 days.
- To have something to show until you update the data again. Example: you don’t have a specific expiry in mind; you just refresh the cache based on certain user actions / product decisions.
Other than that, it’s best to decouple “showing a view” from “when to query the data associated with that view”.
You typically query data just before showing a view, and you should still do that, but your architecture should also allow other events to trigger a fetch, and your view should react whenever new data flows in. Concretely (see the sketch after the list below):
- Basically, a screen is always subscribed to changes in the cache.
- Whenever a network request returns, the cache is updated.
- A network request can happen from anywhere, at any time. Examples:
1. the user opened a new page, or some pre-fetching ran (on a flow or at app start-up).
2. the user pulled down to refresh or tapped some button.
3. a timer fired and caused a data refresh.
4. etc.
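Here is a minimal TypeScript sketch of that idea (all names are made up): the screen only ever observes the cache, and any trigger can refresh it.

type Feed = { items: unknown[]; nextCursor: string | null };
type Listener = (feed: Feed) => void;

class FeedCache {
  private feed: Feed = { items: [], nextCursor: null };
  private listeners = new Set<Listener>();

  // The screen subscribes once and immediately receives whatever is cached (possibly stale).
  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    fn(this.feed);
    return () => this.listeners.delete(fn);
  }

  // Called whenever any network request returns; every subscribed view re-renders.
  update(feed: Feed): void {
    this.feed = feed;
    this.listeners.forEach((fn) => fn(feed));
  }
}

// Any trigger (app start, pull-to-refresh, a timer, a push) can call this.
async function refreshLatest(cache: FeedCache): Promise<void> {
  const res = await fetch("/feed"); // no cursor: ask for the latest page
  cache.update(await res.json());
}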
Got it. What happens when the user last opened the app 6 days ago? Basically, what do you do when your data is stale?
Great question. At this point the user’s next cursor points to something from six days ago or more.
The app must not pass any cursor and should just fetch the latest. Remember, fetching the latest is done by not passing a cursor.
So suppose the most recent page retrieved by the app was PageF (reached via cursorF), fetched six days ago. PageA–PageE haven’t been retrieved by the app yet.
PageA
PageB
PageC
PageD
PageE
PageF (6 days ago)
Here’s what happens then:
- The app fetches PageA and puts it in local storage.
- The app may still keep PageF in storage. That depends on your eviction policy and what the app considers an acceptable feed. For our sake, let’s say the app keeps it.
- So now your dataStore = [PageA, PageF]. This does NOT mean the app will show PageA and then immediately PageF; it must not render them back to back as if there were no gap.
- If the user doesn’t scroll further down past PageA, then things are all good.
- If the user scrolls down past the last item of PageA, then the app will query PageB. At that point the user sees a loading state for anything from PageB.
💡 You need some mechanism to know that the next cursor should be fetched when you reach the end of the current cursor’s page, NOT the end of all items.
To achieve that, you could model it in one of these ways:
A - Index Based
allItems = [itemA1, itemA2, itemA3, itemF1, itemF2, itemF3]
indexOfLastItemfromLastCursor = 2 (pointing to itemA3)
B - Alternate datasource
allItems = [itemA1, itemA2, itemA3, itemF1, itemF2, itemF3]
currentCursorItems = [itemA1, itemA2, itemA3] // just fetch next cursor as soon as itemA3 is about to be viewed.
C - Item based
allItems = [itemA1, itemA2, itemA3, itemF1, itemF2, itemF3]
lastItemFromCursor = itemA3 // simpler approach
D - Remove old items and refetch them if needed (not efficient)
allItems = [itemA1, itemA2, itemA3]
All approaches can work, but generally speaking ID-based approaches are safer than index-based ones: if anything in the indexing changes, you can easily end up off by one.
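As a sketch of approach C in TypeScript (the endpoint, response shape, and names are assumptions): the trigger is “the last item of the newest fetched page is about to appear”, not “the last item of the whole array”.

type Item = { id: string };
type Page = { items: Item[]; nextCursor: string | null };

let allItems: Item[] = [];                     // may also contain old cached pages (e.g. PageF)
let lastItemFromCursor: string | null = null;  // last item of the newest page we fetched
let nextCursor: string | null = null;

async function onItemWillAppear(item: Item): Promise<void> {
  if (item.id !== lastItemFromCursor || nextCursor === null) return;

  const res = await fetch(`/feed?after=${encodeURIComponent(nextCursor)}`);
  const page: Page = await res.json();

  // Insert the new page right after the current cursor's last item,
  // so older cached pages stay below the gap instead of being shown back to back.
  const idx = allItems.findIndex((i) => i.id === lastItemFromCursor);
  allItems = [...allItems.slice(0, idx + 1), ...page.items, ...allItems.slice(idx + 1)];

  const last = page.items[page.items.length - 1];
  lastItemFromCursor = last ? last.id : lastItemFromCursor;
  nextCursor = page.nextCursor;
}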
And let’s just say that since then, 46 new items were added to the feed. Each page has a size of 10. When you pull down PageA, PageB, PageC, and PageD, you get 10 items each.
For PageE you also get 10 items. The server won’t be smart and return only the 6 new ones; it doesn’t care that you previously downloaded 4 of those items – the book-keeping isn’t worth it. The 4 items you previously downloaded are simply reconciled with the copies already in your cache.
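One hedged way to handle those overlapping items is to key the cache by item ID, so re-downloaded items simply overwrite their cached copies (names are assumptions):

type FeedItem = { id: string };

// Re-downloaded items overwrite the cached copy; no special-casing for the 4 duplicates.
function mergePageIntoCache(cache: Map<string, FeedItem>, page: FeedItem[]): void {
  for (const item of page) cache.set(item.id, item);
}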
💡 At any point in time, your items array can be stale – even if you just opened the app, scrolled down, and 30 seconds have passed. If you’re following 3,000 users and each makes 10 actions a day, that’s 30,000 actions a day, or one action roughly every 86,400 / 30,000 ≈ 2.9 seconds. So after just 30 seconds it’s very likely that the tip of the feed has already changed. tl;dr: your feed goes stale a lot quicker than 6 days!
I also must admit, stale data can mean either:
- The item you have isn’t the latest; it has been changed. ← when users update their posts.
- The stream of data you have is valid and no single item has changed; there are just lots of new items added to the feed. ← when there are new items in the feed.
When reading things online about handling caching / stale data it’s good to know which of the above the post is focused on. They’re similar but not the same.
Image loading and fast scrolling
Build a mechanism that cancels image loading when the user is scrolling fast and flying past rows quickly.
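A rough TypeScript sketch of that mechanism using AbortController (the row identifiers and the recycle callback are assumptions):

const inFlight = new Map<string, AbortController>();

// Start loading the image for a row; remember the controller so we can cancel it later.
async function loadRowImage(rowId: string, url: string): Promise<Blob | null> {
  const controller = new AbortController();
  inFlight.set(rowId, controller);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.blob();
  } catch {
    return null; // aborted (row scrolled away) or failed
  } finally {
    inFlight.delete(rowId);
  }
}

// Call this when a row leaves the screen / gets recycled during fast scrolling.
function onRowRecycled(rowId: string): void {
  inFlight.get(rowId)?.abort();
}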
Putting it all together
To build a robust cursor‑based pagination API:
- Define a stable sort order. Use immutable columns for ordering. If the primary column isn’t unique, include a secondary key like id.
- Construct the cursor from the last item’s sort values. Serialize them (e.g., as JSON) and Base64‑encode to produce an opaque string. Include a signature if you need tamper protection.
- Use the cursor in your next query. Decode the cursor, extract the sort values, and build the WHERE clause using (sort_key1, sort_key2) > (cursor_key1, cursor_key2) semantics.
- Handle deletions gracefully. If the row referenced by the cursor is deleted, the WHERE clause still works because it uses only the sort values; the next page starts from the first record greater than the deleted one.
- Validate incoming cursors. Reject malformed or even expired cursors and return an appropriate error or start from the beginning.
- If the last fetched cursor is old, fetch the latest feed without passing any cursor, then resume normal pagination as the user scrolls.
- ID-based pointers are safer than index-based pointers.
- Caching serves either purpose: avoiding making the same request again, or keeping old data as a filler until you fetch the latest.
- Views should observe changes to the cache. The cache should be updated whenever a network call returns. Network calls can have various triggers (other than page loading).
Conclusion
Cursor‑based pagination isn’t magic, but it can make your APIs faster and more consistent when dealing with large, mutable datasets. The key is understanding what the cursor actually represents. A good cursor encodes the exact position in your ordered data (including a tie‑breaker) and optionally carries a signature so only the server can issue valid cursors. By handling deletions gracefully, using composite keys to avoid timestamp collisions, and protecting your cursors when necessary, you can build pagination that scales nicely.
Have you implemented cursor‑based pagination in your projects? What challenges did you face? Feel free to share your experience or questions – I’d love to hear from you.
References
- Cursor Pagination for PostgreSQL & MySQL: Complete Developer Guide 2025
- Understanding Cursor Pagination and Why It’s So Fast (Deep Dive)
- Stainless - How to Implement REST API Pagination: Offset, Cursor, Keyset
- How To Implement Offset and Cursor-Based Pagination in EF Core - DEV Community
- Mobile System Design Interview - Manuel Vicente Vivo (ByteByteGo)