If you spend any time reading about API design or working with APIs, you have likely come across the notion of paging response data. Paging has been used on the HTML web for many years as a method to provide users with a fast response to their searches. I normally spend my time advocating that Web APIs should emulate the HTML web more, but in this case, I believe there are better ways than slicing results into arbitrary pages of data.
Is It Necessary?
To provide some context, it is worth asking a few questions about why we do paging. On the HTML web, paging was critical because results had to be rendered in HTML, and too many results produce a large HTML page. Web browsers are often slow at rendering large HTML pages, and that makes users wait. Research has shown that users don’t wait.
With Web APIs, there does not need to be a direct correlation between data retrieved and data rendered to a user. What gets sent over the wire is just data and can use a more efficient format than HTML. So when do we need to start getting the server to page data that is returned? How much is too much?
Unfortunately, “how much” is one of those “it depends” questions. However, consider the fact that Google’s guidelines for banner ads say they should be less than 150K. You can fit a whole lot of content in a 150K JSON payload.
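To make that claim concrete, here is a quick back-of-the-envelope check. The record shape and field values are invented for illustration; real payloads will obviously vary with how much you put in each record.

```python
import json

# Hypothetical contact records with a handful of typical fields.
# The field names and values are made up; real records will differ.
contacts = [
    {
        "id": i,
        "first": "Jane",
        "last": "Doe",
        "email": "jane.doe@example.com",
        "phone": "555-0100",
    }
    for i in range(1500)
]

payload = json.dumps(contacts)
size_kb = len(payload.encode("utf-8")) / 1024
print(f"{len(contacts)} records serialize to about {size_kb:.0f}K of JSON")
```

Even without compression, well over a thousand modest records fit under the 150K mark, which suggests many APIs page far earlier than they need to.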
What’s Wrong With Paging?
There are a few things that I don’t like about paging. From a UX perspective, if the paging mechanism does end up getting reflected in the UI, it’s just not a pleasant experience. Why can’t I just scroll? If I’m looking for some specific items, it is hard to guess which page those items might be on, which forces me to walk through the pages one at a time. And if the data is changing while I’m paging through it, some items may be skipped and others duplicated.
Whenever I see those dropdowns that ask whether you want 5, 10, or 50 items per page, I always cringe a little. How do you determine the ideal page size? Based on what fits on the screen, or the time to transfer the data? Neither of those factors is fixed, so there is no good answer.
It is also important to realize that as your user pages through the data one page at a time, the server has to re-execute the entire query for each request, determining the complete set of results just so it can return one page’s worth of data. In theory, the complete result set can be cached, but then you risk losing the scalability benefits of a stateless server. Making the server do significantly more work to improve client responsiveness may become a self-defeating goal.
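The repeated-work problem is easy to see in the classic LIMIT/OFFSET approach. The sketch below uses an in-memory SQLite table with invented data; the table name and page size are illustrative, but the cost characteristic is real for any offset-based scheme.

```python
import sqlite3

# Minimal in-memory table standing in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1000)],
)

PAGE_SIZE = 50

def fetch_page(page: int) -> list:
    # To satisfy OFFSET, the database must still walk past every skipped
    # row, so the cost of serving a page grows with the page number.
    offset = (page - 1) * PAGE_SIZE
    cur = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    )
    return cur.fetchall()

page_10 = fetch_page(10)  # rows 451-500; rows 1-450 were scanned and discarded
```

Every request for page 10 repeats the work of skipping the first nine pages, which is exactly the server-side waste described above.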
How Can It Be Done Better?
Paging exists as a way to force the user to request a smaller subset of data. Encouraging users to return less data is a win for everyone. It’s less data for the server to process, less bandwidth, less information for the client to process, and less information for the user to hunt through, to find what they want.
However, slicing the items of data up into arbitrary-sized chunks based on some ordering algorithm is often not the most effective way of allowing users to refine their inquiry.
I find that it is always worth reviewing the characteristics of the data you are returning and asking the question, is there some natural property of the data that would be more effective at sub-dividing the data into smaller chunks?
Know Your A, B, Cs
The most obvious example is with a list of names. Many contact manager-type applications will group contacts by the first letter of either the first name or last name. Using alphabetic ordering creates 26 “pages” of data. This makes it much easier for a user to jump to the page that contains the person they are looking for.
It is true that using the alphabet to page through names limits you to only 26 pages (assuming you don’t use two-letter prefixes, which would be a bit weird). However, even with just those 26 pages, my estimate is that you could split a list of 30,000 names into JSON documents that are each smaller than that 150K banner ad. With compression, you could return far more.
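A minimal sketch of the grouping itself, assuming we key on the first letter of the last name (a contact manager might equally use the first name). The URL shape in the comment is invented for illustration.

```python
from collections import defaultdict

# Hypothetical contact list; a real one would come from the data store.
names = ["Adams", "Allen", "Baker", "Brown", "Clark", "Davis", "Diaz"]

# Group by first letter of the last name to form natural "pages".
pages: dict[str, list[str]] = defaultdict(list)
for name in sorted(names):
    pages[name[0].upper()].append(name)

# Each letter becomes one addressable chunk, e.g. GET /contacts?letter=B
```

The client never has to guess a page number; it jumps straight to the letter it wants.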
It’s About Time
Alphabet-based paging is only one of many ways that data can be segmented. Time-based data is ripe for smart paging: it can be paged by day, by week, or by month. You can often see this mechanism on blogs, where it’s easy to jump to all the posts from a previous month.
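The blog-archive pattern can be sketched as grouping records by a (year, month) key. The post data and the URL shape in the comment are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical blog posts; real ones would come from the data store.
posts = [
    {"title": "Hello", "published": date(2013, 1, 5)},
    {"title": "Paging", "published": date(2013, 1, 20)},
    {"title": "REST", "published": date(2013, 2, 3)},
]

# Group posts by publication month to form natural time-based "pages".
archive: dict[str, list[dict]] = defaultdict(list)
for post in posts:
    key = post["published"].strftime("%Y-%m")  # e.g. served at /posts/2013-01
    archive[key].append(post)
```

Each month is a meaningful, addressable chunk, and the underlying store almost certainly has an index on the publication date.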
Sometimes data has other natural segments, such as classifications, categories, or geographies. These groups may not have a natural sequence, so you may have to invent one.
The important thing is that you are providing the API consumer with a way of dealing with chunks of data in more manageable sizes. Those chunks will be more meaningful in terms of the application domain and there is a reasonable chance they will be quicker to retrieve because the underlying data store may have indexes on those attributes.
From the API consumer’s perspective, one advantage of dumb paging is that it is easy to determine what page is next. A client can easily increment a numeric page value. It’s not so easy with smart paging. If your client needs to construct the link to the next page then it is going to need some smarts as to how to generate the next page URL. You may need to send the client a sequence of categories, or provide a period for time-based paging. However, if you are using a framework that generates next/previous links in the responses (like OData does) then it’s easy because the server can create the appropriate links and the client can blindly follow them to the next page.
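The server-generated-links approach can be sketched as follows. The category names, URL shape, and response layout are all invented for illustration; a framework like OData produces equivalent links in its own format.

```python
# The server knows the sequence of segments; the client does not need to.
SEGMENTS = ["electronics", "furniture", "toys"]

def build_response(segment: str, items: list) -> dict:
    """Return a page of items plus server-generated prev/next links."""
    i = SEGMENTS.index(segment)
    response = {"segment": segment, "items": items, "links": {}}
    if i > 0:
        response["links"]["prev"] = f"/products?category={SEGMENTS[i - 1]}"
    if i < len(SEGMENTS) - 1:
        response["links"]["next"] = f"/products?category={SEGMENTS[i + 1]}"
    return response

resp = build_response("furniture", ["desk", "chair"])
# The client blindly follows resp["links"]["next"]; it never has to know
# how the next page key is computed.
```

Because the server embeds the links, the client stays as simple as it would be with numeric pages, while the paging scheme remains meaningful to the domain.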
It May Not Be Possible
Sometimes data just doesn’t have a natural grouping, or the groups that do exist are too large to be useful. Arbitrary pages may be the correct approach for your scenario. My recommendation is simply to consider the more natural possibilities first before falling back on “dumb” paging.
Let Your Framework Know Who’s the Boss
All too often I see developers making design choices based on capabilities provided by their chosen framework. What many developers don’t realize is that those facilities are often provided by the framework, not because they are the best design choice, but because it was easy for the framework developers to provide it. Obviously, a framework cannot know the semantics of the data that you will be paging through, therefore it is difficult to provide a smart paging capability out of the box. However, dumb paging is easy to provide.
Make your own design choices and use framework capabilities where appropriate; don’t trust framework designers to do that work for you.