API pagination best practices

I'd love some some help handling a strange edge case with a paginated API I'm building.

Like many APIs, this one paginates large results. If you query /foos, you'll get 100 results (ie foo #1-100), and a link to /foos?page=2 which should return foo #101-200.

Unfortunately, if foo #10 is deleted from the data set before the API consumer makes the next query, /foos?page=2 will offset by 100 and return foos #102-201.

This is a problem for API consumers who are trying to pull all foos - they will not receive foo #101.

What's the best practice to handle this? We'd like to make it as lightweight as possible (ie avoiding handling sessions for API requests). Examples from other APIs would be greatly appreciated!


I'm not completely sure how your data is handled, so this may or may not work, but have you considered paginating with a timestamp field?

When you query /foos you get 100 results. Your API should then return something like this (assuming JSON, but if it needs XML the same principles can be followed):

{
    "data" : [
        {  data item 1 with all relevant fields    },
        {  data item 2   },
        ...
        {  data item 100 }
    ],
    "paging":  {
        "previous":  "http://api.example.com/foo?since=TIMESTAMP1" 
        "next":  "http://api.example.com/foo?since=TIMESTAMP2"
    }

}

Just a note, only using one timestamp relies on an implicit 'limit' in your results. You may want to add an explicit limit or also use an until property.

The timestamp can be dynamically determined using the last data item in the list. This seems to be more or less how Facebook paginates in its Graph API (scroll down to the bottom to see the pagination links in the format I gave above).

One problem may be if you add a data item, but based on your description it sounds like they would be added to the end (if not, let me know and I'll see if I can improve on this).


You have several problems.

First, you have the example that you cited.

You also have a similar problem if rows are inserted, but in this case the user get duplicate data (arguably easier to manage than missing data, but still an issue).

If you are not snapshotting the original data set, then this is just a fact of life.

You can have the user make an explicit snapshot:

POST /createquery
filter.firstName=Bob&filter.lastName=Eubanks

Which results:

HTTP/1.1 301 Here's your query
Location: http://www.example.org/query/12345

Then you can page that all day long, since it's now static. This can be reasonably light weight, since you can just capture the actual document keys rather than the entire rows.

If the use case is simply that your users want (and need) all of the data, then you can simply give it to them:

GET /query/12345?all=true

and just send the whole kit.


If you've got pagination you also sort the data by some key. Why not let API clients include the key of the last element of the previously returned collection in the URL and add a WHERE clause to your SQL query (or something equivalent, if you're not using SQL) so that it returns only those elements for which the key is greater than this value?

链接地址: http://www.djcxy.com/p/45658.html

上一篇: 如何创建添加了过滤器的REST API调用

下一篇: API分页最佳实践