Datastore fetch VS fetch(keys

I am fetching multiple entities (100+) from the datastore using the query below:

return entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch()

This was taking a long time to load (on the order of a few seconds).

Trying to find a better way, I created exactly 100 entities and found that fetching them takes anywhere between 750 ms and 1000 ms on the local server, which is a lot. I am not sure how to make a single-line fetch more efficient!

In a desperate attempt to optimize, I tried

  • Removing the order part, still got the same results
  • Removing the filter part, still got the same results
  • Removing both the order and filter parts, still got the same results

So apparently it is something else. I then tried fetching keys only and passing the keys to the ndb.get_multi() function:

    qKeys = entity.query(ancestor=ancestorKey).filter(entity.year == myStartYear).order(entity.num).fetch(keys_only=True)

    return ndb.get_multi(qKeys)

To my surprise, I get better throughput! Query results now load in 450 ~ 550 ms, which is around 40% better performance on average!
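
For reference, the ~40% figure comes from comparing the midpoints of the two ranges quoted above. A quick sketch of that arithmetic (the helper names midpoint and percent_reduction are just for illustration):

```python
def midpoint(low, high):
    """Midpoint of a measured range, in the same unit as the inputs."""
    return (low + high) / 2.0


def percent_reduction(before_ms, after_ms):
    """How much smaller after_ms is than before_ms, as a percentage."""
    return 100.0 * (before_ms - after_ms) / before_ms


before = midpoint(750, 1000)  # plain fetch(): 875.0 ms
after = midpoint(450, 550)    # keys_only fetch + get_multi(): 500.0 ms
print(round(percent_reduction(before, after), 1))  # 42.9
```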

I am not sure why this happens; I would have thought that fetch() already retrieves entities in the most efficient way.

Question: Any idea how I can optimize the single query line to load faster?

Side question: Does anyone know the underlying mechanism of the fetch function, and why fetching keys only and then using ndb.get_multi() is faster?


FWIW, you shouldn't expect meaningful results from datastore performance tests performed locally, using either the development server or the datastore emulator. They're just emulators; they don't have the same performance (or even 100% equivalent functionality) as the real datastore.

    Credit goes to @snakecharmerb, who correctly identified the culprit, confirmed by OP:

    Be aware that performance characteristics in the cloud may differ from those on your local machine. You really want to be running these tests in the cloud. – snakecharmerb yesterday

    @snakecharmerb you were right! I just tested on the cloud, and the performance is actually the other way around there: fetch() took ~550 ms, while fetch(keys_only=True) followed by get_multi() took ~700 ms. It seems that fetch() works better on the cloud! – Khaled yesterday
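
If you want to repeat this comparison on the deployed app rather than locally, a minimal timing sketch could look like the following. The helper names (time_call, compare) and the sleep-based stand-ins are made up for illustration and are not part of NDB; in a real handler you would pass in the two actual calls instead. Keep in mind that repeated runs can be skewed by NDB's caching for get_multi(), so treat the numbers as rough.

```python
import time
from timeit import default_timer


def time_call(fn, repeats=5):
    """Run fn() several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = default_timer()
        fn()
        samples.append((default_timer() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


def compare(variants, repeats=5):
    """Time each named callable and return a dict of {name: median_ms}."""
    return dict((name, time_call(fn, repeats)) for name, fn in variants.items())


if __name__ == "__main__":
    # Stand-ins so the sketch runs anywhere. In the deployed app, replace
    # them with the real calls, for example:
    #   lambda: query.fetch()
    #   lambda: ndb.get_multi(query.fetch(keys_only=True))
    results = compare({
        "fetch": lambda: time.sleep(0.01),
        "keys_only + get_multi": lambda: time.sleep(0.02),
    })
    for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
        print("%-24s %.1f ms" % (name, ms))
```

Using the median of a few runs rather than a single measurement smooths out one-off spikes such as instance warm-up.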
