Big datastore reads

I need to read all the entries in a Google App Engine datastore to do some initialization work. There are a lot of entities (80k currently), and the count continues to grow. I'm starting to hit the 30-second datastore query timeout limit.

Are there any best practices for how to shard these types of huge reads in the datastore? Any examples?


You can tackle this in several ways:

  • Execute your code on the Task Queue, which has a 10-minute timeout instead of 30s (more like 60s in practice). The easiest way to do this is via DeferredTask; see the sketch after this list.

    Warning: DeferredTask must be serializable, so it's hard to pass complex data to it. Also, don't make it an inner class.

  • See backends. Requests served by a backend instance have no time limit.

  • Finally, if you need to break up a big task and execute it in parallel, then look at mapreduce; a minimal mapper sketch also follows this list.
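
On the Python runtime, the deferred library (enabled via the "deferred" builtin in app.yaml) plays the role of DeferredTask. A minimal sketch, where rebuild_index and the 'MyKind' argument are hypothetical stand-ins for your own initialization work:

    from google.appengine.ext import deferred

    # Must be a module-level function (the analogue of "don't make it an
    # inner class"): deferred pickles the callable and its arguments.
    def rebuild_index(kind_name):
        pass  # long-running initialization work goes here

    # From a normal request handler: enqueue the work and return at once.
    # The function then runs on the task queue with its 10-minute deadline.
    deferred.defer(rebuild_index, 'MyKind')  # 'MyKind' is hypothetical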
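If you take the mapreduce route, the appengine-mapreduce library calls a mapper function once per entity and shards the work across task-queue requests for you. A rough sketch, assuming a hypothetical "initialized" property on your model and a job registered in mapreduce.yaml:

    from mapreduce import operation as op

    # Called once per entity by the framework, which batches the puts.
    def initialize_entity(entity):
        entity.initialized = True  # hypothetical per-entity work
        yield op.db.Put(entity)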


  • This answer on StackExchange served me well:

    Expired queries and appengine

    I had to slightly modify it to work for me:

    import logging

    def loop_over_objects_in_batches(batch_size, object_class, callback):
        # object_class must be a db.Query (e.g. MyModel.all()), not the model
        # class itself: only a query supports count() and slice-style fetching.
        # count() stops at 1000 by default, so pass an explicit higher limit.
        num_els = object_class.count(limit=1000000)
        num_loops = num_els // batch_size  # number of full batches
        remainder = num_els - num_loops * batch_size
        logging.info("Calling batched loop with batch_size: %d, num_els: %s, num_loops: %s, remainder: %s, object_class: %s, callback: %s" % (batch_size, num_els, num_loops, remainder, object_class, callback))
        offset = 0
        while offset < num_loops * batch_size:
            logging.info("Processing batch (%d:%d)" % (offset, offset + batch_size))
            # Slicing a db.Query fetches batch_size entities at the given offset.
            for entity in object_class[offset:offset + batch_size]:
                callback(entity)
            offset += batch_size

        if remainder:
            logging.info("Processing remainder batch (%d:%d)" % (offset, num_els))
            for entity in object_class[offset:num_els]:
                callback(entity)
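
    For example, assuming a hypothetical db model Article, you pass a query (not the model class) and a callback:

    def reinitialize(article):
        article.needs_init = False  # hypothetical per-entity work
        article.put()

    loop_over_objects_in_batches(500, Article.all(), reinitialize)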
    