High # of RPC calls in GAE, is this normal?

I have a few questions that will hopefully solidify my understanding of what goes on "behind the scenes" in GAE. Currently my application needs to retrieve a set of data that is 258 entities in size. I have set indexed=False on the appropriate properties of this entity's Model class. I am using the serialize/deserialize entity technique described on Nick Johnson's blog. I have made the code generic for example purposes:

from google.appengine.api import memcache
from google.appengine.datastore import entity_pb
from google.appengine.ext import db
import webapp2
# (plus the app's own models module and a json/simplejson import)

def serialize_entities(models):
    if not models:
        return None
    elif isinstance(models, db.Model):
        return db.model_to_protobuf(models).Encode()
    else:
        return [db.model_to_protobuf(x).Encode() for x in models]

def deserialize_entities(data):
    if not data:
        return None
    elif isinstance(data, str):
        return db.model_from_protobuf(entity_pb.EntityProto(data))
    else:
        return [db.model_from_protobuf(entity_pb.EntityProto(x)) for x in data]

class BatchProcessor(object):
    kind = None
    filters = []
    def get_query(self):
        q = self.kind.all()
        for prop, value in self.filters:
            q.filter("%s" % prop, value)
        q.order("__key__")
        return q

    def run(self, batch_size=100):
        q = self.get_query()
        entities = q.fetch(batch_size)
        while entities:
            yield entities
            q = self.get_query()
            q.filter("__key__ >",entities[-1].key())
            entities = q.fetch(batch_size)

class MyKindHandler(webapp2.RequestHandler):
    def fill_cache(self, my_cache):
        entities = []
        if not my_cache:
            # retry in case of failure to retrieve data
            while not my_cache:
                batch_processor = BatchProcessor()
                batch_processor.kind = models.MyKind
                batch = batch_processor.run()
                my_cache = []

                for b in batch:
                    my_cache.extend(b)                    
            # Cache entities for 10 minutes
            memcache.set('MyKindEntityCache', serialize_entities(my_cache), 
                         600)
        for v in my_cache:
            # There is actually around 10 properties for this entity.
            entities.append({'one': v.one, 'two': v.two})

        mykind_json_str = simplejson.dumps({'entities':entities})
        # Don't set expiration - will be refreshed automatically when 
        # entity cache is.
        memcache.set('MyKindJsonCache', mykind_json_str)
        return mykind_json_str

    def get(self):
        my_cache = deserialize_entities(memcache.get('MyKindEntityCache'))
        if not my_cache:
            # Create entity & json cache
            self.fill_cache(my_cache)

        mykind_json_str = memcache.get('MyKindJsonCache')
        if not mykind_json_str:
            mykind_json_str = self.fill_cache(my_cache)
        self.response.write(mykind_json_str)

When I look at appstats, the Ajax call to the handler that retrieves this data takes the most time. There are 434 RPC calls made when the handler has to refresh the data from the datastore (the entity cache expires every 10 minutes); otherwise there are only 2 RPCs, one to retrieve each of the two memcache values. The appstats output for the refresh is:

real=10295ms cpu=13544ms api=5956ms overhead=137ms (434 RPCs)
datastore_v3.Get       428
memcache.Get             2
memcache.Set             2
datastore_v3.RunQuery    2

I think I understand what most of those calls are doing, but the datastore_v3.Get calls associated with the datastore_v3.RunQuery confuse me, if they represent what I think they do. Do they mean those are individual .get() calls to the datastore? I thought that when you call .fetch(batch_size) you get the whole "batch" back with a single call. And beyond that, why would the number of these calls be 150+ more than the number of entities of this kind in the datastore?

As a side note, I wrote the BatchProcessor class before learning about query cursors, and I have tried swapping them in to see if performance improved, but it not only stayed the same, the number of RPCs increased because of storing the cursor in memcache.
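For reference, the cursor-based run() I tried looked roughly like the sketch below (simplified; the memcache key name is just illustrative). The extra memcache.set per batch is where the additional RPCs came from:

class CursorBatchProcessor(BatchProcessor):
    # Same batching as above, but paging with query cursors instead of
    # "__key__ >" filters.
    def run(self, batch_size=100):
        q = self.get_query()
        # Resume from a previously stored cursor, if there is one.
        cursor = memcache.get('MyKindBatchCursor')
        if cursor:
            q.with_cursor(cursor)
        entities = q.fetch(batch_size)
        while entities:
            yield entities
            # Persisting the cursor costs one extra memcache RPC per batch.
            memcache.set('MyKindBatchCursor', q.cursor())
            q.with_cursor(q.cursor())
            entities = q.fetch(batch_size)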

Any insight into something I may be overlooking is much appreciated!


Edit - More details from appstats:

In testing this and inspecting appstats more closely, I have found that my calls to .parent() on each MyKind while building the JSON response string increase the RPCs considerably, along with the 2 ReferenceProperties per MyKind that may or may not have a value. I call .parent() in order to return the Company's name in the response along with the MyKind data, and if the ReferenceProperties have values I get their names as well. I did not realize that calling .parent().name, .refprop1.name, and .refprop2.name each executes a separate datastore_v3.Get against the datastore. I thought the parent and reference property data were returned with the original query for the MyKind object. Is there a way to have that happen, or will I need to create 3 other properties in MyKind to have efficient access to those name properties?
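For reference, here is a rough sketch of the batched prefetch I am considering, based on the ReferenceProperty prefetching pattern Nick Johnson has also written about; the helper name prefetch_refprops and the Company/parent handling are just illustrative:

def prefetch_refprops(entities, *props):
    # Resolve the given ReferenceProperties for every entity with one db.get().
    fields = [(entity, prop) for entity in entities for prop in props]
    # get_value_for_datastore() returns the raw Key without dereferencing it.
    ref_keys = [prop.get_value_for_datastore(entity) for entity, prop in fields]
    ref_entities = dict((e.key(), e)
                        for e in db.get(set(k for k in ref_keys if k))
                        if e is not None)
    for (entity, prop), ref_key in zip(fields, ref_keys):
        if ref_key is not None and ref_key in ref_entities:
            # Assigning the fetched instance caches it on the model, so a later
            # v.refprop1.name does not issue another datastore_v3.Get.
            prop.__set__(entity, ref_entities[ref_key])
    return entities

# e.g. prefetch_refprops(my_cache, models.MyKind.refprop1, models.MyKind.refprop2)

# Parents can be batched the same way: key().parent() needs no RPC, so a single
# db.get() fetches every Company, and the name is then read from this dict
# instead of calling v.parent() per entity.
parent_keys = set(v.key().parent() for v in my_cache if v.key().parent())
companies = dict((c.key(), c) for c in db.get(parent_keys) if c is not None)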
