Hierarchy Optimization on Google App Engine Datastore

I have hierarchical data stored in the datastore using a model which looks like this:

class ToolCategories(db.Model):  
   name = db.StringProperty()  
   parentKey = db.SelfReferenceProperty(collection_name="parent_category")  
   ...  
   ...  

I want to print all the category names while preserving the hierarchy, in a form like this:

--Information Gathering  
----OS Fingerprinting  
----DNS  
------dnstool  
----Port Scanning   
------windows  
--------nmap  
----DNS3  
----wireless sniffers  
------Windows  
--------Kismet  

To do this, I have used simple recursion over the back-reference collection:

class GetAllCategories (webapp.RequestHandler) :

        def RecurseList(self, category, breaks) :
                # Emit this category, then recurse into its children,
                # which are reached through the back-reference collection.
                output = breaks + category.name + "<br/>"
                for cat in category.parent_category:
                        output = output + self.RecurseList(cat, breaks + "--")

                return output

        def get (self) :
                output = ""
                # Root categories are the ones with no parent.
                allCategories = ToolCategories.all().filter('parentKey =', None)
                for category in allCategories :
                        output = output + self.RecurseList(category, "--")

                self.response.out.write(output)

As I am very new to App Engine programming (hardly 3 days since I started writing code), I am not sure if this is the most optimized way to do the job from the Datastore access standpoint.

Is this the best way? If not, what is?


You have a very reasonable approach! My main caveat has little to do with GAE and a lot to do with Python: don't build a string from pieces with + or +=. Instead, collect the pieces in a list (with append, extend, or list comprehensions) and, when you're all done, join them into the final string with ''.join(thelist) or the like. Even though recent Python versions try hard to optimize the intrinsically O(N squared) performance of + or += loops, in the end you're always better off building up a list of strings along the way and joining them at the very end!
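
For illustration, here is a minimal sketch of the list-and-join pattern applied to the recursion from the question. The Category class is a hypothetical in-memory stand-in for a ToolCategories entity (it is not part of the datastore API), and the helper names collect_lines and render are made up for this example:

```python
class Category(object):
    """Hypothetical in-memory stand-in for a ToolCategories entity."""

    def __init__(self, name, children=()):
        self.name = name
        # Plays the role of the parent_category back-reference collection.
        self.parent_category = list(children)


def collect_lines(category, breaks, pieces):
    """Append one line per category to pieces, depth first."""
    pieces.append(breaks + category.name + "<br/>")
    for child in category.parent_category:
        collect_lines(child, breaks + "--", pieces)


def render(roots):
    """Build the whole output with a single join at the end."""
    pieces = []
    for root in roots:
        collect_lines(root, "--", pieces)
    return "".join(pieces)
```

In the handler, the same idea would mean passing one shared list down through RecurseList and calling ''.join on it once, just before self.response.out.write.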


The main disadvantage of your approach is that, because you're using the "adjacency list" way of representing trees, you have to do one datastore query for each branch of the tree: every time the recursion iterates over parent_category, it runs another query. Datastore queries are fairly expensive (around 160ms each), so constructing the tree, particularly if it's large, could be rather expensive.

There's another approach, which is essentially the one taken by the datastore for representing entity groups: Instead of just storing the parent key, store the entire list of ancestors using a ListProperty:

class ToolCategories(db.Model):
  name = db.StringProperty()
  parents = db.ListProperty(db.Key)

Then, to construct the tree, you can retrieve the entire thing in one single query:

q = ToolCategories.all().filter('parents =', root_key)
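
Note that this query returns the descendants of root_key as a flat list, so the hierarchy still has to be rebuilt in memory. Below is a rough sketch of one way to do that. It assumes each parents list is ordered root first, so that the last element is the direct parent. The Node tuple is a hypothetical stand-in for a fetched entity, and render_tree is a made-up helper name:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a fetched entity: its key, its name, and its
# ancestor keys ordered root first (assumption), so parents[-1] is the
# direct parent.
Node = namedtuple("Node", ["key", "name", "parents"])


def render_tree(nodes, root_key, indent="--"):
    """Render the descendants of root_key from a flat list of nodes."""
    # Group every node under its direct parent; no further queries needed.
    children = defaultdict(list)
    for node in nodes:
        if node.parents:
            children[node.parents[-1]].append(node)

    pieces = []

    def walk(parent_key, depth):
        for node in children.get(parent_key, []):
            pieces.append(indent * depth + node.name)
            walk(node.key, depth + 1)

    walk(root_key, 1)
    return "\n".join(pieces)
```

With real entities, the grouping key would be something like entity.parents[-1] and the node key entity.key(). When a new category is saved, its parents list would be the parent's list plus the parent's own key.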