Avoiding multiple references to the same object in Django ORM

We have an app with highly interrelated data, i.e. there are many cases where two objects refer to the same object via a relationship. As far as I can tell, Django makes no attempt to return a reference to an already-fetched object when you fetch it via a different, previously unevaluated relationship.

For example:

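from django.db import models

class Customer(models.Model):
    firstName = models.CharField(max_length=64)
    lastName = models.CharField(max_length=64)

class Order(models.Model):
    # on_delete is spelled out because newer Django versions require it
    customer = models.ForeignKey(Customer, related_name="orders", on_delete=models.CASCADE)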

Then assume we have a single customer who has two orders in the DB:

order1, order2 = Order.objects.all()
print(order1.customer)  # (1) One DB fetch here
print(order2.customer)  # (2) Another DB fetch here
print(order1.customer == order2.customer)  # (3) True, because PKs match
print(order1.customer is order2.customer)  # (4) False, not the same object

With highly interrelated data, accessing the relationships of your objects leads to more and more repeated DB queries for the same rows, and this becomes a problem.

We also program for iOS and one of the nice things about CoreData is that it maintains context, so that in a given context there is only ever one instance of a given model. In the example given above, CoreData would not have done the second fetch at (2), because it would have resolved the relationship using the customer already in memory.

Even if line (2) were replaced with a spurious example designed to force another DB fetch (like print(Order.objects.exclude(pk=order1.pk).get(customer=order1.customer))), CoreData would realize that the result of that second fetch resolved to a model already in memory and return the existing model instead of a new one (i.e. (4) would print True in CoreData because they would actually be the same object).

To hedge against this behaviour of Django, we are kinda writing all this horrible stuff to try to cache models in memory by their (type, pk) and then check relationships with the _id suffix to try to pull them from the cache before blindly hitting the DB with another fetch. This is cutting down on DB throughput but feels really brittle and likely to cause problems if normal relationship lookups via properties accidentally happen in some contrib framework or middleware that we don't control.
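To make the approach concrete, here is a minimal sketch of the kind of (type, pk) cache described above. It is plain Python rather than our real code: IdentityMap, get_or_load and load_customer are hypothetical names, and the loader stands in for whatever ORM call would normally fetch the row.

```python
class IdentityMap:
    """Hypothetical in-memory cache keyed by (model class, primary key)."""

    def __init__(self):
        self._instances = {}

    def get_or_load(self, model, pk, loader):
        """Return the cached instance for (model, pk); call loader(pk) only on a miss."""
        key = (model, pk)
        if key not in self._instances:
            self._instances[key] = loader(pk)
        return self._instances[key]


class Customer:
    """Stand-in for a model row; a real project would use the ORM class."""

    def __init__(self, pk):
        self.pk = pk


fetches = []  # records each simulated DB hit


def load_customer(pk):
    # Stand-in for something like Customer.objects.get(pk=pk).
    fetches.append(pk)
    return Customer(pk)


identity_map = IdentityMap()

# Two orders whose raw foreign key values (the "_id" attributes) point at customer 7.
order1_customer_id = 7
order2_customer_id = 7

first = identity_map.get_or_load(Customer, order1_customer_id, load_customer)
second = identity_map.get_or_load(Customer, order2_customer_id, load_customer)
```

Here the second lookup is served from the cache, so both names refer to one object and the loader runs once. The brittleness is visible too: any code path that bypasses get_or_load still creates its own instance.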

Are there any best practices or frameworks out there for Django to help avoid this problem? Has anyone attempted to install some kind of thread-local context into Django's ORM to avoid repeat lookups and having multiple in-memory instances mapping to the same DB model?
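By "thread-local context" I mean something along these lines. This is only a sketch in plain Python with hypothetical names (identity_context, current_cache); nothing here hooks into Django's ORM, it only shows how a per-thread cache could be scoped.

```python
import threading
from contextlib import contextmanager

_state = threading.local()


@contextmanager
def identity_context():
    """Give the current thread a fresh (type, pk) -> instance cache for the block."""
    previous = getattr(_state, "cache", None)
    _state.cache = {}
    try:
        yield _state.cache
    finally:
        # Restore whatever was active before, so contexts can nest.
        _state.cache = previous


def current_cache():
    """Return the active cache for this thread, or None outside any context."""
    return getattr(_state, "cache", None)
```

A request handler could wrap its work in "with identity_context():" so that lookups within that request share instances, while other threads never see them.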

I know that query-caching tools like JohnnyCache are out there (and help cut down on DB throughput), but even with those measures in place there is still the issue of multiple instances mapping to the same underlying model.


David Cramer's django-id-mapper is one attempt at doing this.


There's a relevant DB optimization page in the Django documentation. Basically, callables are not cached but attributes are (subsequent accesses to order1.customer don't hit the database), though only in the context of their owner object (so the cache is not shared among different orders).
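A toy model of that per-owner caching may help. It is a simplified stand-in, not Django's actual implementation; Order, fetch_customer and db_hits are illustrative names only.

```python
db_hits = []  # records each simulated query


def fetch_customer(pk):
    # Stand-in for a database query; returns a fresh object every time.
    db_hits.append(pk)
    return {"pk": pk}


class Order:
    """Toy order whose related customer is cached on the instance only."""

    def __init__(self, customer_id):
        self.customer_id = customer_id
        self._customer_cache = None

    @property
    def customer(self):
        # First access queries; later accesses on the same order reuse the result.
        if self._customer_cache is None:
            self._customer_cache = fetch_customer(self.customer_id)
        return self._customer_cache


order1, order2 = Order(7), Order(7)
a = order1.customer  # simulated query
b = order1.customer  # served from order1's own cache
c = order2.customer  # second simulated query: order2 has its own cache
```

The two orders end up with equal but distinct customer objects, which mirrors lines (3) and (4) of the question.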

using cache

As you say, one way to solve your problem is to use a database cache. We use Bitbucket's Johnny Cache, which is almost completely transparent; another good transparent one is Mozilla's Cache Machine. You can also choose a less transparent caching system that might actually fit the bill better; please see djangopackages/caching.

Adding a cache can indeed be very beneficial if different requests need to re-use the same Customer; but please read this, which applies to most transparent cache systems, to think through whether your write/read pattern suits such a caching system.

optimizing the requests

Another approach for your precise example is to use select_related.

order1, order2 = Order.objects.all().select_related('customer')

This way the Customer object will be loaded straight away in the same SQL query, at little cost (unless it's a very big record) and with no need to experiment with other packages.
