How can I cut down the number of queries?

This code is currently executing about 50 SQL queries:

c = Category.objects.all()

categories_w_rand_books = []

for category in c:
    r = Book.objects.filter(author__category=category).order_by('?')[:5]

    categories_w_rand_books.append((category, r))

I need to cut down the number of used queries to the minimum to speed up things and do not cause server load.

Basically, I have three models: Category, Author, Book. The Author belong to the Category (not books) and I need to get a list of all categories with 5 random books under each one.


If you prefer single query and are using MySQL , check the excellent link provided by @Crazyshezy in his comment.
For PostgreSQL backends, a possible query is (assuming there are non-nullable FK relationships from Book to Author and from Author to Category ):

SELECT * FROM (
    SELECT book_table.*, row_number() OVER (PARTITION BY category_id ORDER BY RANDOM()) AS rn 
    FROM book_table INNER JOIN author_table ON book_table.author_id = author_table.id
) AS sq 
WHERE rn <= 5 

You could then wrap it inside a RawQuerySet to get Book instances

from collections import defaultdict
qs = Book.objects.raw("""The above sql suited for your tables...""")
collection = defaultdict(list)
for obj in qs:
    collection[obj.category_id].append(obj)

categories_w_rand_books = []
for category in c:
    categories_w_rand_books.append((category, collection[category.id]))

You may not want to run this query for each request directly w/o some caching.

Furthermore, your code generates at most 50*5=250 Book s, randomly, I just wonder why because it seems too many for a single page. Are items displayed as tabs or something else? Perhaps you could reduce the counts of SQLs by doing Ajax, or simplify the requirement?

Update

To use book.author w/o triggering more than another query, try prefetch_related_objects

from django.db.models.query import prefetch_related_objects
qs = list(qs) # have to evaluate at first
prefetch_related_objects(qs, ['author'])
# now instances inside qs already contain cached author instances, and
qs[0].author # will not trigger an extra query

The above code prefetches authors in batch and fills them into the qs . This just adds another query.


I'm not sure if this will help you because I don't know the details and context of your problem, but using order_by('?') is very inefficient, specially with some DB back-ends.

For displaying entities with a bit of randomness I use this approach, using a custom filter:

@register.filter
def random_iterator(list, k):
    import random
    class MyIterator:
        def __init__(self, obj, order):
            self.obj=obj
            self.cnt=0
            self.order = order

        def __iter__(self):
            return self

        def next(self):
            try:
                result=self.obj.__getitem__(self.order[self.cnt])
                self.cnt+=1
                return result
            except IndexError:
                raise StopIteration

    if list is None:
        list = []
    n = len(list)
    k = min(n, k)
    return MyIterator(list, random.sample(range(n), k))

The code in my Django view is something like this:

RAND_BOUND = 50    
categories = Category.objects.filter(......)[RAND_BOUND]

And, I use it in my template in this way:

{% for cat in categories|random_iterator:5 %}
 <li>{{ cat }}</li>             
{% endfor %}

This code will pick 5 random categories of a (reduced) set of RAND_BOUND . This is not THE perfect solution, but hope it helps.

链接地址: http://www.djcxy.com/p/11668.html

上一篇: 团队成员向git添加了数百个megs

下一篇: 我怎样才能减少查询次数?