How can I improve this query depending on the explain results

I have The Following query:

SELECT DISTINCT f1.match_static_id,
                f2.comments_no,
                f2.maxtimestamp,
                users.username,
                users.id,
                matches_of_comments.localteam_name,
                matches_of_comments.visitorteam_name,
                matches_of_comments.localteam_goals,       
                matches_of_comments.visitorteam_goals,
                matches_of_comments.match_status,
                new_iddaa.iddaa_code
FROM comments AS f1
INNER JOIN (
             SELECT match_static_id,
                    MAX( TIMESTAMP ) maxtimestamp,
                    COUNT( match_static_id ) AS comments_no
             FROM comments
             GROUP BY match_static_id
          ) AS f2 ON f1.match_static_id = f2.match_static_id 
                  AND f1.timestamp = f2.maxtimestamp
INNER JOIN users ON users.id = f1.user_id
INNER JOIN matches_of_comments ON matches_of_comments.match_id = f2.match_static_id
LEFT JOIN new_iddaa ON new_iddaa.match_id = matches_of_comments.match_id
WHERE matches_of_comments.flag =1
ORDER BY f2.maxtimestamp DESC

This is the EXPLAIN plan for that query :

+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+
| id | select_type |        table        |  type  |           possible_keys           |    key    | key_len |                   ref                    | rows  |                     extra                      |
+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+
|  1 | PRIMARY     | <derived2>          | ALL    | NULL                              | NULL      | NULL    | NULL                                     |   542 | Using temporary; Using filesort                |
|  1 | PRIMARY     | f1                  | ref    | timestamp,match_static_id,user_id | timestamp | 4       | f2.maxtimestamp                          |     1 | Using where                                    |
|  1 | PRIMARY     | users               | eq_ref | PRIMARY                           | PRIMARY   | 4       | skormix_db1.f1.user_id                   |     1 |                                                |
|  1 | PRIMARY     | matches_of_comments | ALL    | match_id                          | NULL      | NULL    | NULL                                     | 20873 | Range checked for each record (index map: 0x8) |
|  1 | PRIMARY     | new_iddaa           | ref    | match_id                          | match_id  | 4       | skormix_db1.matches_of_comments.match_id |     1 |                                                |
|  2 | DERIVED     | comments            | ALL    | NULL                              | NULL      | NULL    | NULL                                     |   933 | Using temporary; Using filesort                |
+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+

I use this query to get a match information if this match has at least one comment.
I get the names of the teams , the code (iddaa code), the number of comments, the timstamp of the last commrnt, the author of the last comment.
I have a big database and it is expected to be larger in the next few monthes and I am very new with the MySQL queries and I want to be sure that I am using the optimize queries from the beginning so I want to know how to read this explain information to make the query better and faster.

I see that there are a lot of places in the table that does not use the indexes eventhough i built them.
I also see derived in the table column and I do not know how to make this query more fast and how to get rid of the filesort because I can not make indexes for the derived queries??

I write down the structure of the using tables in the query with the indexes (keys) and I hope to get some hints or simple answers for my questions , thanks in advance .

Comments (f1) table structure is :

CREATE TABLE `comments` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `comments` text COLLATE utf8_unicode_ci NOT NULL,
 `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `date` date NOT NULL,
 `time` time NOT NULL,
 `match_static_id` int(25) NOT NULL,
 `ip` varchar(255) CHARACTER SET latin1 NOT NULL,
 `comments_yes_or_no` int(25) NOT NULL,
 `user_id` int(25) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `timestamp` (`timestamp`),
 KEY `match_static_id` (`match_static_id`),
 KEY `user_id` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=935 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

users table structure is:

CREATE TABLE `users` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `gender` int(25) NOT NULL,
 `first_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `last_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `avatar` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `alert` int(25) NOT NULL,
 `daily_tahmin` int(25) NOT NULL,
 `monthly_tahmin` int(25) NOT NULL,
 `admin` int(25) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=995 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

matches_of_comments_ structure is :

CREATE TABLE `matches_of_comments` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `en_tournament_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `tournament_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `country_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_status` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_time` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_date` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `static_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `fix_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_goals` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_goals` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `ht_score` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `flag` int(25) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `match_status` (`match_status`),
 KEY `match_date` (`match_date`),
 KEY `match_id` (`match_id`),
 KEY `localteam_id` (`localteam_id`),
 KEY `visitorteam_id` (`visitorteam_id`),
 KEY `flag` (`flag`)
) ENGINE=MyISAM AUTO_INCREMENT=237790 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

new_iddaa table structure is :

CREATE TABLE `new_iddaa` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `match_id` int(25) NOT NULL,
 `iddaa_code` int(25) NOT NULL,
 `tv_channel` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
 `skormix_tahmin` varchar(255) CHARACTER SET utf8 NOT NULL,
 PRIMARY KEY (`id`),
 KEY `match_id` (`match_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8191 DEFAULT CHARSET=latin1

Starting with the more pressing issues, before discussing options.


The first immediate problem is:

SELECT DISTINCT …

A select distinct is slow. Very, very slow: it basically compares each field of each row returned by your set. There naturally is room for optimization when there's an ID in there that is guaranteed to be unique per row, but your own query doesn't look like it offers any such possibility: at best a tuple from matches_of_comments and new_iddaa .

To work around this, break the query in two or more parts, and only fetch what is actually needed for what you're doing. This seems to be ordering matches_of_comments by their latest comment date, and then fetching extra cosmetic data from users and new_iddaa .

The next one is the biggest problem imho:

INNER JOIN (
         SELECT match_static_id,
                MAX( TIMESTAMP ) maxtimestamp,
                COUNT( match_static_id ) AS comments_no
         FROM comments
         GROUP BY match_static_id
      ) AS f2 ON f1.match_static_id = f2.match_static_id 
              AND f1.timestamp = f2.maxtimestamp

You're joining an aggregate with a table on a (match_static_id, timestamp) tuple that has no index on it, and fetching a huge set at that. You've a guaranteed merge join down the road — not what you want.

The last eye popping issue is:

ORDER BY f2.maxtimestamp DESC

First off, you've no limit there. This means you're going to build, sort and return an enormous set. Surely you're paginating this data, so do so in the query by adding a limit clause.

Once you do add a limit, you need to consider what is adding extra rows, and how they should be ordered. Based on your schema, I imagine new_iddaa does. Are you paginating things in such a way that the latter information needs to be part of that query and the number of rows it returns? I imagine not, since you're not evidently interested in how these rows are sorted.

After scanning your schema, this additional one pops out:

`match_id` varchar(255)

The rows that reference this are integers, right? So it should be an integer as well, to avoid the overhead of casting varchars to ints or vice-versa, and to allow the use of indexes in either case.

While not relevant to this particular query, the following two fields also need attention and proper casting:

`tournament_id` varchar(255)
`match_time` varchar(255)
`match_date` varchar(255)
`static_id` varchar(255)
`fix_id` varchar(255)
`localteam_id` varchar(255)
`visitorteam_id` varchar(255)

Onto improving the query…

As I read it, you're ordering matches_of_comments by latest comment. You also want the number of comments, so we start by doing that. Assuming you're paginating the first 10 of many, the query goes like this:

SELECT match_static_id,
       MAX( TIMESTAMP ) maxtimestamp,
       COUNT( match_static_id ) AS comments_no
FROM comments
GROUP BY match_static_id
ORDER BY maxtimestamp DESC
LIMIT 10 OFFSET 0

That's all.

It gives you 10 IDs — more if you increase the limit. Loop through them in your app and build an in (…) clause that will allow you to fetch each individual bit of data from the other tables as needed; you can do this with one or several queries, it matters little. The point is to avoid joining on that aggregate, so that indexes are available for the follow-up queries.


You could improve things more dramatically yet, by removing the above-mentioned query entirely.

To do so, add three fields to matches_of_comments , namely last_comment_timestamp , last_comment_user_id , and num_comments . Maintain them using triggers, and add an index on (flag, last_comment_timestamp) . This will allow you to run the following efficient query instead:

SELECT matches_of_comments.static_id,
       matches_of_comments.num_comments,
       matches_of_comments.last_comment_timestamp,
       matches_of_comments.last_comment_user_id,
       matches_of_comments.localteam_name,
       matches_of_comments.visitorteam_name,
       matches_of_comments.localteam_goals,       
       matches_of_comments.visitorteam_goals,
       matches_of_comments.match_status
FROM matches_of_comments
WHERE matches_of_comments.flag = 1
ORDER BY matches_of_comments.last_comment_timestamp DESC
LIMIT 10 OFFSET 0

You then need to only select the needed data from users and new_iddaa — using separate queries with an in (…) clause as already discussed.


Main problem of this query is number of joins. I would suggest:

  • Take nested query f2 out and insert its output into table or temporary table, better with index. (With index it will change full table scan to index, speeding things up. Chances are high that such table will be reusable).

  • lternatively to 1 or at the same time, swap f1 and f2 . You need to eliminate rows as early as possible. So first get the list of matches and timestapms you need, and join only necessary date. This query takes huge comments table and filters out unnecessary rows. It is easier to take 10% than through away 90% for the same result.

  • Remove Distinct as it does nothing here: all joins are on equalities. If there any duplicates get data from all tables and find difference. Then add necessary JOIN condition to pick rows you need.

  • Consider these PRIMARY KEY s: comments: PRIMARY KEY (match_static_id) matches_of_comments: PRIMARY KEY (match_id) new_iddaa: PRIMARY KEY (match_id)

    Reason: Primary Key should be done on an often used column with meaning. Having it on technical AUTOINCREMENT ing column does not add value to the table.

  • You may consider adding flag as a first column to PRIMARY KEY . It will add an overhead when flag is changed, but will speed up this query as all flag=1 records will be in a single range and in a sequence on hdd. Hope this helps.


  • What if you turned this piece:

         SELECT match_static_id,
                MAX( TIMESTAMP ) maxtimestamp,
                COUNT( match_static_id ) AS comments_no
         FROM comments
         GROUP BY match_static_id
    

    into a table and you populated this with a trigger? When a new record is saved in comments, it triggers an update in a 'comments_counter' table or something like that? This should give you some performance, as you wouldn't need the Max, Counts, group by operations.

    Also, matches_of_comments.match_id = f2.match_static_id have different datatypes, the first is a varchar(255) and the second is a int(25) - having both of them be int(25) should help the overall performance.

    And last, I'd make user_id in comments be a foreign key to user.id too.

    链接地址: http://www.djcxy.com/p/75366.html

    上一篇: 如何在django的modelviewset中取消设置csrf

    下一篇: 如何根据解释结果改进此查询