Query optimization (WHERE, GROUP BY, LEFT JOIN)
I am using InnoDB.
QUERY, EXPLAIN & INDEXES
SELECT stories.*, count(comments.id) AS comments,
    GROUP_CONCAT( DISTINCT classifications2.name SEPARATOR ';' ) AS classifications_name,
    GROUP_CONCAT( DISTINCT images.id ORDER BY images.position, images.id SEPARATOR ';' ) AS images_id,
    GROUP_CONCAT( DISTINCT images.caption ORDER BY images.position, images.id SEPARATOR ';' ) AS images_caption,
    GROUP_CONCAT( DISTINCT images.thumbnail ORDER BY images.position, images.id SEPARATOR ';' ) AS images_thumbnail,
    GROUP_CONCAT( DISTINCT images.medium ORDER BY images.position, images.id SEPARATOR ';' ) AS images_medium,
    GROUP_CONCAT( DISTINCT images.large ORDER BY images.position, images.id SEPARATOR ';' ) AS images_large,
    GROUP_CONCAT( DISTINCT users.id ORDER BY users.id SEPARATOR ';' ) AS authors_id,
    GROUP_CONCAT( DISTINCT users.display_name ORDER BY users.id SEPARATOR ';' ) AS authors_display_name,
    GROUP_CONCAT( DISTINCT users.url ORDER BY users.id SEPARATOR ';' ) AS authors_url
FROM stories
LEFT JOIN classifications ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2 ON stories.id = classifications2.story_id
LEFT JOIN comments ON stories.id = comments.story_id
LEFT JOIN image_story ON stories.id = image_story.story_id
LEFT JOIN images ON images.id = image_story.`image_id`
LEFT JOIN author_story ON stories.id = author_story.story_id
LEFT JOIN users ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
  AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
QUERY TIMES
The query takes 0.035 seconds to run on average.
If I remove just the GROUP BY, the average time drops to 0.007.
If I remove just the stories.status = 1 filter, the average time drops to 0.025. That one seems like it should be easy to optimize.
If I remove just the LIKE filter and the ORDER BY clause, the average time drops to 0.006.
UPDATE 1: 2013-04-13

The answers below have improved my understanding in several ways. I added indexes to author_story and image_story, which only seems to have brought the query down to 0.025 seconds, even though, for some strange reason, the EXPLAIN plan looks a lot better. At this point, removing the ORDER BY drops the query to 0.015 seconds, and removing both the ORDER BY and the GROUP BY brings it down to 0.006. Are these the two things I should be focusing on now? I can move the ORDER BY into application logic if needed.

Here are the revised EXPLAIN and INDEXES:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |

| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |

SELECT stories.*, count(comments.id) AS comments,
    GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
    GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
    GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
    GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
    GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
    GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
    GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
    GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
    GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM classifications
INNER JOIN stories ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2 ON stories.id = classifications2.story_id
LEFT JOIN comments ON stories.id = comments.story_id
LEFT JOIN image_story ON stories.id = image_story.story_id
LEFT JOIN images ON images.id = image_story.`image_id`
INNER JOIN author_story ON stories.id = author_story.story_id
INNER JOIN users ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
  AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
UPDATE 2: 2013-04-14

I noticed another thing. If I don't select stories.content (LONGTEXT) and stories.content_html (LONGTEXT), the query drops from 0.015 seconds to 0.006 seconds. I'm now considering whether I can do without content and content_html, or replace them with something else.
I have updated the query, indexes and EXPLAIN in the 2013-04-13 update above rather than reposting them here, since the changes are small and incremental. The query is still using filesort. I can't get rid of the GROUP BY, but I have gotten rid of the ORDER BY.
UPDATE 3: 2013-04-16

As requested, I dropped the story_id indexes from image_story and author_story since they were redundant. The result was that the EXPLAIN output merely changed to show different possible_keys. Unfortunately, it still does not show the Using index optimization.
I also changed the LONGTEXT columns to TEXT and am now pulling LEFT(stories.content, 500) instead of stories.content, which makes a very significant difference in query execution time.
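A rough sketch of those two changes (the revised EXPLAIN follows below; the NULL-ability of the columns is an assumption about the schema):

ALTER TABLE stories
    MODIFY content      TEXT NULL,
    MODIFY content_html TEXT NULL;

-- pull a 500-character excerpt instead of the full column
SELECT stories.id, LEFT(stories.content, 500) AS content_excerpt
FROM stories
WHERE stories.status = 1;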
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | classifications | ref | story_id,name,name_position | name | 102 | const | 10 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | story_author | story_author | 4 | stories.id | 1 | Using where; Using index |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_image | story_image | 4 | stories.id | 1 | Using index |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |

innodb_buffer_pool_size: 134217728

| TABLE_NAME | INDEX_LENGTH |
| image_story | 10010624 |
| image_story | 4556800 |
| image_story | 0 |

| TABLE_NAME | INDEX_NAMES | SIZE |
| dawn/image_story | story_image | 13921 |
I can see two opportunities for optimization:
Change the OUTER JOIN to an INNER JOIN
Your query is currently scanning 434792 stories, and you should be able to narrow that down, assuming not every story has a classification matching 'Home:Top%'. It would be better to use an index to find the classifications you are looking for, and then look up only the matching stories.
But you are using a LEFT OUTER JOIN for classifications, which means all stories are scanned whether they have a matching classification or not. You then defeat that by putting a condition on classifications in the WHERE clause, which effectively makes it mandatory for a story to have a classification matching the LIKE pattern. So it is no longer an outer join; it is an inner join.
If you put the classifications table first and make it an inner join, the optimizer will use it to narrow the search down to the stories that have matching classifications.
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
The optimizer should be able to figure out when reordering the tables is advantageous, so you may not have to change the order in your query. But you do need to use an INNER JOIN in this case.
Add compound indexes
Your intersection tables image_story and author_story have no compound indexes. It is often a big advantage to add compound indexes to the intersection tables of a many-to-many relationship, so that they can perform the join and get the "Using index" optimization.
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
Re your comments and update:
I am not sure you created the new indexes correctly. Your index dump does not show the columns, and according to the updated EXPLAIN the new indexes are not being used, even though I would expect them to be. Using the new indexes would produce "Using index" in the Extra field of the EXPLAIN, which should help performance.
The output of SHOW CREATE TABLE for each table would be more complete information than an index dump without column names like the one you have shown.
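For example, for the two intersection tables discussed here:

SHOW CREATE TABLE image_story;
SHOW CREATE TABLE author_story;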
You may have to run ANALYZE TABLE once on each table after creating the indexes. Also, run the query more than once, to make sure the indexes are in the buffer pool. Are these tables InnoDB or MyISAM?
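Something along these lines, assuming the intersection tables are the ones that just received new indexes:

-- refresh index statistics so the optimizer sees the new indexes
ANALYZE TABLE image_story, author_story;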
I also notice that the rows column in your EXPLAIN output shows far fewer rows being touched. That is an improvement.
Do you really need the ORDER BY? If you use ORDER BY NULL, you should be able to get rid of the "Using filesort", and that may improve performance.
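A minimal sketch of the idea on a cut-down version of the query:

-- GROUP BY alone implies a sort on the grouping column here (MySQL 5.6);
-- ORDER BY NULL suppresses that sort and the filesort that goes with it
SELECT stories.id, count(comments.id) AS comments
FROM stories
LEFT JOIN comments ON stories.id = comments.story_id
GROUP BY stories.id
ORDER BY NULL;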
Re your update:
You still are not getting the "Using index" optimization from the image_story and author_story tables. One suggestion I have is to eliminate the redundant indexes:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
The reason is that any query that can benefit from the single-column index on story_id can also benefit from the two-column index on (story_id, image_id). Eliminating the redundant index helps the optimizer make a better decision (and saves some space as well). This is the theory behind tools like pt-duplicate-key-checker.
I would also check to make sure your buffer pool is large enough to hold your indexes. You do not want indexes to be paging in and out of the buffer pool during a query.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
Check the size of the indexes for your image_story table:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
And compare that to how much of those indexes currently resides in the buffer pool:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
Of course, change `test` above to the name of the database your table belongs to.
That information_schema table is new in MySQL 5.6. I am assuming you are using MySQL 5.6, because your EXPLAIN shows "Using index condition", which is also new in MySQL 5.6.
I would not use LONGTEXT at all, unless I really needed to store very long strings. Keep in mind:
Since you are using MySQL, you can take advantage of STRAIGHT_JOIN:
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in nonoptimal order
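A sketch of how you might apply it here, forcing the join to start from the filtered classifications (identifiers are taken from the query above; the reduced column list is just for illustration):

-- STRAIGHT_JOIN makes MySQL join the tables in the order they are listed,
-- so the filtered classifications drive the lookup into stories
SELECT STRAIGHT_JOIN stories.id, classifications.name
FROM classifications
INNER JOIN stories ON stories.id = classifications.story_id
WHERE classifications.name LIKE 'Home:Top%'
  AND stories.status = 1;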
Another area for improvement is to pre-filter the stories table, since you only need rows with status = 1. So in the FROM clause, instead of joining the whole stories table, pull in only the required records (the query plan shows 434792 rows being scanned there), and do the same for the classifications records:
FROM
(SELECT
*
FROM
STORIES
WHERE
STORIES.status = 1) stories
LEFT JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
One more suggestion is to increase sort_buffer_size, since you are using both ORDER BY and GROUP BY, but be careful when raising it because the buffer is allocated per session.
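For example, raising it for the current session only (the value is purely illustrative):

-- applies only to this session, so other connections are unaffected
SET SESSION sort_buffer_size = 4194304;  -- 4 MB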
Also, if possible, sort the records in your application instead, since you mentioned that removing the ORDER BY clause brings the time down to about 1/6 of the original.
EDIT

Add an index on image_story.image_id in the image_story table and on author_story.story_id in the author_story table, since these columns are used in the joins.

Since you are sorting by images.position, images.id, you should also create an index on (images.position, images.id), as shown in the sketch below.
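A sketch of those index additions; the idx_* names are placeholders, and the statements assume these indexes do not already exist under other names:

ALTER TABLE image_story  ADD INDEX idx_image_id (image_id);
ALTER TABLE author_story ADD INDEX idx_story_id (story_id);
-- covers the ORDER BY used inside the GROUP_CONCAT() calls
ALTER TABLE images       ADD INDEX idx_position_id (position, id);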
EDIT 16/4

I think you have pretty much optimized your query, judging by your update...
One more area where you can improve is by using appropriate data types, as Bill Karwin mentioned... You can use an ENUM or TINYINT type for status and other columns that have no scope for growth; it can help you optimize query performance as well as the storage footprint of your table....
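For example, if status only ever holds a small set of numeric codes, something along these lines could work (the NOT NULL / DEFAULT 0 details are assumptions about the schema):

ALTER TABLE stories
    MODIFY status TINYINT UNSIGNED NOT NULL DEFAULT 0;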
Hope this helps....
Computing GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') is probably the most time-consuming operation, because classifications is a big table and the number of rows to work on is multiplied by all the joins.
So I would suggest using a temporary table for that information. Also, to avoid computing the LIKE condition twice (once for the temporary table and once for the "real" query), I would create a temporary table for that as well.
Your original query, in a very simplified version (without the images and users tables, so that it is easier to read), is:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
I would replace it with the query below, which uses two temporary tables: _tmp_filtered_classifications (the IDs of the classifications whose name is LIKE 'Home:Top%') and _tmp_classifications_of_story (for each story ID "included in" _tmp_filtered_classifications, all of its classification names):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
Note that, in order to check that both queries return the same results (using diff), I added some ORDER BY clauses to the queries. I also changed count(comments.id) to count(DISTINCT comments.id); otherwise the query computes a wrong number of comments (again, because the joins multiply the number of rows).
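A cut-down illustration of that fan-out (only two joins kept): once image_story is joined, each comment row is repeated once per image of the story, so the plain count is inflated while the DISTINCT count stays correct.

SELECT stories.id,
       count(comments.id)          AS comments_inflated,   -- multiplied by the image join
       count(DISTINCT comments.id) AS comments              -- correct count
FROM stories
LEFT JOIN comments    ON stories.id = comments.story_id
LEFT JOIN image_story ON stories.id = image_story.story_id
GROUP BY stories.id;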