查询优化(WHERE,GROUP BY,LEFT JOIN)
我正在使用InnoDB。
QUERY,EXPLAIN&INDEXES
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(
DISTINCT classifications2.name SEPARATOR ';'
) AS classifications_name,
GROUP_CONCAT(
DISTINCT images.id
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_id,
GROUP_CONCAT(
DISTINCT images.caption
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_caption,
GROUP_CONCAT(
DISTINCT images.thumbnail
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_thumbnail,
GROUP_CONCAT(
DISTINCT images.medium
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_medium,
GROUP_CONCAT(
DISTINCT images.large
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_large,
GROUP_CONCAT(
DISTINCT users.id
ORDER BY users.id SEPARATOR ';'
) AS authors_id,
GROUP_CONCAT(
DISTINCT users.display_name
ORDER BY users.id SEPARATOR ';'
) AS authors_display_name,
GROUP_CONCAT(
DISTINCT users.url
ORDER BY users.id SEPARATOR ';'
) AS authors_url
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
LEFT JOIN author_story
ON stories.id = author_story.story_id
LEFT JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
查询时间
平均运行需要0.035 seconds 。
如果我只删除GROUP BY ,则平均时间降至0.007 。
如果我只删除stories.status=1过滤器,则平均时间将下降到0.025 。 这似乎可以轻松优化。
如果我只删除LIKE过滤器和ORDER BY子句,则时间平均下降到0.006 。
更新1:2013-04-13
我的理解已经改进了多方面的答案。
我给author_story和images_story添加了索引,这似乎将查询提高到了0.025秒,但由于一些奇怪的原因, EXPLAIN计划看起来好多了。 此时,删除ORDER BY将查询删除为0.015秒,同时删除ORDER BY和GROUP BY查询性能提高到0.006 。 我是这两件事现在要关注的两件事? 如果需要,我可以将ORDER BY移动到应用程序逻辑中。
这里是修订后的EXPLAIN和INDEXES
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
INNER JOIN author_story
ON stories.id = author_story.story_id
INNER JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
更新2:2013-04-14
我注意到另一件事。 如果我不选择stories.content (LONGTEXT)和stories.content_html (LONGTEXT),查询将从0.015秒降至0.006秒。 现在我正在考虑如果我没有content和content_html可以做什么,或者用其他东西替换它们。
我已经更新了上述2013-04-13更新中的查询,索引并进行了解释,而不是在本文中重新发布,因为它们较小并且是增量式的。 该查询仍在使用filesort 。 我不能摆脱GROUP BY但已经摆脱了ORDER BY 。
更新3:2013-04-16
根据要求,我放弃了来自image_story和author_story的stories_id INDEXES,因为它们是多余的。 结果是explain的输出只是改变了,表明possible_keys发生了变化。 它仍然没有显示Using Index优化不幸的。
LONGTEXT更改为TEXT ,现在正在提取LEFT(stories.content, 500)而不是stories.content ,这会在查询执行时间中产生非常显着的差异。
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+ | 1 | SIMPLE | classifications | ref | story_id,name,name_position | name | 102 | const | 10 | Using index condition; Using where; Using temporary; Using filesort | | 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where | | 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where | | 1 | SIMPLE | author_story | ref | story_author | story_author | 4 | stories.id | 1 | Using where; Using index | | 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where | | 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index | | 1 | SIMPLE | image_story | ref | story_image | story_image | 4 | stories.id | 1 | Using index | | 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL | +----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+ innodb_buffer_pool_size 134217728 TABLE_NAME INDEX_LENGTH image_story 10010624 image_story 4556800 image_story 0 TABLE_NAME INDEX_NAMES SIZE dawn/image_story story_image 13921
我可以看到两个优化的机会:
将OUTER JOIN更改为INNER JOIN
您的查询目前正在扫描434792个故事,并且您应该能够更好地缩小这个故事,假设并非每个故事都有与“首页:顶部%”匹配的分类。 最好使用索引来查找要查找的分类,然后查找匹配的故事。
但是,您正在使用LEFT OUTER JOIN进行分类,这意味着所有故事都会被扫描,无论它们是否具有匹配的分类。 然后你通过在WHERE子句中加入分类条件来打败这一点,这有效地强制要求有一个与LIKE匹配的分类。 所以它不再是外连接,而是内连接。
如果您先将分类表放入内部联接中,那么优化程序将使用它来缩小对具有匹配分类的搜索范围的限制。
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
优化器应该能够找出何时重新排序表是有利的,因此您可能不必更改查询中的顺序。 但在这种情况下,您确实需要使用INNER JOIN 。
添加复合索引
您的交集表image_story和author_story没有复合索引。 将复合索引以多对多的关系添加到交集表通常是一个很大的优势,以便他们可以执行连接并获得“使用索引”优化。
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
重新评论和更新:
我不确定你是否正确创建了新的索引。 您的索引转储不显示列,并根据更新的EXPLAIN,新索引没有被使用,我期望发生。 使用新索引会在EXPLAIN的额外字段中产生“使用索引”,这应该有助于提高性能。
对于每个表, SHOW CREATE TABLE输出将是比索引转储(没有列名称)更完整的信息,如您所示。
创建索引后,您可能必须在每个表上运行ANALYZE TABLE一次。 另外,多次运行查询,以确保索引位于缓冲池中。 这张表是InnoDB还是MyISAM?
我还注意到,在您的EXPLAIN输出中, rows列显示的rows数少得多。 这是一个改进。
你真的需要ORDER BY吗? 如果你使用ORDER BY NULL你应该能够摆脱“使用filesort”并且可以提高性能。
重新更新:
您仍然没有从image_story和author_story表中获得“使用索引”优化。 我的一个建议是消除冗余索引:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
原因是任何可以从story_id上的单列索引中受益的查询也可以从(story_id,image_id)上的两列索引中受益。 消除冗余索引有助于优化器做出更好的决策(以及节省一些空间)。 这是pt-duplicate-key-checker之类的工具背后的理论。
我还会检查以确保您的缓冲池足够大以容纳您的索引。 您不希望索引在查询期间分页进出缓冲池。
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
检查image_story表的索引大小:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
并将其与当前有多少这些索引驻留在缓冲池中进行比较:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
当然,将上面的`test`更改为您的表所属的数据库名称。
那个information_schema表在MySQL 5.6中是新的。 我假设你使用的是MySQL 5.6,因为你的EXPLAIN显示了“使用索引条件”,这在MySQL 5.6中也是新的。
我根本不使用LONGTEXT,除非我真的需要描述非常长的字符串。 记住:
当你使用MYSQL时,你可以利用Straight_join
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in nonoptimal order
另外一个改进的范围是过滤表格stories的数据,因为您只需要状态为1的数据
因此,在表单子句中,而不是添加整个stories表时,只需添加所需的记录,因为查询计划显示classification表中有434792行和相同的记录
FROM
(SELECT
*
FROM
STORIES
WHERE
STORIES.status = 1) stories
LEFT JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
还有一个建议可以增加sort_buffer_size就像您按order by和按group by显示的一样,但要小心增加缓冲区大小,因为每个会话的缓冲区大小都会增加。
此外,如果可能的话,您可以在您的应用程序中订购您的记录(如果可能的话),因为您已经提到删除order by子句可以提高原始时间的1/6部分。
编辑
添加索引image_story.image_id为image_story表和author_story.story_id为author_story表,因为这些列用于连接
在您使用它时images.position, images.id也必须创建images.position, images.id上的索引。
编辑16/4
我认为你几乎已经优化你的查询,看到你更新...
还有一个你可以改进的地方就是使用适当的数据类型,比如BillKarwin提到的......你可以使用ENUM或TINYINT类型来处理状态等其他不具有任何增长范围的列,它可以帮助你优化查询性能以及您的桌子的存储性能....
希望这可以帮助....
计算
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';')
可能是最耗时的操作,因为classifications是一个很大的表格,并且由于所有联接而使用的行数相乘。
所以我会建议使用一个临时表来获取这些信息。 此外,为了避免两次计算LIKE条件(一次用于临时表,一次用于“真实”查询),我还会为此创建一个临时表。
您的原始查询是一个非常简化的版本(没有图像和用户表,以便读取更容易)是:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`positions`;
我会用下面的查询代替它,使用临时表_tmp_filtered_classifications (分类与名称像家一样的ID:顶级%')和_tmp_classifications_of_story (每个故事ID‘中包含的’ _tmp_filtered_classifications ,所有的分类名称):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`positions`;
请注意,为了检查两个查询是否提供相同的结果(使用diff),我在查询中添加了一些“order by”子句。 我还将count(comments.id)更改为count(DISTINCT comments.id)否则查询计算出的注释数量错误(同样,由于连接数增加了行数)。
上一篇: Query Optimization (WHERE, GROUP BY, LEFT JOINs)
下一篇: Optimize SQL Query
