My sub query is adding 20 seconds to the execution time. How can I speed it up?
I have a table of sent SMS text messages which must join to a delivery receipt table to get the latest status of a message.
There are 997,148 sent text messages.
I am running this query:
SELECT
m.id,
m.user_id,
m.api_key,
m.to,
m.message,
m.sender_id,
m.route,
m.submission_reference,
m.unique_submission_reference,
m.reason_code,
m.timestamp,
d.id AS dlrid,
d.dlr_status
FROM
messages_sent m
LEFT JOIN
delivery_receipts d
ON
d.message_id = m.id
AND
d.id = (SELECT MAX(id) FROM delivery_receipts WHERE message_id = m.id)
Which returns 997,148 results including the latest status of each message.
This takes 22.8688 seconds to execute.
Here is the SQL for messages_sent
:
CREATE TABLE IF NOT EXISTS `messages_sent` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`api_key` varchar(40) NOT NULL,
`to` varchar(15) NOT NULL,
`message` text NOT NULL,
`type` enum('sms','mms') NOT NULL DEFAULT 'sms',
`sender_id` varchar(15) NOT NULL,
`route` tinyint(1) unsigned NOT NULL,
`supplier` tinyint(1) unsigned NOT NULL,
`submission_reference` varchar(40) NOT NULL,
`unique_submission_reference` varchar(40) NOT NULL,
`reason_code` tinyint(1) unsigned NOT NULL,
`reason` text NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `api_key` (`api_key`),
KEY `sender_id` (`sender_id`),
KEY `route` (`route`),
KEY `submission_reference` (`submission_reference`),
KEY `reason_code` (`reason_code`),
KEY `timestamp` (`timestamp`),
KEY `to` (`to`),
KEY `unique_submission_reference` (`unique_submission_reference`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000342 ;
And for delivery_receipts
:
CREATE TABLE IF NOT EXISTS `delivery_receipts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`message_id` int(10) unsigned NOT NULL,
`dlr_id` bigint(20) unsigned NOT NULL,
`dlr_status` tinyint(2) unsigned NOT NULL,
`dlr_substatus` tinyint(2) unsigned NOT NULL,
`dlr_final` tinyint(1) unsigned NOT NULL,
`dlr_refid` varchar(40) NOT NULL,
`dlr_phone` varchar(12) NOT NULL,
`dlr_charge` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `message_id` (`message_id`),
KEY `dlr_status` (`dlr_status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1468592 ;
Here is an EXPLAIN
of the SQL:
There is a trick.
Instead with picking MAX element with subquery you join with interesting table twice like this:
SELECT
m.id,
m.user_id,
m.api_key,
m.to,
m.message,
m.sender_id,
m.route,
m.submission_reference,
m.unique_submission_reference,
m.reason_code,
m.timestamp,
d.id AS dlrid,
d.dlr_status
FROM
messages_sent m
JOIN
delivery_receipts d
ON
d.message_id = m.id
LEFT JOIN
delivery_receipts d1
ON
d1.message_id = m.id
AND
d1.id > d.id
WHERE
d1.id IS NULL
The second time table is joined it has additional condition that field that you want to pick MAX of should be higher than in the first table. And filter out all rows except the ones that do not have other row that's higher.
This way only max rows remain.
I changed your LEFT JOIN to JOIN. I'm not sure if you need LEFT JOIN there. Even if you it should still work.
Amazingly this is much faster than subquery.
You might want to try out other variant of the same idea:
SELECT
m.id,
m.user_id,
m.api_key,
m.to,
m.message,
m.sender_id,
m.route,
m.submission_reference,
m.unique_submission_reference,
m.reason_code,
m.timestamp,
d.id AS dlrid,
d.dlr_status
FROM
messages_sent m
JOIN
(
SELECT d0.* FROM
delivery_receipts d0
LEFT JOIN
delivery_receipts d1
ON
d1.message_id = d0.message_id
AND
d1.id > d0.id
WHERE
d1.id IS NULL
) d
ON
d.message_id = m.id
Make sure you have multicolumn index for fields message_id and id in table delivery_receipts maybe such:
ALTER TABLE `delivery_receipts`
ADD INDEX `idx` ( `message_id` , `id` );
The slowdown seems large, but I'm afraid there is not much room for improvement if you need to stick with this query.
One problem is the reporting of d.dlr_status
. Try to remove this from the list of reported columns and see if the query time improves.
You would get the best possible performance if everything was stored in messages_sent
. This won't be NF anymore, but it's an option if you need performance. To achieve this, create id
and dlr_status
columns in messages_sent
and add appropriate INSERT
, UPDATE
and DELETE
triggers to delivery_receipts
. The triggers would update the corresponding columns in messages_sent
-- it's a trade-off between query time and update time.
You can "cache" part of the computation in the delivery_receipts table, just add is_last_status boolean to the delivery_receipts table. Using simple triggers you can change the value every insert of new receipt.
Than the select query becomes much simpler:
SELECT
m.id,
m.user_id,
m.api_key,
m.to,
m.message,
m.sender_id,
m.route,
m.submission_reference,
m.unique_submission_reference,
m.reason_code,
m.timestamp,
d.id AS dlrid,
d.dlr_status
FROM
messages_sent m
LEFT JOIN
delivery_receipts d
ON
d.message_id = m.id
WHERE
d.is_last_status = true
If mysql would support partial indexes the query could be speed up even more.
链接地址: http://www.djcxy.com/p/55052.html上一篇: 请求对象,有什么优点和缺点?