Top 20 Group Ranking Query
I am creating a reporting structure where I need to output the top 20 days of aggregate stats for each unique Company - Region. I have completed this task but feel that my code is overly complicated and I am requesting help optimizing it.
I have 2 tables involved in this process. The first lists all the possible Company - Region - Group - Subgroups. The second has hourly stats by the Group - Subgroup.
SQL Fiddle link: http://sqlfiddle.com/#!9/29a7b/1
NOTE : currently getting a SELECT command denied to user '<user>'@'<ip>' for table 'table_stats'
error on my SQL Fiddle, would appreciate help resolving this as well.
table_companies declaration and dummy data:
CREATE TABLE `table_companies` (
`pk_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`company` varchar(45) NOT NULL,
`region` varchar(45) NOT NULL,
`group` varchar(45) NOT NULL,
`subgroup` varchar(45) NOT NULL,
PRIMARY KEY (`pk_id`),
UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`)
);
INSERT INTO table_companies
(`pk_id`, `company`, `region`, `group`, `subgroup`)
VALUES
(1, 'company1', 'region1', 'group1', 'subgroup1'),
(2, 'company1', 'region1', 'group1', 'subgroup2'),
(3, 'company1', 'region2', 'group2', 'subgroup3'),
(4, 'company1', 'region3', 'group3', 'subgroup4'),
(5, 'company2', 'region1', 'group4', 'subgroup5'),
(6, 'company2', 'region3', 'group5', 'subgroup6'),
(7, 'company2', 'region3', 'group6', 'subgroup7'),
(8, 'company2', 'region4', 'group7', 'subgroup8'),
(9, 'company2', 'region5', 'group8', 'subgroup9'),
(10, 'company3', 'region6', 'group9', 'subgroup10'),
(11, 'company3', 'region7', 'group10', 'subgroup11'),
(12, 'company3', 'region8', 'group11', 'subgroup12'),
(13, 'company4', 'region9', 'group12', 'subgroup13'),
(14, 'company4', 'region10', 'group13', 'subgroup14'),
(15, 'company5', 'region11', 'group14', 'subgroup15'),
(16, 'company5', 'region12', 'group15', 'subgroup16')
;
table_stats declaration:
Simplified to only contain a couple of the hours per day for only 1 group - subgroup.
CREATE TABLE `table_stats` (
`pk_id` int(10) unsigned NOT NULL,
`date_time` datetime NOT NULL,
`group` varchar(45) NOT NULL,
`subgroup` varchar(45) NOT NULL,
`stat` int(10) unsigned NOT NULL,
PRIMARY KEY (`pk_id`),
UNIQUE KEY `pk_id_UNIQUE` (`pk_id`),
UNIQUE KEY `om_unique` (`date_time`,`group`,`subgroup`)
);
INSERT INTO table_stats
(`pk_id`, `date_time`, `group`, `subgroup`, `stat`)
VALUES
(1, '2015-12-01 06:00:00', 'group9', 'subgroup10', 14),
(2, '2015-12-01 12:00:00', 'group9', 'subgroup10', 14),
(3, '2015-12-02 06:00:00', 'group9', 'subgroup10', 2),
(4, '2015-12-02 12:00:00', 'group9', 'subgroup10', 51),
(5, '2015-12-03 06:00:00', 'group9', 'subgroup10', 30),
(6, '2015-12-03 12:00:00', 'group9', 'subgroup10', 6),
(7, '2015-12-04 06:00:00', 'group9', 'subgroup10', 9),
(8, '2015-12-04 12:00:00', 'group9', 'subgroup10', 77),
(9, '2015-12-05 06:00:00', 'group9', 'subgroup10', 70),
(10, '2015-12-05 12:00:00', 'group9', 'subgroup10', 7),
(11, '2015-12-06 06:00:00', 'group9', 'subgroup10', 38),
(12, '2015-12-06 12:00:00', 'group9', 'subgroup10', 5),
(13, '2015-12-07 06:00:00', 'group9', 'subgroup10', 86),
(14, '2015-12-07 12:00:00', 'group9', 'subgroup10', 73),
(15, '2015-12-08 06:00:00', 'group9', 'subgroup10', 45),
(16, '2015-12-08 12:00:00', 'group9', 'subgroup10', 14),
(17, '2015-12-09 06:00:00', 'group9', 'subgroup10', 66),
(18, '2015-12-09 12:00:00', 'group9', 'subgroup10', 38),
(19, '2015-12-10 06:00:00', 'group9', 'subgroup10', 12),
(20, '2015-12-10 12:00:00', 'group9', 'subgroup10', 77),
(21, '2015-12-11 06:00:00', 'group9', 'subgroup10', 21),
(22, '2015-12-11 12:00:00', 'group9', 'subgroup10', 18),
(23, '2015-12-12 06:00:00', 'group9', 'subgroup10', 28),
(24, '2015-12-12 12:00:00', 'group9', 'subgroup10', 74),
(25, '2015-12-13 06:00:00', 'group9', 'subgroup10', 20),
(26, '2015-12-13 12:00:00', 'group9', 'subgroup10', 37),
(27, '2015-12-14 06:00:00', 'group9', 'subgroup10', 66),
(28, '2015-12-14 12:00:00', 'group9', 'subgroup10', 59),
(29, '2015-12-15 06:00:00', 'group9', 'subgroup10', 26),
(30, '2015-12-15 12:00:00', 'group9', 'subgroup10', 0),
(31, '2015-12-16 06:00:00', 'group9', 'subgroup10', 77),
(32, '2015-12-16 12:00:00', 'group9', 'subgroup10', 31),
(33, '2015-12-17 06:00:00', 'group9', 'subgroup10', 59),
(34, '2015-12-17 12:00:00', 'group9', 'subgroup10', 71),
(35, '2015-12-18 06:00:00', 'group9', 'subgroup10', 7),
(36, '2015-12-18 12:00:00', 'group9', 'subgroup10', 73),
(37, '2015-12-19 06:00:00', 'group9', 'subgroup10', 72),
(38, '2015-12-19 12:00:00', 'group9', 'subgroup10', 28),
(39, '2015-12-20 06:00:00', 'group9', 'subgroup10', 50),
(40, '2015-12-20 12:00:00', 'group9', 'subgroup10', 11),
(41, '2015-12-21 06:00:00', 'group9', 'subgroup10', 71),
(42, '2015-12-21 12:00:00', 'group9', 'subgroup10', 4),
(43, '2015-12-22 06:00:00', 'group9', 'subgroup10', 78),
(44, '2015-12-22 12:00:00', 'group9', 'subgroup10', 69),
(45, '2015-12-23 06:00:00', 'group9', 'subgroup10', 83),
(46, '2015-12-23 12:00:00', 'group9', 'subgroup10', 55),
(47, '2015-12-24 06:00:00', 'group9', 'subgroup10', 71),
(48, '2015-12-24 12:00:00', 'group9', 'subgroup10', 20),
(49, '2015-12-25 06:00:00', 'group9', 'subgroup10', 90),
(50, '2015-12-25 12:00:00', 'group9', 'subgroup10', 26),
(51, '2015-12-26 06:00:00', 'group9', 'subgroup10', 1),
(52, '2015-12-26 12:00:00', 'group9', 'subgroup10', 73),
(53, '2015-12-27 06:00:00', 'group9', 'subgroup10', 4),
(54, '2015-12-27 12:00:00', 'group9', 'subgroup10', 18),
(55, '2015-12-28 06:00:00', 'group9', 'subgroup10', 4),
(56, '2015-12-28 12:00:00', 'group9', 'subgroup10', 30),
(57, '2015-12-29 06:00:00', 'group9', 'subgroup10', 56),
(58, '2015-12-29 12:00:00', 'group9', 'subgroup10', 53),
(59, '2015-12-30 06:00:00', 'group9', 'subgroup10', 33),
(60, '2015-12-31 12:00:00', 'group9', 'subgroup10', 8)
;
Query to optimize:
SELECT * FROM
(
SELECT t3.company,t3.region,t3.day, t3.day_stat,COUNT(*) as rank
FROM
(
SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
FROM schema1.table_stats t1
INNER JOIN table_companies t2
ON t1.group=t2.group AND t1.subgroup=t2.subgroup
WHERE
MONTH(t1.date_time)=12 AND
YEAR(t1.date_time)=2015
group by t2.company,t2.region,DAY(t1.date_time)
ORDER BY t2.company,t2.region,day_stat DESC
) t3
JOIN
(
SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
FROM schema1.table_stats t1
INNER JOIN table_companies t2
ON t1.group=t2.group AND t1.subgroup=t2.subgroup
WHERE
MONTH(t1.date_time)=12 AND
YEAR(t1.date_time)=2015
group by t2.company,t2.region,DAY(t1.date_time)
ORDER BY t2.company,t2.region,day_stat DESC
) t4
ON
t4.day_stat >= t3.day_stat AND
t4.company = t3.company AND
t4.region = t3.region
GROUP BY t3.company,t3.region,t3.day_stat
ORDER BY t3.company,t3.region,rank
) t5
WHERE t5.rank<=20
;
Summary of query: from the 2 deepest subqueries it starts by joining both tables, grouping and aggregating the stat
by the company, region and day. This is also where it restricts the month and year. Then it joins this result to a duplicate of itself to be able to generate the rank. Last select limits results to top 20 for each subgroup.
Expected result:
Apologies for presenting as a SQL declaration
INSERT INTO results
(`company`, `region`, `day`, `day_stat`, `rank`)
VALUES
('company3', 'region6', 7, 159, 1),
('company3', 'region6', 22, 147, 2),
('company3', 'region6', 23, 138, 3),
('company3', 'region6', 17, 130, 4),
('company3', 'region6', 14, 125, 5),
('company3', 'region6', 25, 116, 6),
('company3', 'region6', 29, 109, 7),
('company3', 'region6', 16, 108, 8),
('company3', 'region6', 9, 104, 9),
('company3', 'region6', 12, 102, 10),
('company3', 'region6', 19, 100, 11),
('company3', 'region6', 24, 91, 12),
('company3', 'region6', 10, 89, 13),
('company3', 'region6', 4, 86, 14),
('company3', 'region6', 18, 80, 15),
('company3', 'region6', 5, 77, 16),
('company3', 'region6', 21, 75, 17),
('company3', 'region6', 26, 74, 18),
('company3', 'region6', 20, 61, 19),
('company3', 'region6', 8, 59, 20)
;
tl;dr: Apologies for the long post. Asking to optimize http://sqlfiddle.com/#!9/29a7b/1.
The modifications I've made:
table_companies
table on group,subgroup
table_stats
table on group, subgroup
Modified Query:
SELECT
C.company,
C.region,
DAY(S.date_time) day,
SUM(S.stat) day_stat
FROM table_companies C
INNER JOIN table_stats S
ON C.`group` = S.`group` AND C.subgroup = S.subgroup
WHERE MONTH(S.date_time) = 12 AND YEAR(S.date_time) = 2015
GROUP BY C.company, C.region, DAY(S.date_time)
ORDER BY day_stat DESC
LIMIT 20;
WORKING DEMO
There's no rank column in the result set. Since the results are sorted according to rank in descending order so that you can implicitly treat the position of a row in the result set as the rank. Nevertheless if you really need the rank column then here is a working demo of it
Composite index( table_companies)
:
ALTER TABLE `table_companies` ADD INDEX `idx_table_compnaies_group_subgroup` (
`group`,
`subgroup`
);
Composite index( table_stats)
:
ALTER TABLE `table_stats` ADD INDEX `idx_table_stats_group_subgroup` (
`group`,
`subgroup`
);
Explain Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE S ALL idx_table_compnaies_group_subgroup 60 Using where; Using temporary; Using filesort
1 SIMPLE C ref idx_table_companies_group_subgroup idx_table_companies_group_subgroup 57 schema1.S.group,schema1.S.subgroup 1 Using index condition
Good news is MySQL can use these indexes
(because these are under possible keys ). Although it's showing ALL
as type for table_companies
. All I can say it's a small set of data. You cannot judge performance based on small set of data.
More:
I guess you have primary keys in those tables. If you don't have any then create.
EDIT:
SELECT
C.company,
C.region,
tt.day,
tt.total AS day_stat,
tt.rank
FROM table_companies C
INNER JOIN
(
SELECT
t.*,
IF(t.businessUnit = @sameBusinessUnit, @rn := @rn + 1, @rn := 1) AS rank,
@sameBusinessUnit := t.businessUnit
FROM
(
SELECT
S1.`group`,
S1.subgroup,
CONCAT(S1.`group`,S1.subgroup) AS businessUnit,
DAY(S1.date_time) AS day,
SUM(S1.stat) total
FROM table_stats S1
GROUP BY S1.group,S1.subgroup,DAY(S1.date_time)
ORDER BY total DESC
)AS t
CROSS JOIN (SELECT @rn := 1, @sameBusinessUnit := '') var
) AS tt
ON C.`group`=tt.`group` AND C.subgroup = tt.subgroup
WHERE tt.rank <= 20
ORDER BY tt.`group`,tt.`subgroup`,tt.rank;
WORKING DEMO(Version 2.0)
只需包含一个用于组的索引,以便连接变得更加高效
CREATE TABLE table_companies
(`pk_id` int, `company` varchar(8),
`region` varchar(8), `group` varchar(7), `subgroup` varchar(10),
PRIMARY KEY (`pk_id`),
UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`),
INDEX idx_group (`group`, `subgroup`)
)
;
链接地址: http://www.djcxy.com/p/62460.html
上一篇: ColdFusion中的查询优化
下一篇: 排名前20位的查询