Top 20 Group Ranking Query

I am creating a reporting structure where I need to output the top 20 days of aggregate stats for each unique Company - Region. I have completed this task but feel that my code is overly complicated and I am requesting help optimizing it.

I have 2 tables involved in this process. The first lists all the possible Company - Region - Group - Subgroups. The second has hourly stats by the Group - Subgroup.

SQL Fiddle link: http://sqlfiddle.com/#!9/29a7b/1
NOTE: I am currently getting a "SELECT command denied to user '<user>'@'<ip>' for table 'table_stats'" error on my SQL Fiddle and would appreciate help resolving this as well.

table_companies declaration and dummy data:

CREATE TABLE `table_companies` (
  `pk_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `company` varchar(45) NOT NULL,
  `region` varchar(45) NOT NULL,
  `group` varchar(45) NOT NULL,
  `subgroup` varchar(45) NOT NULL,
  PRIMARY KEY (`pk_id`),
  UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`)
);

INSERT INTO table_companies
    (`pk_id`, `company`, `region`, `group`, `subgroup`)
VALUES
    (1, 'company1', 'region1', 'group1', 'subgroup1'),
    (2, 'company1', 'region1', 'group1', 'subgroup2'),
    (3, 'company1', 'region2', 'group2', 'subgroup3'),
    (4, 'company1', 'region3', 'group3', 'subgroup4'),
    (5, 'company2', 'region1', 'group4', 'subgroup5'),
    (6, 'company2', 'region3', 'group5', 'subgroup6'),
    (7, 'company2', 'region3', 'group6', 'subgroup7'),
    (8, 'company2', 'region4', 'group7', 'subgroup8'),
    (9, 'company2', 'region5', 'group8', 'subgroup9'),
    (10, 'company3', 'region6', 'group9', 'subgroup10'),
    (11, 'company3', 'region7', 'group10', 'subgroup11'),
    (12, 'company3', 'region8', 'group11', 'subgroup12'),
    (13, 'company4', 'region9', 'group12', 'subgroup13'),
    (14, 'company4', 'region10', 'group13', 'subgroup14'),
    (15, 'company5', 'region11', 'group14', 'subgroup15'),
    (16, 'company5', 'region12', 'group15', 'subgroup16')
;

table_stats declaration:
Simplified to only contain a couple of the hours per day for only 1 group - subgroup.

CREATE TABLE `table_stats` (
  `pk_id` int(10) unsigned NOT NULL,
  `date_time` datetime NOT NULL,
  `group` varchar(45) NOT NULL,
  `subgroup` varchar(45) NOT NULL,
  `stat` int(10) unsigned NOT NULL,
  PRIMARY KEY (`pk_id`),
  UNIQUE KEY `pk_id_UNIQUE` (`pk_id`),
  UNIQUE KEY `om_unique` (`date_time`,`group`,`subgroup`)
);

INSERT INTO table_stats
    (`pk_id`, `date_time`, `group`, `subgroup`, `stat`)
VALUES
    (1, '2015-12-01 06:00:00', 'group9', 'subgroup10', 14),
    (2, '2015-12-01 12:00:00', 'group9', 'subgroup10', 14),
    (3, '2015-12-02 06:00:00', 'group9', 'subgroup10', 2),
    (4, '2015-12-02 12:00:00', 'group9', 'subgroup10', 51),
    (5, '2015-12-03 06:00:00', 'group9', 'subgroup10', 30),
    (6, '2015-12-03 12:00:00', 'group9', 'subgroup10', 6),
    (7, '2015-12-04 06:00:00', 'group9', 'subgroup10', 9),
    (8, '2015-12-04 12:00:00', 'group9', 'subgroup10', 77),
    (9, '2015-12-05 06:00:00', 'group9', 'subgroup10', 70),
    (10, '2015-12-05 12:00:00', 'group9', 'subgroup10', 7),
    (11, '2015-12-06 06:00:00', 'group9', 'subgroup10', 38),
    (12, '2015-12-06 12:00:00', 'group9', 'subgroup10', 5),
    (13, '2015-12-07 06:00:00', 'group9', 'subgroup10', 86),
    (14, '2015-12-07 12:00:00', 'group9', 'subgroup10', 73),
    (15, '2015-12-08 06:00:00', 'group9', 'subgroup10', 45),
    (16, '2015-12-08 12:00:00', 'group9', 'subgroup10', 14),
    (17, '2015-12-09 06:00:00', 'group9', 'subgroup10', 66),
    (18, '2015-12-09 12:00:00', 'group9', 'subgroup10', 38),
    (19, '2015-12-10 06:00:00', 'group9', 'subgroup10', 12),
    (20, '2015-12-10 12:00:00', 'group9', 'subgroup10', 77),
    (21, '2015-12-11 06:00:00', 'group9', 'subgroup10', 21),
    (22, '2015-12-11 12:00:00', 'group9', 'subgroup10', 18),
    (23, '2015-12-12 06:00:00', 'group9', 'subgroup10', 28),
    (24, '2015-12-12 12:00:00', 'group9', 'subgroup10', 74),
    (25, '2015-12-13 06:00:00', 'group9', 'subgroup10', 20),
    (26, '2015-12-13 12:00:00', 'group9', 'subgroup10', 37),
    (27, '2015-12-14 06:00:00', 'group9', 'subgroup10', 66),
    (28, '2015-12-14 12:00:00', 'group9', 'subgroup10', 59),
    (29, '2015-12-15 06:00:00', 'group9', 'subgroup10', 26),
    (30, '2015-12-15 12:00:00', 'group9', 'subgroup10', 0),
    (31, '2015-12-16 06:00:00', 'group9', 'subgroup10', 77),
    (32, '2015-12-16 12:00:00', 'group9', 'subgroup10', 31),
    (33, '2015-12-17 06:00:00', 'group9', 'subgroup10', 59),
    (34, '2015-12-17 12:00:00', 'group9', 'subgroup10', 71),
    (35, '2015-12-18 06:00:00', 'group9', 'subgroup10', 7),
    (36, '2015-12-18 12:00:00', 'group9', 'subgroup10', 73),
    (37, '2015-12-19 06:00:00', 'group9', 'subgroup10', 72),
    (38, '2015-12-19 12:00:00', 'group9', 'subgroup10', 28),
    (39, '2015-12-20 06:00:00', 'group9', 'subgroup10', 50),
    (40, '2015-12-20 12:00:00', 'group9', 'subgroup10', 11),
    (41, '2015-12-21 06:00:00', 'group9', 'subgroup10', 71),
    (42, '2015-12-21 12:00:00', 'group9', 'subgroup10', 4),
    (43, '2015-12-22 06:00:00', 'group9', 'subgroup10', 78),
    (44, '2015-12-22 12:00:00', 'group9', 'subgroup10', 69),
    (45, '2015-12-23 06:00:00', 'group9', 'subgroup10', 83),
    (46, '2015-12-23 12:00:00', 'group9', 'subgroup10', 55),
    (47, '2015-12-24 06:00:00', 'group9', 'subgroup10', 71),
    (48, '2015-12-24 12:00:00', 'group9', 'subgroup10', 20),
    (49, '2015-12-25 06:00:00', 'group9', 'subgroup10', 90),
    (50, '2015-12-25 12:00:00', 'group9', 'subgroup10', 26),
    (51, '2015-12-26 06:00:00', 'group9', 'subgroup10', 1),
    (52, '2015-12-26 12:00:00', 'group9', 'subgroup10', 73),
    (53, '2015-12-27 06:00:00', 'group9', 'subgroup10', 4),
    (54, '2015-12-27 12:00:00', 'group9', 'subgroup10', 18),
    (55, '2015-12-28 06:00:00', 'group9', 'subgroup10', 4),
    (56, '2015-12-28 12:00:00', 'group9', 'subgroup10', 30),
    (57, '2015-12-29 06:00:00', 'group9', 'subgroup10', 56),
    (58, '2015-12-29 12:00:00', 'group9', 'subgroup10', 53),
    (59, '2015-12-30 06:00:00', 'group9', 'subgroup10', 33),
    (60, '2015-12-31 12:00:00', 'group9', 'subgroup10', 8)
;

Query to optimize:

SELECT * FROM
    (
    SELECT t3.company,t3.region,t3.day, t3.day_stat,COUNT(*) as rank
    FROM
        (
            SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
            FROM schema1.table_stats t1
            INNER JOIN table_companies t2
            ON t1.group=t2.group AND t1.subgroup=t2.subgroup
            WHERE
                MONTH(t1.date_time)=12 AND
                YEAR(t1.date_time)=2015
            group by t2.company,t2.region,DAY(t1.date_time)
            ORDER BY t2.company,t2.region,day_stat DESC
        ) t3
    JOIN
    (
            SELECT t2.company,t2.region,DAY(t1.date_time) as day,SUM(t1.stat) as day_stat
            FROM schema1.table_stats t1
            INNER JOIN table_companies t2
            ON t1.group=t2.group AND t1.subgroup=t2.subgroup
            WHERE
                MONTH(t1.date_time)=12 AND
                YEAR(t1.date_time)=2015
            group by t2.company,t2.region,DAY(t1.date_time)
            ORDER BY t2.company,t2.region,day_stat DESC
        ) t4
    ON
        t4.day_stat >= t3.day_stat AND
        t4.company = t3.company AND
        t4.region = t3.region
    GROUP BY t3.company,t3.region,t3.day_stat
    ORDER BY t3.company,t3.region,rank
    ) t5
WHERE t5.rank<=20
;

Summary of query: the two deepest subqueries each join both tables, then group and aggregate the stat by company, region and day. This is also where the month and year are restricted. The result is then joined to a duplicate of itself in order to generate the rank. The outermost select limits the results to the top 20 for each company - region.

Expected result:
Apologies for presenting as a SQL declaration

INSERT INTO results
    (`company`, `region`, `day`, `day_stat`, `rank`)
VALUES
    ('company3', 'region6', 7, 159, 1),
    ('company3', 'region6', 22, 147, 2),
    ('company3', 'region6', 23, 138, 3),
    ('company3', 'region6', 17, 130, 4),
    ('company3', 'region6', 14, 125, 5),
    ('company3', 'region6', 25, 116, 6),
    ('company3', 'region6', 29, 109, 7),
    ('company3', 'region6', 16, 108, 8),
    ('company3', 'region6', 9, 104, 9),
    ('company3', 'region6', 12, 102, 10),
    ('company3', 'region6', 19, 100, 11),
    ('company3', 'region6', 24, 91, 12),
    ('company3', 'region6', 10, 89, 13),
    ('company3', 'region6', 4, 86, 14),
    ('company3', 'region6', 18, 80, 15),
    ('company3', 'region6', 5, 77, 16),
    ('company3', 'region6', 21, 75, 17),
    ('company3', 'region6', 26, 74, 18),
    ('company3', 'region6', 20, 61, 19),
    ('company3', 'region6', 8, 59, 20)
;

tl;dr: Apologies for the long post. Asking to optimize http://sqlfiddle.com/#!9/29a7b/1.


The modifications I've made:

  • Completely modified your query
  • Added a composite index in table_companies table on group,subgroup
  • Added a composite index in table_stats table on group, subgroup

  • Modified Query:

    SELECT 
        C.company,
        C.region,
        DAY(S.date_time) day,
        SUM(S.stat) day_stat
    FROM table_companies C
    INNER JOIN table_stats S
    ON C.`group` = S.`group` AND C.subgroup = S.subgroup
    WHERE MONTH(S.date_time) = 12 AND YEAR(S.date_time) = 2015
    GROUP BY C.company, C.region, DAY(S.date_time)
    ORDER BY day_stat DESC
    LIMIT 20;
    

    WORKING DEMO

    There is no rank column in the result set. Because the rows are sorted by day_stat in descending order, you can treat a row's position in the result set as its rank. Nevertheless, if you really need the rank column, here is a working demo of it

    Composite index (table_companies):

    ALTER TABLE `table_companies` ADD INDEX `idx_table_companies_group_subgroup` (
        `group`,
        `subgroup`
    );
    

    Composite index (table_stats):

    ALTER TABLE `table_stats` ADD INDEX `idx_table_stats_group_subgroup` (
        `group`,
        `subgroup`
    );
    

    Explain Result:

    id  select_type  table  type  possible_keys                       key                                 key_len  ref                                 rows  Extra
    1   SIMPLE       S      ALL   idx_table_compnaies_group_subgroup                                                                                   60    Using where; Using temporary; Using filesort
    1   SIMPLE       C      ref   idx_table_companies_group_subgroup  idx_table_companies_group_subgroup  57       schema1.S.group,schema1.S.subgroup  1     Using index condition
    

    The good news is that MySQL can consider these indexes (they are listed under possible_keys). The type for table_stats (alias S) is still ALL, i.e. a full table scan, but this is a very small data set, and you cannot judge performance from such a small set of data.
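
    One possible contributor to the full scan on table_stats is that wrapping date_time in MONTH() and YEAR() prevents MySQL from using an index on that column. A minimal sketch, assuming the om_unique key (date_time, group, subgroup) from the question is in place, rewrites the filter as a half-open range. Whether the optimizer actually picks the index depends on the data volume and should be checked with EXPLAIN:

    ```sql
    -- Same aggregation as the modified query above, but with a range predicate
    -- on the bare date_time column so an index starting with date_time can be used.
    SELECT
        C.company,
        C.region,
        DAY(S.date_time) AS day,
        SUM(S.stat) AS day_stat
    FROM table_companies C
    INNER JOIN table_stats S
        ON C.`group` = S.`group` AND C.subgroup = S.subgroup
    WHERE S.date_time >= '2015-12-01'
      AND S.date_time <  '2016-01-01'   -- half-open range: all of December 2015
    GROUP BY C.company, C.region, DAY(S.date_time)
    ORDER BY day_stat DESC
    LIMIT 20;
    ```

    Like the query it is based on, this returns the 20 best days overall rather than 20 per company and region.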

    More:

    I assume both tables have primary keys. If either of them does not, create one.

    EDIT:

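    SELECT 
        C.company,
        C.region,
        tt.day,
        tt.total AS day_stat,
        tt.`rank`
    FROM table_companies C 
    INNER JOIN 
    (
        SELECT 
            t.*,
            -- Restart the counter whenever the business unit changes.
            -- MySQL does not guarantee the evaluation order of user variables
            -- within a SELECT, so this relies on observed behavior.
            -- `rank` is quoted because RANK is a reserved word in MySQL 8.0.
            IF(t.businessUnit = @sameBusinessUnit, @rn := @rn + 1, @rn := 1) AS `rank`,
            @sameBusinessUnit := t.businessUnit
        FROM 
        (
            -- Daily totals per group/subgroup (the ranking unit in this version).
            SELECT 
               S1.`group`,
               S1.subgroup,
               CONCAT(S1.`group`, S1.subgroup) AS businessUnit,
               DAY(S1.date_time) AS day,
               SUM(S1.stat) total
            FROM table_stats S1
            -- Restrict to December 2015, as in the question.
            WHERE MONTH(S1.date_time) = 12 AND YEAR(S1.date_time) = 2015
            GROUP BY S1.`group`, S1.subgroup, DAY(S1.date_time)
            -- Rows of one business unit must be adjacent for the counter to work.
            ORDER BY businessUnit, total DESC
        ) AS t
        CROSS JOIN (SELECT @rn := 1, @sameBusinessUnit := '') var
    ) AS tt
    ON C.`group` = tt.`group` AND C.subgroup = tt.subgroup
    WHERE tt.`rank` <= 20
    ORDER BY tt.`group`, tt.`subgroup`, tt.`rank`;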
    

    WORKING DEMO(Version 2.0)
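
    On MySQL 8.0 or later, the same per-group ranking can be written with a window function instead of user variables. The following is only a sketch under that version assumption (the SQL Fiddle in the question runs an older MySQL, where it will not work). It ranks per company and region as the question asks, and the alias day_rank is a name chosen here to avoid the reserved word RANK:

    ```sql
    -- Assumes MySQL 8.0+ (CTEs and window functions are not available in 5.x).
    WITH daily AS (
        -- Daily totals per company and region for December 2015.
        SELECT
            C.company,
            C.region,
            DAY(S.date_time) AS day,
            SUM(S.stat) AS day_stat
        FROM table_companies C
        INNER JOIN table_stats S
            ON C.`group` = S.`group` AND C.subgroup = S.subgroup
        WHERE S.date_time >= '2015-12-01'
          AND S.date_time <  '2016-01-01'
        GROUP BY C.company, C.region, DAY(S.date_time)
    ),
    ranked AS (
        SELECT
            daily.*,
            -- Numbering restarts at 1 for every company/region pair;
            -- ties on day_stat are broken by the earlier day.
            ROW_NUMBER() OVER (
                PARTITION BY company, region
                ORDER BY day_stat DESC, day
            ) AS day_rank
        FROM daily
    )
    SELECT company, region, day, day_stat, day_rank
    FROM ranked
    WHERE day_rank <= 20
    ORDER BY company, region, day_rank;
    ```

    ROW_NUMBER() gives tied days distinct positions. If tied days should share a rank, RANK() or DENSE_RANK() can be substituted, at the cost of possibly returning more than 20 rows per company and region.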


    Just include an index on the group columns so that the join becomes more efficient:

    CREATE TABLE table_companies
        (`pk_id` int, `company` varchar(8), 
         `region` varchar(8), `group` varchar(7), `subgroup` varchar(10),
         PRIMARY KEY (`pk_id`),
         UNIQUE KEY `pk_id_id_UNIQUE` (`pk_id`),  
    
         INDEX idx_group (`group`, `subgroup`)
        )
    ;
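
    To check whether the join actually uses the new index, the query can be prefixed with EXPLAIN. A minimal sketch, assuming the idx_group index defined above exists on table_companies and that table_stats is defined as in the question:

    ```sql
    EXPLAIN
    SELECT C.company, C.region, DAY(S.date_time) AS day, SUM(S.stat) AS day_stat
    FROM table_companies C
    INNER JOIN table_stats S
        ON C.`group` = S.`group` AND C.subgroup = S.subgroup
    GROUP BY C.company, C.region, DAY(S.date_time);
    -- In the output, look at the `key` column of the row for C:
    -- idx_group there means the index was chosen for the join lookup.
    ```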
    