Get top 1 row of each group

I have a table which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID , I want to get the latest status.

My preferred output:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |
  • Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop below:

    SELECT
      DocumentID,
      GetOnlyTheTop(Status),
      GetOnlyTheTop(DateCreated)
    FROM DocumentStatusLogs
    GROUP BY DocumentID
    ORDER BY DateCreated DESC
    
  • If such function doesn't exist, is there any way I can achieve the output I want?

  • Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status also be located in the parent table?
  • Please see the parent table for more information:

    Current Documents Table

    | DocumentID | Title  | Content  | DateCreated |
    | 1          | TitleA | ...      | ...         |
    | 2          | TitleB | ...      | ...         |
    | 3          | TitleC | ...      | ...         |
    

    Should the parent table be like this so that I can easily access its status?

    | DocumentID | Title  | Content  | DateCreated | CurrentStatus |
    | 1          | TitleA | ...      | ...         | s1            |
    | 2          | TitleB | ...      | ...         | s3            |
    | 3          | TitleC | ...      | ...         | s1            |
    

    UPDATE I just learned how to use "apply" which makes it easier to address such problems.


    ;WITH cte AS
    (
       SELECT *,
             ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
       FROM DocumentStatusLogs
    )
    SELECT *
    FROM cte
    WHERE rn = 1
    

    If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead

    As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...
  • As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.


    I just learned how to use cross apply . Here's how to use it in this scenario:

     select d.DocumentID, ds.Status, ds.DateCreated 
     from Documents as d 
     cross apply 
         (select top 1 Status, DateCreated
          from DocumentStatusLogs 
          where DocumentID = d.DocumentId
          order by DateCreated desc) as ds
    

    I've done some timings over the various recommendations here, and the results really depend on the size of the table involved, but the most consistent solution is using the CROSS APPLY These tests were run against SQL Server 2008-R2, using a table with 6,500 records, and another (identical schema) with 137 million records. The columns being queried are part of the primary key on the table, and the table width is very small (about 30 bytes). The times are reported by SQL Server from the actual execution plan.

    Query                                  Time for 6500 (ms)    Time for 137M(ms)
    
    CROSS APPLY                                    17.9                17.9
    SELECT WHERE col = (SELECT MAX(COL)…)           6.6               854.4
    DENSE_RANK() OVER PARTITION                     6.6               907.1
    

    I think the really amazing thing was how consistent the time was for the CROSS APPLY regardless of the number of rows involved.

    链接地址: http://www.djcxy.com/p/94522.html

    上一篇: 使DIV像使用CSS的表格一样工作

    下一篇: 获得每个组的前1行