Get Max Count from Group by

I have trouble to get a output for group function in sql.Below are the details for table

I have 1 table table name "checks" have 2 columns pid,cid

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PID                                                VARCHAR2(20)
 CID                                                VARCHAR2(20)

Below are rows available

select * from checks;

PID                  CID
-------------------- --------------------
p1                   c1
p1                   c1
p1                   c2
p1                   c2
p1                   c2
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c1
p2                   c2
p2                   c2
p2                   c2
p2                   c2
p2                   c2

P represents participants and c represents category

question

I need to know which participant participate more than one category in that which category participant participate maximum.(for every participants)

Expected result:

pid   cid    count(cid)
---   ---    -----------
p1    c2         3
p2    c1         6

Assuming a database system (you haven't specified one, but I suspect Oracle?) that supports windowing functions and CTEs, I'd write:

;With Groups as (
    select pid,cid,COUNT(*) as cnt from checks group by pid,cid
), Ordered as (
    select pid,cid,cnt,
       ROW_NUMBER() OVER (PARTITION BY pid ORDER BY cnt desc) as rn,
       COUNT(*) OVER (PARTITION BY pid) as multi
    from Groups
)
select pid,cid,cnt
from Ordered
where rn = 1 and multi > 1

The first CTE ( Groups ) just finds the counts for each unique cid,pid combination. The second CTE ( Ordered ) assigns row numbers to these results based on the count - with the highest count assigned a row number of 1. We also count how many total rows have been produced for each pid .

Finally, we select those rows which were assigned a row number of 1 (the highest count) and for which we obtained multiple results for the same pid .

Here's an Oracle fiddle to play with. And here's an SQL Server version (and thanks to Andriy M for producing the Oracle one)


This will give you some basic ideas:

And the results shown below. Also since p1 participated in more than one Category so p1 will come with each new category in a different row when we use: 'group by PID,CID'


Step by step:

First, get the counts of rows per (PID, CID) . This is simple:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt
FROM checks
GROUP BY
  PID,
  CID

And you get this result set for your example:

PID  CID  cnt
---  ---  ---
p1   c1   2
p1   c2   3
p2   c1   6
p2   c2   5

Now, throw in COUNT(*) OVER (PARTITION BY PID) to return the number of categories per person:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt,
  COUNT(*) OVER (PARTITION BY PID) AS cat_cnt
FROM checks
GROUP BY
  PID,
  CID

The OVER clause turns a "normal" aggregate function COUNT() into a window aggregate function. That makes the COUNT(*) operate on the grouped row set rather than the source one. So, COUNT(*) OVER ... in this case counts rows per PID , which for us has the meaning of category counts per person. And this is the updated result set:

PID  CID  cnt  cnt_cat
---  ---  ---  -------
p1   c1   2    2
p1   c2   3    2
p2   c1   6    2
p2   c2   5    2

One more thing left is to rank the cnt values per PID . This may be tricky as there may be ties at the top counts. If you always want a single row per PID and are perfectly indifferent to which CID, cnt will be in case of a tie, you can modify the query like this:

SELECT
  PID,
  CID,
  COUNT(*) AS cnt,
  COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
  ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
FROM checks
GROUP BY
  PID,
  CID

And this is what the result set will look like:

PID  CID  cnt  cnt_cat  rn
---  ---  ---  -------  --
p1   c1   2    2        2
p1   c2   3    2        1
p2   c1   6    2        1
p2   c2   5    2        2

At this point, the results contain all the data necessary to produce the final output, you just need to filter on cnt_cat and rn . However, you cannot do that directly. Instead, use the last query as a derived table, be it a WITH table expression or a "normal" subselect. Below is an example using WITH :

WITH grouped AS (
  SELECT
    PID,
    CID,
    COUNT(*) AS cnt,
    COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
    ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
  FROM checks
  GROUP BY
    PID,
    CID
)
SELECT PID, CID, cnt
FROM grouped
WHERE cat_cnt > 1
  AND rn = 1
;

Here's a SQL Fiddle demo (using Oracle): http://sqlfiddle.com/#!4/cd62d/8

To expand a bit more on the ranking part, if you still want to return a single CID, cnt per PID but would prefer to have more control on what row should be decided as the "winner", you'll need to add a tie-breaker to the ORDER BY clause of the ranking function. As an example, you could modify the original expression,

ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn

with this one:

ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC, CID) AS rn

Ie the tie-breaker is CID , and so of the two or more CID s with the top count, the one that sorts before the others wins.

Still, you may want to decide to return all the top counts per PID . In that case, use either RANK() or DENSE_RANK() instead of ROW_NUMBER() (and no tie-breaker), ie like this:

RANK() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
链接地址: http://www.djcxy.com/p/72878.html

上一篇: 与Heroku的Django部署

下一篇: 从组中获取最大计数