Get Max Count from Group by
I have trouble to get a output for group function in sql.Below are the details for table
I have 1 table table name "checks" have 2 columns pid,cid
Name Null? Type
----------------------------------------- -------- ----------------------------
PID VARCHAR2(20)
CID VARCHAR2(20)
Below are rows available
select * from checks;
PID CID
-------------------- --------------------
p1 c1
p1 c1
p1 c2
p1 c2
p1 c2
p2 c1
p2 c1
p2 c1
p2 c1
p2 c1
p2 c1
p2 c2
p2 c2
p2 c2
p2 c2
p2 c2
P represents participants and c represents category
question
I need to know which participant participate more than one category in that which category participant participate maximum.(for every participants)
Expected result:
pid cid count(cid)
--- --- -----------
p1 c2 3
p2 c1 6
Assuming a database system (you haven't specified one, but I suspect Oracle?) that supports windowing functions and CTEs, I'd write:
;With Groups as (
select pid,cid,COUNT(*) as cnt from checks group by pid,cid
), Ordered as (
select pid,cid,cnt,
ROW_NUMBER() OVER (PARTITION BY pid ORDER BY cnt desc) as rn,
COUNT(*) OVER (PARTITION BY pid) as multi
from Groups
)
select pid,cid,cnt
from Ordered
where rn = 1 and multi > 1
The first CTE ( Groups
) just finds the counts for each unique cid,pid
combination. The second CTE ( Ordered
) assigns row numbers to these results based on the count - with the highest count assigned a row number of 1. We also count how many total rows have been produced for each pid
.
Finally, we select those rows which were assigned a row number of 1 (the highest count) and for which we obtained multiple results for the same pid
.
Here's an Oracle fiddle to play with. And here's an SQL Server version (and thanks to Andriy M for producing the Oracle one)
This will give you some basic ideas:
And the results shown below. Also since p1 participated in more than one Category so p1 will come with each new category in a different row when we use: 'group by PID,CID'
Step by step:
First, get the counts of rows per (PID, CID)
. This is simple:
SELECT
PID,
CID,
COUNT(*) AS cnt
FROM checks
GROUP BY
PID,
CID
And you get this result set for your example:
PID CID cnt
--- --- ---
p1 c1 2
p1 c2 3
p2 c1 6
p2 c2 5
Now, throw in COUNT(*) OVER (PARTITION BY PID)
to return the number of categories per person:
SELECT
PID,
CID,
COUNT(*) AS cnt,
COUNT(*) OVER (PARTITION BY PID) AS cat_cnt
FROM checks
GROUP BY
PID,
CID
The OVER
clause turns a "normal" aggregate function COUNT()
into a window aggregate function. That makes the COUNT(*)
operate on the grouped row set rather than the source one. So, COUNT(*) OVER ...
in this case counts rows per PID
, which for us has the meaning of category counts per person. And this is the updated result set:
PID CID cnt cnt_cat
--- --- --- -------
p1 c1 2 2
p1 c2 3 2
p2 c1 6 2
p2 c2 5 2
One more thing left is to rank the cnt
values per PID
. This may be tricky as there may be ties at the top counts. If you always want a single row per PID
and are perfectly indifferent to which CID, cnt
will be in case of a tie, you can modify the query like this:
SELECT
PID,
CID,
COUNT(*) AS cnt,
COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
FROM checks
GROUP BY
PID,
CID
And this is what the result set will look like:
PID CID cnt cnt_cat rn
--- --- --- ------- --
p1 c1 2 2 2
p1 c2 3 2 1
p2 c1 6 2 1
p2 c2 5 2 2
At this point, the results contain all the data necessary to produce the final output, you just need to filter on cnt_cat
and rn
. However, you cannot do that directly. Instead, use the last query as a derived table, be it a WITH
table expression or a "normal" subselect. Below is an example using WITH
:
WITH grouped AS (
SELECT
PID,
CID,
COUNT(*) AS cnt,
COUNT(*) OVER (PARTITION BY PID) AS cat_cnt,
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
FROM checks
GROUP BY
PID,
CID
)
SELECT PID, CID, cnt
FROM grouped
WHERE cat_cnt > 1
AND rn = 1
;
Here's a SQL Fiddle demo (using Oracle): http://sqlfiddle.com/#!4/cd62d/8
To expand a bit more on the ranking part, if you still want to return a single CID, cnt
per PID
but would prefer to have more control on what row should be decided as the "winner", you'll need to add a tie-breaker to the ORDER BY
clause of the ranking function. As an example, you could modify the original expression,
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
with this one:
ROW_NUMBER() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC, CID) AS rn
Ie the tie-breaker is CID
, and so of the two or more CID
s with the top count, the one that sorts before the others wins.
Still, you may want to decide to return all the top counts per PID
. In that case, use either RANK()
or DENSE_RANK()
instead of ROW_NUMBER()
(and no tie-breaker), ie like this:
RANK() OVER (PARTITION BY PID ORDER BY COUNT(*) DESC) AS rn
链接地址: http://www.djcxy.com/p/72878.html
上一篇: 与Heroku的Django部署
下一篇: 从组中获取最大计数