Finding duplicate rows in SQL Server
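I have a SQL Server database of organizations, and it contains many duplicate rows. I want to run a select statement that returns all of the duplicated organizations and the number of duplicates, but that also returns the id associated with each organization.
A statement like this: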
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
will return something like
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I would also like to get their IDs. Is there any way to do this? Maybe something like
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
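The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so that the users link to the same organization instead of to dupe orgs). I would like to do part of this manually so that I don't screw anything up, but I still need a statement that returns the IDs of all the dupe orgs so that I can go through the list of users.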
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
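If you want the output in the exact shape shown in the question (a running number per organization next to each id), a window-function variant is one option. This is only a sketch: it assumes SQL Server 2005 or later, and the aliases dupeNumber and dupeCount are made up for illustration.

```sql
-- Sketch, assuming SQL Server 2005+ (window functions are required).
-- dupeNumber and dupeCount are illustrative aliases, not existing columns.
SELECT orgName, dupeNumber, id
FROM (
    SELECT
        orgName, id,
        -- position of this row within its group of identical names
        ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS dupeNumber,
        -- total number of rows sharing this name
        COUNT(*) OVER (PARTITION BY orgName) AS dupeCount
    FROM organizations
) AS d
WHERE dupeCount > 1          -- keep only names that occur more than once
ORDER BY orgName, dupeNumber
```

Every row of a duplicated organization is returned, including the one you would keep, so nothing is hidden while you review the list by hand.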
You can run the following query to find the duplicates together with the max(id) of each group, and then delete those rows.
SELECT orgName, COUNT(*) AS dupes, MAX(ID) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
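However, you will have to run this query several times: each run returns only the highest id for each name, so a name with more than two copies needs more than one pass.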
You can do it like this:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you want to return only the records that can be deleted (leaving one of each), you can use:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
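Since the goal in the question is to repoint user links before removing anything, it may help to list each removable id next to the id that would be kept. The following is a sketch only: it assumes the row with the lowest id is the one to keep, and the aliases duplicateId and keepId are invented here. It avoids ROW_NUMBER(), so it should also work on SQL Server 2000.

```sql
-- Sketch: pair every removable row with the row that would survive.
-- Assumption: the lowest id per orgName is the one to keep.
-- duplicateId and keepId are illustrative aliases.
SELECT
    o.id AS duplicateId, d.keepId, o.orgName
FROM (
    SELECT orgName, MIN(id) AS keepId
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE o.id != d.keepId       -- skip the surviving row itself
ORDER BY o.orgName, o.id
```

This is a read-only SELECT, so it is safe to run while you decide manually which user links to move from duplicateId to keepId.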