SQL query to remove duplicates from large tables using join
I am new in using T-SQL queries and I was trying different solutions in order to remove duplicate rows from a fairy large table (with over 270,000 rows). The table looks something like:
TableA
-----------
RowID int not null identity(1,1) primary key,
Col1 varchar(50) not null,
Col2 int not null,
Col3 varchar(50) not null
The rows for this table are not perfect duplicates because of the existence of the RowID identity field. The second table that I need to join with:
TableB
-----------
RowID int not null identity(1,1) primary key,
Col1 int not null,
Col2 varchar(50) not null
In TableA I have something like:
1 | gray | 4 | Angela
2 | red | 6 | Diana
3 | black| 6 | Alina
4 | black| 11 | Dana
5 | gray | 4 | Angela
6 | red | 12 | Dana
7 | red | 6 | Diana
8 | black| 11 | Dana
And in TableB:
1 | 6 | klm
2 | 11 | lmi
Second column from TableB (Col1) is foreign key inside TableA (Col2). I need to remove ONLY the duplicates from TableA that has Col2 = 6 ignoring the other duplicates.
1 | gray | 4 | Angela
2 | red | 6 | Diana
4 | black| 6 | Alina
5 | black| 11 | Dana
6 | gray | 4 | Angela
7 | red | 12 | Dana
8 | black| 11 | Dana
I tried using
DELETE FROM TableA a inner join TableB b on a.Col2=b.Col1
WHERE a.RowId NOT IN (SELECT MIN(RowId) FROM TableA GROUP BY RowId, Col1, Col2, Col3) and b.Col2="klm"
but I still get some of the duplicates that I need to remove.
What is the best way to remove not perfect duplicate rows using join?
well min would only be one and group by PK will give you everything
and the RowID are wrong in the example
DELETE FROM TableA a
inner join TableB b
on a.Col2=b.Col1
WHERE a.RowId NOT IN (SELECT MIN(RowId)
FROM TableA GROUP BY RowId, Col1, Col2, Col3)
and b.Col2="klm"
this would be rows to delete
select *
from
( select *
, row_number over (partition by Col1, Col3 order by RowID) as rn
from TableA a
where del.Col2 = 6
) tt
where tt.rn > 1
another solution is:
WITH CTE AS(
SELECT t.[col1], t.[col2], t.[col3], t.[col4],
RN = ROW_NUMBER() OVER (PARTITION BY t.[col1], t.[col2], t.[col3], t.[col4] ORDER BY t.[col1])
FROM [TableA] t
)
delete from CTE WHERE RN > 1
regards.
链接地址: http://www.djcxy.com/p/24736.html上一篇: 如何访问通过调用TSQL中的存储过程生成的当前作用域中的数据集?
下一篇: SQL查询使用连接从大型表中删除重复项