Monday, August 13, 2007

MSSQL : Deleting duplicate rows

Select the duplicate key values into a holding table. For example:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

Select the duplicate rows into a holding table, eliminating
duplicates in the process. For example:
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2

At this point, the holddups table should have unique PKs,
however, this will not be the case if t1 had duplicate PKs,
yet unique rows
SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2

If count(*) returns more than 1 for certain rows,
determine which of the rows to delete which have the
duplicate keys but unique rows and then only process further.

Delete the duplicate rows from the original table.
For example:
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2


Put the unique rows back in the original table.
For example:
INSERT t1 SELECT * FROM holddups

No comments: