Find and delete duplicated records in a table using t-SQL
By ganton ~ February 8th, 2010. Filed under: SQL.
If you do search in the internet how to solve the problem with duplicated records you will find different solutions. In this post I’ll describe the solution that I used for solving duplicated records problem. Let’s assume that there is a table named myTable with five columns (id, fk_id1, fk_id2, fk_id3, timestamp_col). In all of the three foreign key columns there are duplicated records – in this sample 2 duplications per record. id and timestamp_col are with unique values. To find the duplications you can used a query like this below
1: select fk_id1, fk_id2, fk_id3, count(*) from assignedRole
2: group by fk_id1, fk_id2, fk_id3
3: having count(*) > 1;
This query will return all the duplicated records and the number of duplications by each record. In my case all records contained 2 duplications. It is obvious that one of the duplication should be removed and the second one should stay. Doing so we will remove the duplications and the system will continue to work without any errors or data losses. To achieve this we can select all duplicated records getting the max or min id of the record. Then we can store this information in a temporary table. Then all values for the original table can be removed using the ids from the temporary table. The example is below.
1: select fk_id1, fk_id2, fk_id3, min(id) id
2: into #duplicates
3: from myTable
4: group by fk_id1, fk_id2, fk_id3
5: having count(*) > 1;
6:
7: delete from myTable
8: where id in (select id from #duplicates);
9:
10: drop table #duplicates;