Find and delete duplicated records in a table using t-SQL



By ganton ~ February 8th, 2010. Filed under: SQL.

If you do search in the internet how to solve the problem with duplicated records you will find different solutions. In this post I’ll describe the solution that I used for solving duplicated records problem. Let’s assume that there is a table named myTable with five columns (id, fk_id1, fk_id2, fk_id3, timestamp_col). In all of the three foreign key columns there are duplicated records – in this sample 2 duplications per record. id and timestamp_col are with unique values. To find the duplications you can used a query like this below

   1: select fk_id1, fk_id2, fk_id3, count(*) from assignedRole

   2: group by fk_id1, fk_id2, fk_id3

   3: having count(*) > 1;

This query will return all the duplicated records and the number of duplications by each record. In my case all records contained 2 duplications. It is obvious that one of the duplication should be removed and the second one should stay. Doing so we will remove the duplications and the system will continue to work without any errors or data losses. To achieve this we can select all duplicated records getting the max or min id of the record. Then we can store this information in a temporary table. Then all values for the original table can be removed using the ids from the temporary table. The example is below.

   1: select fk_id1, fk_id2, fk_id3, min(id) id 

   2: into #duplicates

   3: from myTable

   4: group by fk_id1, fk_id2, fk_id3

   5: having count(*) > 1;

   6:  

   7: delete from myTable

   8: where id in (select id from #duplicates);

   9:  

  10: drop table #duplicates;

Concatenate nvarchar field and ntext field. Push LINQ to convert ntext to nvarchar.



By ganton ~ December 9th, 2009. Filed under: SQL.

Let’s assume that we have a table like

image

This table contains a field name of type nvarchar(50) and a field description of type ntext. Assume we need name and description field values concatenated in one field. Using LINQ we can write a query like that.

   1: var result = (from r in dataContext.MyTable

   2:              select new

   3:              {

   4:                 Id = r.Id,

   5:                 NameAndDescription = r.Name + r.Description

   6:              }).SingleOrDefault();

And we will get a SqlException that add (+) operator is invalid for nvarchar and ntext fields. That happens because the SQL statement generated by LINQ is similar to that one below.

   1: SELECT [t0].[id], [t0].[name] + [t0].[description] AS [NameAndDescription]

   2: FROM [mytable] AS [t0] 

An option is to cast ntext field to a nvarchar. Clear, but how to push LINQ to do so? It is easy, we should just replace r.Description with Convert.ToString(r.Description). Then generated SQL statement by LINQ is similar to that one below.

   1: SELECT [t0].[id], [t0].[name] + (CONVERT(NVarChar(MAX),[t0].[description]) AS [NameAndDescription]

   2: FROM [mytable] AS [t0]

If the number of characters in description field doesn’t exceed the MAX number it will work OK. Here there is one not really obvious problem. Generated SQL statement uses MAX. MAX is supported since SQL Server 2005. If you use SQL Server 2000 you cannot use this approach. If you have data with number of characters more then MAX or you use SQL Server 2000 you can create a mapping class that will have one more property that will be generated on the site of the application like this class below.

   1: public class MyTableMap

   2: {

   3:     private string nameAndDescription = null;

   4:  

   5:     public int Id { get; set; }

   6:     public string Name { get; set; }

   7:     public string Description { get; set; }

   8:     public string NameAndDescription 

   9:     {

  10:         get

  11:         {

  12:             if (nameAndDescription == null)

  13:             {

  14:                 nameAndDescription = Name + Description

  15:             }

  16:             return nameAndDescription;

  17:         } 

  18:     }

  19: }

Now it is OK and it is completely usable.

Using TestCase arrtibute of NUnit



By ganton ~ December 7th, 2009. Filed under: Unit testing.

Sometimes ago NUnit extend their framework with a possibility to specify parameters via an attribute and then these parameters are reused by NUnit run-time and the test method is executed several times with different parameters. An example: let’s assume that we have a method that concatenates person first name and person last name.

   1: //…..

   2: public void ConcatenateNames(string firstName, string lastName)

   3: {

   4:     if(string.IsNullOrEmpty(firstName))

   5:         throw new ArgumentNullException("message");

   6:     if(string.IsNullOrEmpty(lastName))

   7:         throw new ArgumentNullException("message");

   8:  

   9:     // concatenate

  10: }

We need to test this method for invalid and valid arguments. If arguments are invalid we should expect an exception, otherwise we should check if it concatenates person’s names correctly. We can easily do this using TestCase attribute and not repeating the same code in several test methods.

   1: [Test]

   2: [TestCase(null, null, ExpectedException = typeof(ArgumentNullException))]

   3: [TestCase(null, "Gochev", ExpectedException = typeof(ArgumentNullException))]

   4: [TestCase("Anton", null, ExpectedException = typeof(ArgumentNullException))]

   5: [TestCase("Anton", "Gochev", ExpectedException = typeof(ArgumentNullException))]

   6: public void ConcatenateNamesTest(string firstName, string lastName)

   7: {

   8:     myClassInstance.ConcatenateNames(firstName, lastName);

   9:  

  10:     // Assert if everything is correct.

  11: }

Nice, isn’t it? TestCase attribute simplifies testing for invalid and valid parameters  and reduces the code that we produce as test methods.

But what if ConcatenateNames method accepts an object Person and should do the same work. In this case we have a small problem. We cannot pass to an attribute something like new Person { FirstName = “”, LastName = “”}. The information of the attribute is stored in class meta-data and consequently attribute’s information is pre-defined but not evaluated during run-time processing. Fortunately, we still can use TestCase attribute but not providing Person object but a bool variable that will allow us to know whether to initialize the object or not.

   1: [Test]

   2: [TestCase(false, ExpectedException = typeof(ArgumentNullException))]

   2: [TestCase(true)]

   3: public void ConcatenateNamesTest(bool initializePerson)

   4: {

   5:     Person person = null;

   6:     if(initializePerson)

   7:         person = new Person { FirstName = "Anton", LastName = "Gochev" };

   8:  

   9:     myClassInstance.ConcatenateNames(person);

  10:  

  11:     // assert everything is correct

  12: }

Well, this example is not a real one but I hope you got the idea. I know that many developers will say that it is not really good practice because you have more logic then just to check methods correctness inside the test but for me it is OK. It saves me a lot of additional test method writing (in more complex cases) and I’m fine with this approach.

Is it possible to check what is allocated space per table, used space per table, space used by indexes and unused space per table in SQL Server 2005?



By ganton ~ November 30th, 2009. Filed under: SQL.

In the samples in this blog I’ll use SQL Server 2005 sample database named “AdventureWorks”.

Yes, it is possible. There is a stored procedure (SP) in SQL Server that allows us to see what is the used space, unused space and used space by indexes for a specific table. The name of this SP is sp_spaceused. According to documentation this procedure will return to us

the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database

It can be used to get db size information but in this sample we will use it to get a disk space used/allocated per table. If we execute the SP against this database for table store

   1: exec sp_spaceused ’sales.store’;

we will get the following result.

image

The picture above shows the information that we can receive for this specific table.

It looks nice but sometimes we need to have this information for all tables that a specific database contains. In such a case we can use a SP that will allow us to execute a query for all tables. This SP is called “sp_msforeachtable”. Unfortunately, this SP is not documented in msdn (or I wasn’t able to find anything about it) but you can find it in Programmability\Stored Procedures\System Stored Procedures of selected DB. Then you can right click on it (using SQL Server Management Studio) and to select modify. You can examine the body of this SP for more information. For our example we need to know that it will execute provided query against all tables in DB and will return one or more rows per each table (in our case one row). Note that we specify ‘?’ instead of table name. “sp_msforeachtable” SP will provide each table name as a parameter to our query. Let’s see.

   1: exec sp_msforeachtable @command1=

   2:          "exec sp_spaceused ‘?’";

Executing the SP we will get a result in SQL Server Management Studio like that below. The result contain part of the table results.

image 

Well, this result is not really friendly and I do not think it is easy to be used in an application. What we can do to make the result more friendly? We can return it in one table. For this purpose we will create a temp table and will insert the result into it and then we will select stored data. We also need to check if the temp table already exists and to drop it (in our case). It is useful when you want to execute the query many times in SQL Server Management Studio. The sample is below.

   1: if object_id(‘tempdb..#usedSpace’) is not null

   2: begin

   3:     drop table #usedSpace;

   4: end

   5:  

   6: create table #usedSpace

   7: (

   8:    [name]   nvarchar(64),

   9:    [rows]   int,

  10:    [reserved]   nvarchar(64),

  11:    [data]   nvarchar(64),

  12:    [indexSize]   nvarchar(64),

  13:    [unused]   nvarchar(64),

  14: );

  15:  

  16: exec sp_msforeachtable @command1=

  17:          "insert into #usedSpace

  18:           exec sp_spaceused ‘?’";

  19:    

  20: select * from #usedSpace;

And part of the result.

image

Having all data in a temp table we can make some queries that will allow use to refine the output.

{"Server-generated keys and server-generated values are not supported by SQL Server Compact."}



By ganton ~ November 22nd, 2009. Filed under: .Net.

I’m creating a small application for my friends and I decided to use SQL CE and Entity Framework. I’ve configured the database to use identities for primary keys in tables and I got an exception when I’ve tried to add a record.

According to information from internet Entity Framework doesn’t support database generated values. It is not a good news for me and I hope it will be fixed in next version of the framework.

I’ve decided to use a partial class of the Entity class and extend it with additional methods that will allow me to generate record id. I also moved from bigint ids to ids of type uniqueidentifier. Every Entity generated by the designer contains a static method that allows us to create an entity. I have an entity Author and generated create method for it is

   1: public partial class Author : global::System.Data.Objects.DataClasses.EntityObject

   2:     {

   3:         /// <summary>

   4:         /// Create a new Author object.

   5:         /// </summary>

   6:         /// <param name="firstName">Initial value of FirstName.</param>

   7:         /// <param name="lastName">Initial value of LastName.</param>

   8:         /// <param name="id">Initial value of Id.</param>

   9:         public static Author CreateAuthor(string firstName, string lastName, global::System.Guid id)

  10:         {

  11:             Author author = new Author();

  12:             author.FirstName = firstName;

  13:             author.LastName = lastName;

  14:             author.Id = id;

  15:             return author;

  16:         }

  17:

  18: }

My partial class for Author is like that one below.

   1: public partial class Author

   2: {

   3:     public static Author CreateAuthor(string firstName, string lastName)

   4:     {

   5:         Author author = new Author();

   6:         author.FirstName = firstName;

   7:         author.LastName = lastName;

   8:         author.Id = Guid.NewGuid();

   9:         return author;

  10:     }

  11:  

  12:     public static Author CreateAuthor()

  13:     {

  14:         Author author = new Author();

  15:         author.Id = Guid.NewGuid();

  16:         return author;

  17:     }

  18: }

Than I can use this methods like Author newAuthor = Author.CreateAuthor();

The real downside of this technique is that if you have a database with a lot of entities you should create for all of them. In my case it works fine. Application database contains just several tables.

XmlConverter performance improvement



By ganton ~ August 6th, 2009. Filed under: .Net.

Last year I wrote a post about how to serialize and desterilize objects to/from XML using standard .Net Framework classes (see this post).  Yesterday, I need to use the class but I need to improve the performance. When one creates XmlSerializer class many time is spend to configure it behind the scene. After its initialization serialization and deserialization is much faster. I’ve decided to store already configured XmlSerializer and to reuse it. This will improve the performance significantly. For this purpose I extended the XmlConverter class from my old post. XmlConverter stores in a static hash table each created XmlSerializer and then reuse it. If it is not required to store it it will not store it. Now XmlSerializer contains two new methods FromXml and ToXml that accept a parameter named ‘useStaticSerializer’ which tells the XmlConverter to store and reuse XmlSerializer or not. Below is the new code of the XmlConverter.

   1: using System; 

   2: using System.Collections.Generic; 

   3: using System.Linq; 

   4: using System.Text; 

   5: using System.IO; 

   6: using System.Xml.Serialization; 

   7: using System.Xml; 

   8:  

   9: public static class XmlConverter 

  10: {

  11:     private static Dictionary<Type, XmlSerializer> serializers = new Dictionary<Type, XmlSerializer>();

  12:  

  13:     private static XmlSerializer GetSerializer(bool useStaticSerializer, Type serializerType)

  14:     {

  15:         XmlSerializer serializer = null;

  16:         if (useStaticSerializer)

  17:         {

  18:             lock (serializers)

  19:             {

  20:                 if (!serializers.TryGetValue(serializerType, out serializer))

  21:                 {

  22:                     serializer = new XmlSerializer(serializerType);

  23:                     serializers.Add(serializerType, serializer);

  24:                 } 

  25:             }

  26:         }

  27:         else

  28:         {

  29:             serializer = new XmlSerializer(serializerType);

  30:         }

  31:         return serializer;

  32:     }

  33:  

  34:     public static string ToXml<T>(T value) where T : new() 

  35:     { 

  36:        return ToXml<T>(value, Encoding.UTF8, false); 

  37:     } 

  38:  

  39:     public static string ToXml<T>(T value, Encoding encoding) where T : new() 

  40:     { 

  41:        return ToXml<T>(value, Encoding.UTF8, false);

  42:     }

  43:  

  44:     public static string ToXml<T>(T value, Encoding encoding, bool useStaticSerializer) where T : new()

  45:     {

  46:         try

  47:         {

  48:             XmlSerializer serializer = GetSerializer(useStaticSerializer, typeof(T));

  49:             using (MemoryStream stream = new MemoryStream())

  50:             {

  51:                 using (TextWriter writer = new StreamWriter(stream, encoding))

  52:                 {

  53:                     serializer.Serialize(writer, value);

  54:                     int cnt = (int)stream.Length;

  55:                     byte[] arr = new byte[cnt];

  56:                     stream.Seek(0, SeekOrigin.Begin);

  57:                     stream.Read(arr, 0, cnt);

  58:                     return encoding.GetString(arr, 0, arr.Length).Trim();

  59:                 }

  60:             }

  61:         }

  62:         catch { }

  63:         return null;

  64:     }

  65:  

  66:     public static T FromXml<T>(string xml) where T : new() 

  67:     {            

  68:        return FromXml<T>(xml, false); 

  69:     }

  70:  

  71:     public static T FromXml<T>(string xml, bool useStaticSerializer) where T : new()

  72:     {

  73:         try

  74:         {

  75:             XmlSerializer serializer = GetSerializer(useStaticSerializer, typeof(T));

  76:             using (StringReader stream = new StringReader(xml))

  77:             {

  78:                 using (XmlTextReader reader = new XmlTextReader(stream))

  79:                 {

  80:                     return (T)serializer.Deserialize(reader);

  81:                 }

  82:             }

  83:         }

  84:         catch { }

  85:         return default(T);

  86:     }

  87:  

  88:     public static void ClearStaticSerializers()

  89:     {

  90:         lock (serializers)

  91:         {

  92:             serializers.Clear(); 

  93:         }

  94:     }

  95:  

  96: }