Tuesday, February 19, 2013

Fix for Entity Framework poor INSERT performance with SQL Server Compact and server generated keys

In this blog post I will describe the steps I took in order to find out why the title above was the case, and how it could be fixed.

On Stackoverflow the general opinion was that the reported slowness was “by design” and could not be fixed, but looking at recent tests posted on Stackoverflow pointed to the fact that something was not done right.

Since Entity Framework is now Open Source and available on CodePlex, I decided to have a deeper look.

To test if the process could be improved, I created the following console app:

 

   1:  namespace EF6SqlCETest
   2:  {
   3:      using System;
   4:      using System.Data.Entity;
   5:      using System.Diagnostics;
   6:   
   7:      class Program
   8:      {
   9:          static void Main(string[] args)
  10:          {
  11:              using (var db = new StudentContext())
  12:              {
  13:                  Stopwatch sw = new Stopwatch();
  14:                  db.Database.Delete();
  15:                  sw.Start();
  16:                  db.Database.CreateIfNotExists();
  17:                  db.Configuration.AutoDetectChangesEnabled = false;
  18:                  db.Configuration.ProxyCreationEnabled = false;
  19:                  Console.WriteLine(
  20:                      "Db created in {0}", sw.Elapsed.ToString());
  21:                  sw.Restart();
  22:                  for (int i = 0; i < 4000; i++)
  23:                  {
  24:                      var student = new Student { Name = Guid.NewGuid().ToString() };
  25:                      db.Students.Add(student);
  26:                  }
  27:                  Console.WriteLine(
  28:                      "Entities added in {0}", sw.Elapsed.ToString());
  29:   
  30:                  sw.Restart();
  31:                  int recordsAffected = db.SaveChanges();
  32:                  sw.Stop();
  33:                  Console.WriteLine(
  34:                      "Saved {0} entities to the database, press any key to exit.",
  35:                      recordsAffected);
  36:                  Console.WriteLine(
  37:                      "Saved entities in {0}", sw.Elapsed.ToString());
  38:                  Console.ReadKey();
  39:              }
  40:   
  41:          }
  42:      }
  43:   
  44:      public class Student 
  45:      {
  46:          public int Id { get; set; }        
  47:          public string Name { get; set; }
  48:      }
  49:   
  50:      public class StudentContext : DbContext
  51:      {
  52:          public DbSet<Student> Students { get; set; }
  53:      }
  54:   
  55:  }






The test project and the related app.config is available for download here: http://sdrv.ms/UCL2j5



The test code is a simple Code First DbContext model. For each run I start with a new blank database, and creates it before doing SaveChanges, so that part of the process can be timed individually. The 2 options on lines 17 and 18 are there to ensure that the for loop runs quickly, without these option the loop adding objects takes much longer (test for yourself).



The resulting table looks like this:

CREATE TABLE [Students] (
[Id] int NOT NULL IDENTITY (1,1)
, [Name] nvarchar(4000) NULL
);
GO
ALTER TABLE [Students] ADD CONSTRAINT [PK_dbo.Students] PRIMARY KEY ([Id]);
GO






In order to find out where time was spent during SaveChanges, I ran a Visual Studio Performance Analysis. It turned out that all the time was spent in sqlceqp40.dll, the SQL Server Compact 4.0 unmanaged query processor – so something was amiss.



As described in my earlier blogpost, the SQL statements generated in order to return the server generated id (the IDENTITY value), looked like the following:



SELECT [Id] FROM [Students] WHERE [Id] = @@IDENTITY



So using the SQL Server Compact Toolbox, I coulde analyze the 2 statements:



image



And got the following result:



image



So for every INSERT, a table scan was performed, as for some reason, the SQL Server Compact query processor could not figure out to do an Index Seek. And the more rows to be scanned, the worse the performance got. And all the time for the operation was spent doing this.



In order to avoid this, I decided that the goal of the statement executed should be to avoid table scans, but return a value with the exact same shape as the previous statement executed, that is; it should have the name of the IDENTITY column, and be of the correct type (only bigint and int are supported as IDENTITY types with SQL Server Compact).



The return value of @@IDENTITY is numeric, so simply using “SELECT @@IDENTITY AS [Id]” would not work. So the statement could be:



SELECT [Id] FROM [Students] WHERE [Id] = CAST(@@IDENTITY as int)



The type could then be either int or bigint and the column alias should of course be the correct column name.



I could then analyze the modified statement:

image

 


image


And no table scan, instead an index seek, consuming only 19% of the total query cost instead of 100%.

And so this is what I have implemented in my fix, that I now need to figure out how to “submit a pull request” for.

8 comments:

Unknown said...

Thanks for your good work! How will we know if and when the EF-team will act on this problem?

ErikEJ said...

Andre: I think the team will take the fix seriously, but I just need to submit it, and then they will revei/polish and include in the master branch - after taht it will be available in the source and all future builds.

Alexander Kohl said...
This comment has been removed by the author.
ErikEJ said...

Alex: Use the Toolbox file version checker to determine the version, and I will tell you how to proceed

Unknown said...

This is also a problem in LINQ-to-SQL.

Note that in L2S (and maybe also EF?), the query to return the server generated ID will also potentially include other columns, such as a rowversion column, e.g:

SELECT [Id], [RowVersion] FROM [Student] WHERE [Id] = @@IDENTITY

As such, I'm not sure that simply doing "SELECT CAST(@@IDENTITY AS int) AS [Id]" is sufficient.

However, the query optimizer *will* do the right thing if you cast the @@IDENTITY variable to int before doing the select:

SELECT [Id], [RowVersion] FROM [Student] WHERE [Id] = CAST(@@IDENTITY AS INT)

I think the 'correct' place to fix this would be in SQL CE itself, I'd see this as being a issue with the CE query optimizer rather than the queries generated by EF/L2S.

Unfortunately I can't see any obvious way to bodge L2S to do the right thing...

ErikEJ said...

Anthony: Thanks, that was very useful information, I will do some testing with rowversion in my model. The only place this can be fixed as I see it is in the EF provider, as SQL Compact and LINQ to SQL are both eol.

ErikEJ said...

Anthony: Thanks very much, your proposal is the "correct" fix. And LINQ to SQL suffers from the same issue, yes, but has much less overhead when it comes to database ceation and adding objects (both less than 500 ms)

Alexander Kohl said...

Thanks Eric, got it working. Your blog was invaluable for that.