
Say I have hundreds of thousands of records in a text file that I'd like to insert into the database every day, of which around half already exist in the database. Also, a unique row is defined by, say, 6 columns.

What is the correct way to code the insert in .NET in this particular case? The two which I'm wondering over are:

Do I SQL-insert straight away and catch the SqlException for duplicate entries? In this case, I'd be breaking the principle that exceptions should be reserved for exceptional cases, not frequent ones.

or

Do I do a SQL select first to check for the row before I insert? In this case, the database seems to check uniqueness twice: once in my select, and again automatically during the insert.

  • What are you using, ado.net/ef/stored procedure/inline sql? Commented Feb 16, 2013 at 10:54

3 Answers


Use a SQL statement that checks for the row before inserting it. Here is a simple example for a table called person with two columns, forename and surname, which together are checked for uniqueness:

/// <summary>
/// Insert a row into the person table
/// </summary>
/// <param name="connection">An open sql connection</param>
/// <param name="forename">The forename which will be inserted</param>
/// <param name="surname">The surname which will be inserted</param>
/// <returns>True if a new row was added, False otherwise</returns>
public static bool InsertPerson(SqlConnection connection, string forename, string surname)
{
    using (SqlCommand command = connection.CreateCommand())
    {
        command.CommandText =
            @"Insert into person (forename, surname)
                Select @forename, @surname
                Where not exists 
                    (
                        select 'X' 
                        from person 
                        where 
                            forename = @forename 
                            and surname=@surname
                    )";
        command.Parameters.AddWithValue("@forename", forename);
        command.Parameters.AddWithValue("@surname", surname);

        int rowsInserted = command.ExecuteNonQuery();

        // rowsInserted will be 0 if the row is already in the database
        return rowsInserted == 1;
    }
}
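For example, a caller might use the method like this. This is a minimal sketch: the connection string, database name, and sample values are placeholders, not part of the answer above.

```csharp
using System.Data.SqlClient;

// Hypothetical usage; the connection string is a placeholder.
using (var connection = new SqlConnection(
    "Server=.;Database=PeopleDb;Integrated Security=true"))
{
    connection.Open();

    // First call inserts the row and returns true.
    bool added = InsertPerson(connection, "Ada", "Lovelace");

    // Second call with the same values finds the existing row,
    // inserts nothing, and returns false.
    bool addedAgain = InsertPerson(connection, "Ada", "Lovelace");
}
```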

3 Comments

You don't want to open a connection for each insert.
The code sample is the simplest thing that will work. Plenty of optimisations are possible; my aim was to concisely demonstrate all the required concepts so that anyone who looks at this answer will be able to make use of it.
I've amended the code sample so that it takes an open connection instead of creating and opening one, as suggested by CodeCaster.

I think you should choose the exception way. Just do something like this:

foreach (var elem in elementsFromFile)
{
    try
    {
        context.SomeTable.Add(elem);
        context.SaveChanges();
    }
    catch (DbUpdateException)
    {
        // The row already exists; detach the failed entity so the
        // next SaveChanges call does not try to insert it again.
        context.Entry(elem).State = EntityState.Detached;
    }
}

One caveat: I don't like that SaveChanges runs on every iteration, but it should still perform better than the select-first approach. It will work, and work well enough.



A simple way to ignore the duplicates is to create your unique index with the option IGNORE_DUP_KEY = ON. You then won't incur the overhead of testing for duplicates or catching exceptions.

e.g.

CREATE UNIQUE NONCLUSTERED INDEX [IX_IgnoreDuplicates] ON [dbo].[Test]
(
    [Id] ASC,
    [Col1] ASC,
    [Col2] ASC
)
WITH (IGNORE_DUP_KEY = ON) 

You can then also use BULK INSERT to efficiently load all of your data with automatic duplicate removal.
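For example, combined with the IGNORE_DUP_KEY index above, a bulk load might look like this. This is a sketch: the file path and field terminators are assumptions about the input file, not given in the question.

```sql
-- Hypothetical load; the file path and terminators are placeholders.
BULK INSERT [dbo].[Test]
FROM 'C:\data\daily_records.txt'
WITH
(
    FIELDTERMINATOR = '\t',  -- assuming a tab-separated file
    ROWTERMINATOR   = '\n'
);
-- Rows that collide on the unique index key are skipped with the
-- warning "Duplicate key was ignored." instead of failing the load.
```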

See CREATE INDEX

