
We have an object (XML or JSON) that we successfully map to a DTO, but inserting it into our database takes too long (5 to 7 minutes), so we switched to Parallel.ForEach. Eventually, we noticed that some data is inserted incorrectly: for example, every item in a category gets the same name while the other properties are 100% correct, and in another case all the data within one category is identical, even though the provided JSON object contains no such duplication.

I confess it is very fast (it takes less than a minute), but it produces these wrong insertions. Have a look at the code we use:

JSON

[
  {
    "CategoryId": 1,
    "CategoryName": "Drinks",
    "SortOrder": 1,
    "Products": [
      {
        "ProductId": 100,
        "ProductName": "Black Tea",
        "SortOrder": 1,
        "Price": 5,
        "Choices": []
      },
      {
        "ProductId": 101,
        "ProductName": "Turkish Coffee",
        "SortOrder": 2,
        "Price": 7.5,
        "Choices": []
      },
      {
        "ProductId": 102,
        "ProductName": "Green Tea",
        "SortOrder": 3,
        "Price": 6,
        "Choices": []
      },
      {
        "ProductId": 103,
        "ProductName": "Café Latte Medium",
        "SortOrder": 4,
        "Price": 10,
        "Choices": []
      },
      {
        "ProductId": 104,
        "ProductName": "Orange Juice",
        "SortOrder": 5,
        "Price": 11,
        "Choices": []
      },
      {
        "ProductId": 105,
        "ProductName": "Mixed Berry Juice",
        "SortOrder": 6,
        "Price": 12.5,
        "Choices": []
      }
    ]
  },
  {
    "CategoryId": 1,
    "CategoryName": "Meals",
    "SortOrder": 1,
    "Products": [
      {
        "ProductId": 200,
        "ProductName": "Breakfast Meal",
        "SortOrder": 1,
        "Price": 16,
        "Choices": [
          {
            "ChoiceId": 3000,
            "ChoiceName": "Strawberry Jam",
            "SortOrder": 1,
            "Price": 0
          },
          {
            "ChoiceId": 3001,
            "ChoiceName": "Apricot Jam",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3002,
            "ChoiceName": "Orange Jam",
            "SortOrder": 3,
            "Price": 0
          },
          {
            "ChoiceId": 3003,
            "ChoiceName": "Café Latte",
            "SortOrder": 4,
            "Price": 2
          }
        ]
      },
      {
        "ProductId": 201,
        "ProductName": "Mixed Grill",
        "SortOrder": 1,
        "Price": 30,
        "Choices": [
          {
            "ChoiceId": 3004,
            "ChoiceName": "Moutabal",
            "SortOrder": 1,
            "Price": 0
          },
          {
            "ChoiceId": 3005,
            "ChoiceName": "Mineral Water",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3006,
            "ChoiceName": "French Fries",
            "SortOrder": 2,
            "Price": 0
          },
          {
            "ChoiceId": 3007,
            "ChoiceName": "Grilled Potatoes",
            "SortOrder": 2,
            "Price": 0
          }
        ]
      }
    ]
  }
]

C# code

Parallel.ForEach(categories, (category) =>
{
    var newCreatedCategoryId = 0;
    using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
    {
        connection.Open();
        using (var command = new SqlCommand("SP_INSERT_INTO_CATEGORIES", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@P1", category.CategoryName);
            command.Parameters.AddWithValue("@P2", category.SortOrder);
            newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString());
            command.Dispose();
        }

        connection.Close();
    }

    if (newCreatedCategoryId > 0)
    {
        Parallel.ForEach(category.Products, (product) =>
        {
            using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
            {
                connection.Open();
                using (var command = new SqlCommand("SP_INSERT_INTO_PRODUCTS", connection))
                {
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.AddWithValue("@P1", product.ProductName);
                    command.Parameters.AddWithValue("@P2", product.Price);
                    command.Parameters.AddWithValue("@P3", product.SortOrder);
                    command.Parameters.AddWithValue("@P4", newCreatedCategoryId);
                    command.ExecuteNonQuery();
                    command.Dispose();
                }

                connection.Close();
            }
        });
    }
});

I had a look here, but this is not our issue; we are already using SCOPE_IDENTITY() to get the last identity generated in the current scope of execution.

On the other hand, we are not allowed to use SqlBulkCopy to insert this amount of data, even without a TableLock.

  • Can you post the actual code? The reason I ask is that normally you have to specify CommandType as StoredProcedure on SqlCommand to pass parameters, which leads me to think this isn't your actual code. Commented Jun 27, 2018 at 8:02
  • How big is your JSON? How many rows are you generating? Commented Jun 27, 2018 at 8:03
  • By the way, you may find it faster to use "table valued parameters". Commented Jun 27, 2018 at 8:08
  • And this, kids, is why you don't let people play with DBs and Parallel.ForEach :) Seriously though, this is not thread-safe. Commented Jun 27, 2018 at 8:09
  • Yes, in this case the DB server becomes the bottleneck, because it might (note that this depends on the server settings) just block any excess calls. If it only allows 16 parallel connections from one client and you bombard it with 32, 64 or even more, the ones above the limit are blocked anyway. Anyway, thread safety is the main issue here; get that fixed first. Commented Jun 27, 2018 at 11:19
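One of the comments above suggests table-valued parameters. As a rough sketch of that idea (the table type dbo.ProductTableType and the procedure SP_INSERT_PRODUCTS_TVP below are hypothetical names, not part of the question), all products of a category could be sent to the server in a single round trip:

```csharp
// Hypothetical sketch: insert all products of one category in one call via a
// table-valued parameter. Assumes a server-side type such as:
//   CREATE TYPE dbo.ProductTableType AS TABLE
//       (ProductName NVARCHAR(200), Price DECIMAL(10,2), SortOrder INT, CategoryId INT);
// and a procedure SP_INSERT_PRODUCTS_TVP taking @Products dbo.ProductTableType READONLY.
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("ProductName", typeof(string));
table.Columns.Add("Price", typeof(decimal));
table.Columns.Add("SortOrder", typeof(int));
table.Columns.Add("CategoryId", typeof(int));

foreach (var product in category.Products)
    table.Rows.Add(product.ProductName, product.Price, product.SortOrder, newCreatedCategoryId);

using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
using (var command = new SqlCommand("SP_INSERT_PRODUCTS_TVP", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@Products", table);
    parameter.SqlDbType = SqlDbType.Structured;       // marks the parameter as a TVP
    parameter.TypeName = "dbo.ProductTableType";      // must match the server-side type
    connection.Open();
    command.ExecuteNonQuery();
}
```

This trades many small round trips for one set-based insert, which is usually where the 5 to 7 minutes go in the sequential version.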

3 Answers


It's the newCreatedCategoryId that is the problem. What is confusing me is why you are calling newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString()); again in the inner loop; I mean, if it's just the id of the category, it doesn't need to be fetched again.

Take a look at the edit below. You might also be better off turning the second Parallel.ForEach into a standard foreach; this is all running in parallel anyway. Lastly, Parallel.ForEach is not really suited to I/O-bound tasks; the correct pattern is async and await. That said, you could probably use an ActionBlock from TPL Dataflow to take advantage of the best of both worlds. Take a look at the Dataflow example in this question I answered: Downloading 1,000+ files fast?

Parallel.ForEach(categories, (category) =>
{
    var newCreatedCategoryId = 0;
    using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
    {
        connection.Open();
        using (var command = new SqlCommand("SP_INSERT_INTO_CATEGORIES", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@P1", category.CategoryName);
            command.Parameters.AddWithValue("@P2", category.SortOrder);
            newCreatedCategoryId = int.Parse(command.ExecuteScalar().ToString());
            command.Dispose();
        }

        connection.Close();
    }

    if (newCreatedCategoryId > 0)
    {
        foreach (var product in category.Products)
        {
            using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
            {
                connection.Open();
                using (var command = new SqlCommand("SP_INSERT_INTO_PRODUCTS", connection))
                {
                    command.CommandType = CommandType.StoredProcedure;
                    command.Parameters.AddWithValue("@P1", product.ProductName);
                    command.Parameters.AddWithValue("@P2", product.Price);
                    command.Parameters.AddWithValue("@P3", product.SortOrder);
                    command.Parameters.AddWithValue("@P4", newCreatedCategoryId);
                    command.ExecuteNonQuery();
                }

                connection.Close();
            }
        }
    }
});
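To illustrate the async-and-await point made above, here is a rough sketch (my own, not the question's code) of the same work done with the asynchronous ADO.NET APIs. The stored procedure names and parameter names are taken from the question; everything else is an assumption:

```csharp
// Sketch: categories run concurrently via Task.WhenAll, while each category
// inserts its own products sequentially on its own connection, so the
// category id never crosses a thread boundary.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

var tasks = categories.Select(async category =>
{
    using (var connection = new SqlConnection("CONNECTION_STRING_HERE"))
    {
        await connection.OpenAsync();

        int newCreatedCategoryId;
        using (var command = new SqlCommand("SP_INSERT_INTO_CATEGORIES", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@P1", category.CategoryName);
            command.Parameters.AddWithValue("@P2", category.SortOrder);
            newCreatedCategoryId = Convert.ToInt32(await command.ExecuteScalarAsync());
        }

        foreach (var product in category.Products)
        {
            using (var command = new SqlCommand("SP_INSERT_INTO_PRODUCTS", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.AddWithValue("@P1", product.ProductName);
                command.Parameters.AddWithValue("@P2", product.Price);
                command.Parameters.AddWithValue("@P3", product.SortOrder);
                command.Parameters.AddWithValue("@P4", newCreatedCategoryId);
                await command.ExecuteNonQueryAsync();
            }
        }
    }
});

await Task.WhenAll(tasks);
```

This keeps the concurrency across categories but removes the nested Parallel.ForEach, and it does not tie up thread-pool threads while waiting on the database.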

6 Comments

For the code, it was a copy/paste error on my side; I just call command.ExecuteNonQuery() there. But is it an issue to access newCreatedCategoryId from the nested Parallel.ForEach?
@RahmaAbdulhameed there is no real threading problem if that was just a copy and paste error; it's all self-contained and should work
I really appreciate your help, but what is the issue from your point of view? What can I try on my side to test it again?
@RahmaAbdulhameed try the changes i made, does it still happen?
I can see that you have converted the nested Parallel.ForEach to a sequential foreach, and I wish I could keep the nested Parallel.ForEach, but I will try it multiple times to see whether I get the same issue or not ... but could you clarify the purpose of that conversion? Would it help in solving this issue? In other words, was it the cause?

The objects that you are looping over are not thread-safe. You could add a lock object, but this would serialise the operation and defeat the purpose of the Parallel.ForEach. You need to change the Parallel.ForEach to a standard foreach loop.

Potential Pitfalls in Data and Task Parallelism
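As a generic illustration of the pitfall (not the question's code), mutating shared state from inside a Parallel.ForEach body is the classic race; state must either stay local to each iteration or be guarded with a lock, at the cost of serialising that section:

```csharp
// Demonstrates the race on a shared counter versus a lock-guarded one.
using System;
using System.Linq;
using System.Threading.Tasks;

var items = Enumerable.Range(0, 100_000).ToArray();
int unsafeCount = 0, safeCount = 0;
var gate = new object();

// Racy: ++ is read-modify-write, so concurrent increments may be lost
// and the final value is often less than 100000.
Parallel.ForEach(items, _ => unsafeCount++);

// Correct: the lock serialises the increment, so the count is exact.
Parallel.ForEach(items, _ => { lock (gate) safeCount++; });

Console.WriteLine(safeCount); // 100000
```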



You change newCreatedCategoryId inside the Parallel.ForEach, which may cause incorrect data, because the queries won't run in order.

1 Comment

This was a copy/paste error; I edited it here, but in my app I don't do that.
