0

I am trying to create an xml by looping through List<TestRequest>. for better performance , i am trying Parallel.ForEach for looping as there are thousands of records in the list however i am not getting consistent data in xml, sometime there is a truncation in xml string while appending to string builder and sometimes there is data mismatch. below is the code

   public class Program
    {


        static void Main(string[] args)
        {
            List<TestRequest> ids = new List<TestRequest>();
            Random rnd = new Random();
            int id = rnd.Next(1, 12345);
            for (int i = 1; i < 1000; i++)
            {
                var data = new TestRequest();
                data.dataId = id;
                ids.Add(data);
            }

            var xmlData = GetIdsinXML(ids);

        }

        private static string GetIdsinXML(List<TestRequest> Ids)
        {
            var sb = new StringBuilder();
            sb.Append("<ROOT>");

            Parallel.ForEach(Ids, id =>
            {

                sb.Append("<Row");
                sb.Append(" ID='" + id.dataId + "'");
                sb.Append("></Row>");


            }
            );


            sb.Append("</ROOT>");
            return sb.ToString();
        }


    }

    public class TestRequest
    {
        public int dataId { get; set; }

    }

is this is the correct way of using Parallel.ForEach ?

Please help. Thanks!

11
  • 3
    You can't use Parallel.ForEach like that. StringBuilder is not thread-safe. Commented Jan 22, 2019 at 11:04
  • 1
    Not only is StringBuilder not thread-safe, but you have 3 sb.Append invocations per task, which is not atomic either... Commented Jan 22, 2019 at 11:08
  • 3
    Why are you using Parallel.ForEach in this case? Why not use .NET's serialization mechanisms? What you try here, apart from not working, will keep the entire XML string in memory and generate a temporary string for each row due to that concatenation. .NET's libraries write directly to streams and files instead Commented Jan 22, 2019 at 11:09
  • 1
    Its actually possible with Parallel foreach. you should use different string builders and merge them finally. Commented Jan 22, 2019 at 11:12
  • 2
    @Harshit thousands of records is no data at all. Why do you need to improve performance? What isn't running fast enough? There's a reason serialization is called that way anyway. You can't write a text file at random, you need to write its elements in order. Have you tried .NET's serializers? LINQ to XML? Some other serializer? Commented Jan 22, 2019 at 11:14

2 Answers 2

2

Here is the easiest way to do what you want in a parallel way:

public class TestRequest
{
    public int dataId { get; set; }
    public string ToXml() => $"<row id=\"{dataId}\"/>";
}

class Program
{
    static void Main(string[] args)
    {
        const int n = 10000000;
        List<TestRequest> ids = new List<TestRequest>();
        Random rnd = new Random();
        for (int i = 1; i<n; i++)
        {
            var data = new TestRequest
            {
                dataId=rnd.Next(1, 12345)
            };
            ids.Add(data);
        }
        Stopwatch sw = Stopwatch.StartNew();
        var xml = GetIdsinXML(ids);
        sw.Stop();
        double time = sw.Elapsed.TotalSeconds;

        File.WriteAllText("result.xml", xml);
        Process.Start("result.xml");

        var output = $"Size={n} items, Time={time} sec, Speed={n/time/1000000} M/sec";

#if DEBUG
        Debug.WriteLine(output);
#else
        Console.WriteLine(output);
#endif

    }

    static string GetIdsinXML(List<TestRequest> requests)
    {
        // parallel query
        var list = requests.AsParallel().Select((item) => item.ToXml());
        // or sequential below:
        // var list = requests.Select((item) => item.ToXml());

        return $"<root>\r\n{string.Join("\r\n", list)}\r\n</root>";
    }
}

On my crappy computer, without the .AsParallel() statement, executing sequentially I get about 1,600,000 operations per second. With the parallel statement, this jumps to 2,100,000 operations per second.

NOTE: I have replaced SpringBuilder with the built-in method string.Join(string, IEnumerable list) which should be fairly optimized already by Microsoft. As an interesting side note, the Debug build is as fast if not even faster than the Release build. Go figure.

Sign up to request clarification or add additional context in comments.

2 Comments

totally overlooked that one. clever solution and much faster. interestingly on my rig with default degree of parallelism (8 threads) I get 2.9 seconds, with 2 degree of parallelism i get 1.3 seconds! without parallel i get 2.4 seconds.
@M.kazemAkhgary - FYI, I am getting 6 and 4.5 seconds on a 4 core machine. The slower performance with higher parallelism is due to overhead, and that some of the cores must be busy doing other work. For this long and fairly fast calculation, it seems the most efficient is using only 2 cores. Each problem is different.
1

With PLINQ you can do something like this.

I'm not sure if you would really need this. I just put it for sake of answer.

Your code doesn't work because StringBuilder is not thread safe and your operations are not atomic which means your code has race condition.

sb.Append("<Row");
sb.Append(" ID='" + id.dataId + "'");
sb.Append("></Row>");

For example one thread may execute line 1, the other thread just after that executes line 3. you will have <Row></Row>. this race condition happens all the time and final result is gibberish.

One way to fix this is to use different StringBuilders on different threads and finally append the result of those builders sequentially.

If running threads do very light tasks and finish quickly, doing things in parallel will only slow down your program.

return Ids.AsParallel()
   .Select((id, index) => (id, index))
   .GroupBy(x => x.index%Environment.ProcessorCount, x => x.id, (k, g) => g)
   .Select(g => 
   {
       var sb = new StringBuilder();
       foreach (var id in g)
       {
           sb.Append("<Row");
           sb.Append(" ID='" + id.dataId + "'");
           sb.Append("></Row>");
       }
       return sb.ToString();
   })
   .AsSequential()
   .Aggregate(new StringBuilder("<ROOT>"), (a, b) => a.Append(b))
   .Append("</ROOT>").ToString();

Measure the performance and see if it really does improve or not. if it doesn't don't do it in parallel.

2 Comments

AsSequential seems like an odd thing to use alongside AsParallel. You are sure it will run in parallel?
It does run in parallel what ever comes before AsSequential. What ever comes after it is sequential. I haven't checked source code yet but i confirmed it by putting breakpoint inside and its hit by multiple threads. @mjwills

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.