
I have some C# code that runs fine on a web server. The code uses async/await because it performs some network calls in the production environment.

I also need to run simulations on the code; the code gets called billions of times concurrently during a simulation. The simulation doesn't perform any network calls: a mock is used which returns a value using Task.FromResult(). The values returned from the mock simulate every possible response that the network call can return in the production environment.

I understand there is some overhead in using async/await, but I also expect that there should not be a huge difference in performance, given that an already-completed task is returned and there should be no actual waiting.

But when running some tests I noticed a big drop in performance (especially on some hardware).

I tested the following code using LINQPad with compiler optimization turned on; you can remove the .Dump() call and paste the code into a console application if you want to test it directly in Visual Studio.

// SYNC VERSION

void Main()
{
    Enumerable.Range(0, 1_000_000_000)
        .AsParallel()
        .Aggregate(
            () => 0.0,
            (a, i) => Calc(a, i),
            (a1, a2) => a1 + a2,
            f => f
        )
        .Dump();
}

double Calc(double a, double i) => a + Math.Sin(i);

and

// ASYNC-AWAIT VERSION

void Main()
{
    Enumerable.Range(0, 1_000_000_000)
        .AsParallel()
        .Aggregate(
            () => 0.0,
            (a, i) => Calc(a, i).Result,
            (a1, a2) => a1 + a2,
            f => f
        )
        .Dump();
}


async Task<double> Calc(double a, double i) => a + Math.Sin(i);

The async-await version of the code exemplifies the situation of my simulation code.

I run the simulations quite successfully on my i7 machine. But I get very bad results when I try to run the code on an AMD Threadripper machine we have in our office.

I ran some benchmarks using the code above in LINQPad on both my i7 machine and the AMD Threadripper, and these are the results:

TEST on i7 quad-core 3.67 GHz (Windows 10 Pro x64):

sync version: 15 sec (100% CPU)
async-await version: 20 sec (93% CPU)

TEST on AMD 32 cores 3.00 GHz (Windows Server 2019 x64):

sync version: 16 sec (50% CPU)
async-await version: 140 sec (14% CPU)

I understand there are hardware differences (maybe Intel's Hyper-Threading is better, etc.), but this question is not about hardware performance.

Why is CPU usage not always 100% (or 50%, taking into account the worst case for CPU hyper-threading)? Why does CPU usage drop in the async-await version of the code?

(the drop in CPU usage is sharper on the AMD but it's also present on the Intel)

Is there any workaround that doesn't involve refactoring the whole chain of async-await calls throughout the code? (The code base is big and complicated.)

Thank you.

EDIT

As suggested in a comment, I tried using ValueTask instead of Task and it seems to solve the issue. I tried this directly in VS (Release build) because I needed a NuGet package, and these are the results:

TEST on i7

"sync" version: 16 sec (100% CPU)
"await Task" version: 49 sec (95% CPU)
"await ValueTask" version: 31 sec (100% CPU)

and

TEST on AMD

"sync" version: 15 sec (50% CPU)
"await Task" version: 125 sec (12% CPU)
"await ValueTask" version: 17 sec (50% CPU)

Honestly, I don't know much about the ValueTask class and I'm going to study it. If you can explain or elaborate in an answer, that would be welcome.

Thank you.

  • I'm assuming that this is a heavily simplified example, but does the method need to be async? Commented Aug 23, 2019 at 8:11
  • In the real code there's a long and complicated chain of async-await method calls. At the end of the chain there is a method which makes a network call (and this is the reason for the use of async/await in the real code). Commented Aug 23, 2019 at 8:21
  • Could you change the signature of your method from async Task<double> Calc(double a, double i) to async ValueTask<double> Calc(double a, double i), and run your benchmarks again to see if it makes any difference? Commented Aug 23, 2019 at 8:22
  • I'd bet on memory allocations, that's the only sensible difference between the two versions. Can you try activating server GC and see if it makes a difference? Commented Aug 23, 2019 at 8:36
  • "ASYNC-AWAIT VERSION", flagging a method as async doesn't magically make it async. It adds a layer of complexity due to the state machine being added but your code is actually just synchronous so you just add complexity for no gain. Additionally, .Result will block the thread, which is also non-async. So what you have here is that you compare a synchronous version of the code to a badly implemented sync-through-async version of your code. Commented Aug 23, 2019 at 8:57

2 Answers

4

Your garbage collector is most probably configured in workstation mode (the default), which uses a single thread to reclaim the memory occupied by unused objects. For a machine with 32 cores, one core will certainly not be enough to clean up the mess that the other 31 cores are constantly producing! So you should probably switch to server mode:

<configuration>
  <runtime>
    <gcServer enabled="true"></gcServer>
  </runtime>
</configuration>

Background server garbage collection uses multiple threads, typically a dedicated thread for each logical processor.
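To confirm that the setting was actually picked up, the process can report its GC mode at startup. This is a minimal sketch; the GcInfo class and its Describe method are hypothetical names introduced for illustration, while GCSettings and Environment.ProcessorCount are standard runtime APIs.

```csharp
using System;
using System.Runtime;

public static class GcInfo
{
    // Builds a one-line summary of the GC configuration the process is running with.
    public static string Describe() =>
        $"Server GC: {GCSettings.IsServerGC}, " +
        $"latency mode: {GCSettings.LatencyMode}, " +
        $"logical processors: {Environment.ProcessorCount}";

    public static void Main() => Console.WriteLine(Describe());
}
```

If the output still reports workstation GC after editing the config file, the file is probably not the one the host process reads (LINQPad, for example, has its own settings).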

By using ValueTasks instead of Tasks you avoid heap allocations, because ValueTask is a struct: when it wraps an already-available result it does not create any object that the garbage collector has to reclaim. But this is the case only if it wraps the result of a completed operation. If it wraps an incomplete task then it offers no advantage. It is suitable for cases where you have to await tens of millions of tasks, and you expect that the vast majority of them will already be completed.
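As a rough sketch of what this could look like in the simulation, assuming the mock sits behind an interface: IValueSource, MockValueSource and Simulation are hypothetical names, not part of the original code. On .NET Framework, ValueTask<T> comes from the System.Threading.Tasks.Extensions NuGet package.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction over the call that hits the network in production.
public interface IValueSource
{
    ValueTask<double> GetAsync(int i);
}

// Simulation mock: the value is known immediately, so it is wrapped directly
// in the ValueTask<double> struct instead of going through Task.FromResult.
public sealed class MockValueSource : IValueSource
{
    public ValueTask<double> GetAsync(int i) => new ValueTask<double>(Math.Sin(i));
}

public static class Simulation
{
    // When the awaited ValueTask is already complete, this method also
    // completes synchronously and returns its result inside the struct.
    public static async ValueTask<double> Calc(IValueSource source, double a, int i)
        => a + await source.GetAsync(i);

    public static void Main()
    {
        var source = new MockValueSource();
        double sum = 0.0;
        for (int i = 0; i < 1_000; i++)
        {
            ValueTask<double> pending = Calc(source, sum, i);
            // A ValueTask should be consumed only once; fall back to a Task
            // if the operation turns out not to be finished yet.
            sum = pending.IsCompletedSuccessfully ? pending.Result : pending.AsTask().Result;
        }
        Console.WriteLine(sum);
    }
}
```

In production the same interface can be implemented with a real asynchronous call; in that case the ValueTask wraps a pending operation and the allocation savings disappear for that call.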


2 Comments

I tried server mode, but it didn't make much of a difference, because my code is heavily optimized for memory allocation (almost at byte level). So, in the end, I think the allocation of Task objects was somehow slowing down the code (there are a lot of them, because the call tree of async methods is quite long in the real code). I will refactor my code to use ValueTask since it suits my situation. Unfortunately I will need to update a lot of assemblies, and some of them have already been certified by external test labs, so the certification process will have to be carried out again : (
This is an old question, but someone may be interested in how to fix this situation on an AMD Threadripper: <configuration> <runtime> <Thread_UseAllCpuGroups enabled="true"/> <GCCpuGroup enabled="true"/> <gcServer enabled="true"/> </runtime> </configuration> The CPU has multiple processor groups, and they need to be enabled in the config. (Using ValueTask instead of Task together with these settings brings the CPU usage to 100%.)
3

I'd like to address this:

The async-await version of the code exemplifies the situation of my production code.

You said that your production version "performs some network calls". If that's the case, then the code you have shown here does not exemplify your production code. The reason was mentioned by Lasse in the comments: Your async method is not running asynchronously. The reason is in how await works.

The await keyword looks at the Task returned by the method you're calling. You know that it will pause execution of the method and sign up the rest of the method as a continuation of the Task. But what you may not know is that this only happens if the Task has not completed yet. If the Task is already complete when await looks at it, then your code proceeds synchronously. In fact, you should be seeing a compiler warning telling you this:

CS1998: This async method lacks 'await' operators and will run synchronously. Consider using the 'await' operator to await non-blocking API calls, or 'await Task.Run(...)' to do CPU-bound work on a background thread.

Because of that, the only difference between your two code blocks is that your async version just adds the unnecessary overhead of await to still run synchronously.
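A small sketch can make this visible. It reuses the Calc method from the question; the AwaitFastPath class is a hypothetical wrapper added only so the snippet compiles on its own, and the pragma just silences the CS1998 warning quoted above.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitFastPath
{
    // Same shape as the question's Calc: marked async, but nothing is awaited.
#pragma warning disable CS1998
    public static async Task<double> Calc(double a, double i) => a + Math.Sin(i);
#pragma warning restore CS1998

    public static void Main()
    {
        Task<double> first = Calc(0.0, 1.0);
        Task<double> second = Calc(0.0, 1.0);

        // The whole method body has already run by the time Calc returns,
        // so the task is complete and .Result does not have to wait.
        Console.WriteLine(first.IsCompleted);

        // Shows whether the two calls produced two separate Task<double> objects;
        // one object per call is the allocation cost discussed in the other answer.
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```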

To have a truly asynchronous method, you actually have to do something that needs to be waited on. If you want to simulate this, you can use Task.Delay. Even if you use the smallest delay you can possibly have (Task.Delay(TimeSpan.FromTicks(1))), it will still trigger await to do its work.

async Task<double> Calc(double a, double i)
{
    await Task.Delay(TimeSpan.FromTicks(1));
    return a + Math.Sin(i);
}

That, of course, introduces a delay you didn't have before, so you should compare it with a synchronous version that uses Thread.Sleep for the same duration:

double Calc(double a, double i)
{
    Thread.Sleep(TimeSpan.FromTicks(1));
    return a + Math.Sin(i);
}

On my Intel Core i7, the asynchronous version runs for ~22 seconds, and the synchronous version ~50 seconds.

Normally I would say that all the benefits of asynchronous code get thrown out the window when you use .Result, but you are using AsParallel(), and I'm not sure how that affects performance.

2 Comments

I expressed myself badly when I said "The async-await version of the code exemplifies the situation of my production code." It actually exemplifies my "simulation" code, not my production code. I will edit the question and try to explain it better.
I understand. :) But my answer is still relevant. You're getting the results you did because your simulation that uses the async keyword is running synchronously.
