14

What is the correct way to use async/await in a recursive method? Here is my method:

public string ProcessStream(string streamPosition)
{
    var stream = GetStream(streamPosition);

    if (stream.Items.count == 0)
        return stream.NextPosition;

    foreach(var item in stream.Items) {
        ProcessItem(item);
    }

    return ProcessStream(stream.NextPosition)
}

And here is the method with async/await:

public async Task<string> ProcessStream(stringstreamPosition)
{
        var stream = GetStream(streamPosition);

        if (stream.Items.count == 0)
            return stream.NextPosition;

        foreach(var item in stream.Items) {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        return await ProcessStream(stream.NextPosition);
 }
7
  • 1
    Are you seeing any errors with what you tried? Commented Jul 29, 2014 at 3:21
  • 1
    @StephenCleary Nope, seems to be working fine. I was just wondering if there is any potential danger. Commented Jul 29, 2014 at 3:25
  • 1
    @zerkms How is my code synchronous? With the await keyword, the method should return right away to the caller right? I only started learning async/await today, so there's likely a lot of gaps in my understanding. Commented Jul 29, 2014 at 3:30
  • 1
    @Prabhu: yep, it's my mistake Commented Jul 29, 2014 at 3:33
  • 1
    Looks pretty correct to me. Not sure what you're expecting here. Commented Jul 29, 2014 at 4:16

2 Answers 2

14

While I have to say upfront that the intention of the method is not entirely clear to me, reimplementing it with a simple loop is quite trivial:

public async Task<string> ProcessStream(string streamPosition)
{
    while (true)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        streamPosition = stream.NextPosition;
    }
}

Recursion is not stack-friendly and if you have the option of using a loop, it's something definitely worth looking into in simple synchronous scenarios (where poorly controlled recursion eventually leads to StackOverflowExceptions), as well as asynchronous scenarios, where, I'll be honest, I don't even know what would happen if you push things too far (my VS Test Explorer crashes whenever I try to reproduce known stack overflow scenarios with async methods).

Answers such as Recursion and the await / async Keywords suggest that StackOverflowException is less of a problem with async due to the way the async/await state machine works, but this is not something I have explored much as I tend to avoid recursion whenever possible.

Sign up to request clarification or add additional context in comments.

8 Comments

@Krill awesome thanks. Another issue is that the foreach is going to process one after the other because it is awaiting ProcessItem. Is there anyway to speeden it up so that it is' not waiting for ProcessItem. I sort of need to just fire and forget, but if I take out the await, the compiler gives me a warning that not using the await in an async method is not a good idea.
@Prabhu, that is highly scenario-specific and depends on multiple factors, i.e. how thread-safe your code is, and how many items you expect stream.Items to contain. Thread-safety is paramount, and you need to ensure that multiple ProcessItems can run in parallel. Beyond that - if the number of items is not significant but each takes long time to process, I'd be replacing the foreach with await Task.WhenAll(stream.Items.Select(ProcessItem));. It the number of items is large and processing quick, I'd be reverting back to synchronous processing and using Parallel.ForEach instead.
in my case the number of items is very large. Curious to know why parallel.ForEach (which is what I have for now) will perform better in this case.
@Prabhu, that's because Tasks are quite "heavy" and you really don't want one per item when dealing with large collections. Parallel.ForEach uses smart collection partitioning to achieve parallelisation without too much overhead (not counting cases where you're dealing with really small delegate bodies), and provides a number of optional tweaks and optimisations (i.e. local state), which you should look into in order to squeeze every last bit of performance out of it.
Thank you. Could I do a fire and forget on ProcessItem(item) somehow?
|
3

When I add code to make your example more concrete, I find two possible ways for the recursion to turn out badly. Both of them assume that your data is pretty big and require specific conditions to trigger.

  1. If ProcessItem(string) returns a Task that completes before it is awaited on (or, I assume, it completes before the await finishes spinning), the continuation will execute synchronously. In my code below, I have simulated this by having ProcessItem(string) return Task.CompletedTask. When I do this, the program very quickly terminates with a StackOverflowException. This is because .net’s TPL “Releases Zalgo” by opportunistically executing continuations synchronously without regard to how much space is available in the current stack. That means that it will exacerbate the potential stack space issue that you already have by using a recursive algorithm. To see this, comment out await Task.Yield(); in my code sample below.
  2. If you use some technique to prevent TPL from continuing synchronously (below I use Task.Yield()), eventually the program will run out of memory and die with an OutOfMemoryException. If I understand correctly, this wouldn’t happen if return await were able to emulate the tail-call optimization. I imagine that what is happening here is each call generates something like a book-keeping Task<string> and keeps generating them even though they could be coalesced. To reproduce this error with the sample below, ensure you’re running the program as 32-bit, disable the Console.WriteLine() call (because consoles are really slow), and ensure the await Task.Yield() is uncommented.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Be sure to run this 32-bit to avoid making your system unstable.
class StreamProcessor
{
    Stream GetStream(string streamPosition)
    {
        var parsedStreamPosition = Convert.ToInt32(streamPosition);
        return new Stream(
            // Terminate after we reach 0.
            parsedStreamPosition > 0 ? new[] { streamPosition, } : new string[] { },
            Convert.ToString(parsedStreamPosition - 1));
    }

    Task ProcessItem(string item)
    {
        // Comment out this next line to make things go faster.
        Console.WriteLine(item);
        // Simulate the Task represented by ProcessItem finishing in
        // time to make the await continue synchronously.
        return Task.CompletedTask;
    }

    public async Task<string> ProcessStream(string streamPosition)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        // Without this yield (which prevents inline synchronous
        // continuations which quickly eat up the stack),
        // you get a StackOverflowException fairly quickly.
        // With it, you get an OutOfMemoryException eventually—I bet
        // that “return await” isn’t able to tail-call properly at the Task
        // level or that TPL is incapable of collapsing a chain of Tasks
        // which are all set to resolve to the value that other tasks
        // resolve to?
        await Task.Yield();

        return await ProcessStream(stream.NextPosition);
    }
}

class Program
{
    static int Main(string[] args) => new Program().Run(args).Result;
    async Task<int> Run(string[] args)
    {
        await new StreamProcessor().ProcessStream(
            Convert.ToString(int.MaxValue));
        return 0;
    }
}

class Stream
{
    public IList<string> Items { get; }
    public string NextPosition { get; }
    public Stream(
        IList<string> items,
        string nextPosition)
    {
        Items = items;
        NextPosition = nextPosition;
    }
}

So, I guess my two recommendations are:

  1. Use Task.Yield() if you aren’t certain that the stack growth of recursion will be interrupted by something else.
  2. As suggested already, avoid recursion if it doesn’t even make sense for your problem in the first place. And even if it makes a clean algorithm, avoid it if your problem size is unbounded.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.