11

Here's some C# that should illustrate what I'm asking.

Thing FindThing(string key)
{
   // Search for the thing with the right key.
   foreach (Thing th in things)
   {
       // Found it?
       if (th.Key == key)
           return th;
   }

   // Deal with not finding the thing.
   Redacted();
}

Observe that if the right thing is found in the things collection, the FindThing function will return early, cutting off the rest of the collection and the post-loop code. We call this the "early return" pattern. (At least that's what I call it.)

Now let's modify the code slightly. I don't want to use foreach but use a separate function instead. (Why? Because reasons. Read on...)

Thing FindThing(string key)
{
   // Search for the thing with the right key.
   MyForEach(things, MyAction);
   void MyAction(Thing th)
   {
       // Found it?
       if (th.Key == key)
           return th;
   }

   // Deal with not finding the thing.
   Redacted();
}

void MyForEach<T>(IList<T> items, Action<T> actPerItem)
{
    foreach (T item in items)
        actPerItem(item);
}

This won't work in C# because the return th line is attached to the MyAction function. I want that line to be attached to the FindThing function and for the code to unwind the stack for all three functions and organize it so the FindThing function will return the discovered item and cut off the rest of the items in the loop and also the post-loop code.

Now, I could change things so it would work. I am a very good software engineer after all. But this isn't stack-overflow and that isn't my question.

Are there any languages which do what I've illustrated? I'm looking for precedent and to study the implications by studying where people have already travelled in the past.

14
  • 10
    Yes, there are such languages: assembly and its arbitrary jumps. C and C++ with their longjmp. Although in case of C++ it simply transfers control, stack unwinding won't happen. In particular no destructor will run, RAII becomes useless. In general it is not possible to statically determine whether unwinding is needed or not. After all, you can jump anywhere, conditionally. And then you can come back. Should the stack be unwinded then? But seriously, this an antipattern that destroys any static analysis. And melts brain. Just don't. Commented Dec 28, 2024 at 7:32
  • 6
    A restricted form of the above is exceptions. Which is a sequence of long jumps, but well defined, only upwards in the call stack. In that sense most languages have this feature (among widely used languages I think only assembly, C and Rust doesn't have exceptions), you can emulate it through throw/catch mechanism. But what's the point? It is slightly better, at least you can unwind stack. But it's an obscure way to do a straightforward thing. And inefficient. Commented Dec 28, 2024 at 7:43
  • 3
    all you need is for your foreach to allow short-circuiting. Many languages have similar constructs, like all in python, which will return as soon as the callback returns a Truthy value. Commented Dec 28, 2024 at 17:09
  • 1
    @freakish: Are you suggesting that longjmp could jump out of a nested call and then longjmp back in? That would be undefined behaviour; you can only longjmp to a jump buffer saved in a parent function of the one that calls longjmp. Or at least it's UB if the function that called setjmp has returned before you longjmp. I assume longjmp out of it would also count. (man7.org/linux/man-pages/man3/setjmp.3.html although the POSIX man page doesn't seem to mention that restriction, at least not as explicitly: man7.org/linux/man-pages/man3/longjmp.3p.html) Commented Dec 28, 2024 at 22:58
  • 1
    @JensV I think generator functions are an attempt to solve this same sort of problem without implementing the kind of non-local control that this question is asking about. Though I can see arguments that it is, in fact, this kind of non-local control, just a limited form of it. Commented Dec 29, 2024 at 18:07

6 Answers 6

21

This is a form of non-local control-flow. There are many languages which have some form of non-local flow. The most common types of non-local control flow are:

  • GOTO / JMP: allows you to jump from anywhere to anywhere, violating / ignoring any sort of subroutine / module / component / object boundary.
  • call-with-current-continuation. A continuation is a reification of the control flow context. You can think of it as "all the work that is still left to do". call/cc is essentially the functional programming equivalent of GOTO, and is equally powerful.
  • If your language has closures and proper tail calls, you can write your program in continuation-passing-style, which allows you to use continuations manually even in languages that don't support them.
  • Exceptions. It turns out, exceptions can be as powerful as call/cc, which means they can be as powerful as GOTO: if you have exceptions, you can implement continuations yourself. Microsoft had a research project a long time ago called Volta which compiled .NET bytecode to ECMAScript source code. Since .NET supports multi-threading and ECMAScript doesn't (this was long before Node.js, Promises, Web Workers, etc.), they actually implemented .NET Threads using ECMAScript exceptions! (In fact, they used ECMAScript exceptions to implement continuations, and then used those continuations to implement all other non-local control flow supported by .NET.)
  • Ruby has throw / catch which allows you to throw a very lightweight object (a Symbol) from one method or block and catch it somewhere else along the stack. It is essentially exceptions (which Ruby also supports) but much more lightweight, since there is no data attached to the symbol.
  • Depending on what you need to do, Coroutines may be enough. They are not universally powerful like GOTO, but for many use cases, they are powerful enough.
  • A C#-style async / await implementation can be "abused" as coroutines. I believe this applies to at least C# (and VB-NET) itself, ECMAScript, and Python.
  • Some languages have a reified call stack. In some Smalltalks, for example, the call stack is just a regular object you can manipulate. This allows you to implement continuations or any other form of control flow yourself.

There's probably a lot more I'm forgetting …

2
  • "async/await implementation can be "abused" as coroutines" - in ECMAScript it's the other way round, one can implement async/await on top of generator functions (coroutines) Commented Dec 29, 2024 at 5:01
  • You can do this with RETURN-FROM in Common Lisp, as long as the callback function is nested inside the main function. Commented Dec 30, 2024 at 5:58
10

The standard term for the specific thing you are looking for is non-local return. This is a language feature where some sort of code block can be given to another function or method, while retaining the ability to return from its lexically-enclosing method when invoked.

There are several languages that have this as a core feature. Ruby is a common one today:

def f(&x)
  x.call
end
def g
  f { return 1 }
  puts "This doesn't run"
  return 0
end
g // => 1; no output

This comes out of the Smalltalk tradition, and virtually all Smalltalk-likes have this feature as well, including standard Smalltalk, Self, and Pharo. There are a number of others as well; for example, R has this as part of its general lazy evaluation.

I worked on an academic educational programming language called Grace that leans on this very heavily, precisely to allow building control structures of the kind you're looking for:

method foo(target) {
    for (1..5) do { i ->
        if (i == target) then {
            return "found"
        }
    }
    "not found"
}
...
method if(cond : Boolean) then (blk : Action) { // in the standard library somewhere
    cond.ifTrue(blk)
}

Here both the for-do and if-then control structures are ordinary in-language methods, and the braced blocks are ordinary object arguments to them. The return statement in the middle returns from the lexically-enclosing method, regardless of how far down the stack the block is applied. The idea was that this would allow user-defined control structures on a par with the standard ones. There is a lot of subsequent academic literature exploring uses of these features in Grace, and of course all the work on Smalltalk is using a language that has them, although they don't come up much except as an implementation challenge. When looking at Smalltalk, note that "return" is spelled or ^.


Implementing this in the language is a challenge, and I have done it a few times with different approaches in different Grace implementations. The techniques from Jörg's answer are all implementation strategies that could back this as a language feature. setjmp/longjmp are viable approaches in C; throwing an exception to indicate the return, and catching it in the relevant method works in C#, Java, JavaScript, ...; in a continuation-passing interpreter this amounts to just saving and using the original return continuation. They were all a little fiddly, but workable, although there are both code complexity and performance tradeoffs involved.

If you're building this out within a language with one of those base features, you can simulate it yourself without explicit language support. I would be cautious of doing that without a very good reason, but for example implementing a small internal DSL with the semantics required for a common task might be reasonable.


This ability also raises some other questions: in particular, what if I save that block and invoke it after the original method has already returned? Most systems make this a dynamic error, tracking whether the stack frame has already been popped. Some would proceed with the program from the return site again, in effect looping - in continuation-passing style, this is essentially the default - which allows some interesting constructions while also probably causing confusion. Interaction with other language constructs can also be complex: what does a non-local return during a finally block do?

This can be a good feature when the language is built around it, and I think the overall approach used in Grace was reasonable. There is some degree of "action at a distance" that arises out of being able to jump back up the stack like this, and it's much less clear that it's worthwhile in a language that isn't constructed around the feature: when it's a corner case users will forget to account for it, and the costs of the implementation may not be worthwhile for something that is rarely important.

5
  • That the puts "This doesn't run" in g wouldn't run is kinda obvious. But what about code after the x.call in f? Commented Dec 29, 2024 at 5:05
  • No code after the up-stack return will run, except for things like try-finally constructions. It's a direct return from the method itself. Commented Dec 29, 2024 at 5:11
  • Personally, I have enormous difficulty visualizing a loop that involves returning from a given frame more than once. It seems like a magic trick, or time travel. Does the runtime have to keep all of the intervening frames alive indefinitely until it can prove that the continuation is no longer reachable (and then GC everything)? Commented Dec 29, 2024 at 20:59
  • @Kevin A continuation reifies "the rest of the execution of the program", so it does require that the system keep alive anything it relies on (like a closure) and that probably involves GC. It does seem like time travel when you do it, yes. I'm not saying this is a normal thing to do, but under a CPS interpreter every operation is given a "next step" continuation, including as the destination for a function's return value, and if you don't do anything to avoid it in the implementation then this is what happens when someone invokes one of them a second time — the program runs from that point. Commented Dec 29, 2024 at 21:22
  • For an unintentional example, put this code into this system and click the "Execute" button and see what comes up in the lower-right (ignore the rest of it, it's for something else). The in-browser interpreter uses continuation-passing style and inadvertently lets you do this, which the language semantics shouldn't really permit. Here this makes a closed loop where print(foo) shows the method returning several times, and the next lines of the program running too. Commented Dec 29, 2024 at 21:24
8

Kotlin has inline functions that can early return across function boundaries. The foreach example can be written like this:

inline fun <T> myForEach(items: List<T>, action: (T) -> Unit) {
    for (item in items) {
        action(item)
    }
}

fun findThing(key: String): Thing {
    myForEach(things) { th ->
        if (th.key == key) {
            return th  // Early return works because myForEach is inline
        }
    }
    
    return redacted()
}
5
  • 2
    are you sure about this? I don't see any reason why action would not be executed for every item of items, and I don't see how myForEach returns anything at all Commented Dec 28, 2024 at 20:36
  • 1
    Unless the return th inside thing returns early from the call to thing rather than the inline function. Commented Dec 28, 2024 at 22:03
  • 6
    @njzk2 yes, this is real, "why" is because the language purposefully allowed this specific case, and the reason for that was to allow this sort of code to be written, defining a custom loop while retaining the ability to return early. kotlinlang.org/docs/… Commented Dec 29, 2024 at 0:56
  • @DanGetz I didn't know about inline, thanks! (maybe worth adding that documentation link to the answer) Commented Dec 29, 2024 at 21:59
  • My understanding is that myForEach cannot return non locally, but the function passed to action can as shown above. Commented Jan 22 at 20:13
0

amon mentioned Rust's ControlFlow type in a comment, here's an example of what that could look like in this case:

use std::ops::ControlFlow;

#[derive(Debug, Clone)]
struct Thing {
    key: String,
}

impl Thing {
    fn new(key: &str) -> Thing {
        Thing {
            key: key.to_owned(),
        }
    }
}

fn find_thing(things: &[Thing], key: &str) -> Thing {
    // Search for the thing with the right key.
    let my_action = |th: &Thing| {
        println!("inspecting thing {th:?}");
        if th.key == key {
            println!("exiting early");
            return ControlFlow::Break(th.clone());
        }
        ControlFlow::Continue(())
    };
    if let ControlFlow::Break(thing) = my_for_each(things, my_action) {
        println!("exited early with {thing:?}");
        return thing.clone();
    }
    println!("did not exit early");
    redacted()
}

fn redacted() -> Thing {
    Thing {
        key: "not found".to_owned(),
    }
}

fn my_for_each<T, I, F>(items: I, mut act_per_item: F) -> ControlFlow<T>
where
    I: IntoIterator,
    F: FnMut(<I as IntoIterator>::Item) -> ControlFlow<T>,
{
    // Simplified for this question.
    // Please don't tell me to move the foreach back into
    // the above function or call this an "XY". Please.
    for item in items.into_iter() {
        act_per_item(item)?;
    }
    ControlFlow::Continue(())
}

fn main() {
    let things = [Thing::new("thing one"), Thing::new("thing two")];

    println!("looking for thing one");
    let thing = find_thing(&things, "thing one");
    println!("got: {thing:?}");

    println!("looking for thing three");
    let thing = find_thing(&things, "thing three");
    println!("got: {thing:?}");
}

Try it in the Rust Playground

The ControlFlow enum can be one of two values, either Break or Continue. The callback my_action returns a ControlFlow to indicate that it may result in an early return. The actual early return is implemented in the my_for_each function using the ? operator, which returns from the containing function if called on a Break value. Note that we have to check the return value of my_for_each to "catch" the early return. Alternatively, if we wanted to pass the return further up the call stack we could return a ControlFlow from find_thing and use the ? operator on the result of my_for_each.

In the end we get the following output showing the early return behavior:

looking for thing one
inspecting thing Thing { key: "thing one" }
exiting early
exited early with Thing { key: "thing one" }
got: Thing { key: "thing one" }
looking for thing three
inspecting thing Thing { key: "thing one" }
inspecting thing Thing { key: "thing two" }
did not exit early
got: Thing { key: "not found" }
0

Perl has a module Return::Multilevel which allows this: with_return calls a function with a coderef (really more of a continuation) that, when called, returns from the outer function.

In Raku, the loop-control verbs, including last, operate by default on the dynamically innermost loop, even if they aren't lexically inside the loop, which means that this:

sub act-per-item($item) {
    last if $item == 3;
}

for 1..10 -> $i {
    say $i;
    act-per-item($i);
}

Will print

1
2
3

and then exit.

-2

Even nested functions are a bit of an awkward and unusual feature. IIRC, a lot of that stuff (including closures) was added to support Linq.

With the concept of early-returning the outer function in a series of nested functions, the problem is that there can be an assignment at the call site of the inner function, or the inner call site can be part of an expression.

Do you jump right out of the assignment/expression immediately so as to proceed to return from the outer function, or do you first finish executing/evaluating the line on which the inner call site occurred (which might itself involve further calls occurring before the outer return finally occurs)?

When you jump out of a nest of ordinary structured blocks, those blocks are not (and cannot be) parts of expressions, so the issue does not arise.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.