I have a multithreaded service that makes parallel calls to user-submitted code over which I have no control. I'm hoping to find a mechanism by which these calls can be timed out if they deadlock, loop forever, or otherwise never return, regardless of what the called code does. For the timeout, I need:
- To recoverably continue past a non-returning call within a bounded time and
- To kill the non-returning thread and yield whatever resources it held
So far, the mechanism I've been using to handle timeouts uses virtual threads. It meets the first criterion but not the second:
public static String maybeTimeout(Callable<String> task, long timeOutMs) {
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
        Future<String> taskFuture = executor.submit(task);
        try {
            // Wait at most timeOutMs for the task to produce a result.
            return taskFuture.get(timeOutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            // cancel(true) interrupts the task's thread; it does not forcibly stop it.
            taskFuture.cancel(true);
            System.out.println("Timed out!");
            return "Timed out";
        }
    }
}
This works for code that yields (here via a Thread.sleep call):
public static Boolean sneakyTrue() {
    return true;
}

public static String sleepy() {
    return maybeTimeout(
        () -> {
            var x = 0;
            while (sneakyTrue()) {
                x = x + 1;
                System.out.println("Sleeping again: " + x);
                Thread.sleep(1000);
            }
            return "Unreachable";
        },
        5000);
}
But not with code that doesn't:
public static String spinny() {
    return maybeTimeout(
        () -> {
            var x = 0;
            var y = 0;
            while (sneakyTrue()) {
                x = x + 1;
                // x wraps back around to 0 after int overflow, so this prints only occasionally.
                if (x == 0) {
                    System.out.println("Spinning another maxInt times:" + y);
                    y = y + 1;
                }
            }
            return "Unreachable";
        },
        5000);
}
The tests I have that confirm this behavior:
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import java.util.concurrent.*;
...
@Test
void sleepyShouldBeAbleToTimeout() {
    assertEquals("Timed out", sleepy());
    // Keep the test alive so any leftover task output is visible on the console.
    try {
        Thread.sleep(10 * 1000);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}

@Test
void spinnyShouldBeAbleToTimeout() {
    assertEquals("Timed out", spinny());
    // Keep the test alive so any leftover task output is visible on the console.
    try {
        Thread.sleep(10 * 1000);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
If you run the above tests, both "pass", but notice that during the 10-second sleep, spinny continues to run/print to the console, while sleepy is quickly killed. My concern is that this could lead to thread/resource starvation if many threads get stuck in non-yielding user code. Is there any mechanism that can more strictly terminate threads?
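To illustrate what I believe is going on: as far as I can tell, taskFuture.cancel(true) only interrupts the worker thread, so it helps only when the running code reacts to interruption (as Thread.sleep does). The sketch below is a minimal comparison under that assumption. The class CooperativeSpin and the helper stopsAfterCancel are hypothetical names made up for this example, and it uses a daemon platform-thread executor instead of virtual threads so that the stuck task cannot keep the JVM alive. A loop that polls the interrupt flag stops after the cancel, while a loop that never checks keeps running. Polling the flag is not an option for me, since I can't modify the user-submitted code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CooperativeSpin {
    static boolean sneakyTrue() {
        return true;
    }

    // Hypothetical helper: runs the task, cancels it after timeOutMs, and reports
    // whether the worker thread actually finished within two seconds of the cancel.
    static boolean stopsAfterCancel(Callable<String> task, long timeOutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            future.get(timeOutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            future.cancel(true); // sets the interrupt flag on the worker, nothing more
        }
        executor.shutdown();
        try {
            // The executor terminates only once the worker thread has returned from the task.
            return executor.awaitTermination(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Spins like spinny(), but polls the interrupt flag, so cancel(true) ends it.
    static String cooperativeSpin() throws InterruptedException {
        long x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x++;
        }
        throw new InterruptedException("interrupted after " + x + " iterations");
    }

    // Spins without ever checking the interrupt flag, like the user code I can't change.
    static String stubbornSpin() {
        long x = 0;
        while (sneakyTrue()) {
            x++;
        }
        return "Unreachable";
    }

    public static void main(String[] args) {
        System.out.println("cooperative stopped: "
                + stopsAfterCancel(CooperativeSpin::cooperativeSpin, 200));
        System.out.println("stubborn stopped: "
                + stopsAfterCancel(CooperativeSpin::stubbornSpin, 200));
    }
}
```

The second case is the one I need a solution for: the task ignores the interrupt and keeps consuming a thread and CPU after the caller has moved on.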