
Let's say I run a job in Spark with speculation = true.

If a task (let's say T1) takes a long time, Spark would launch a copy of task T1, say, T2 on another executor, without killing off T1.

Now, if T2 also takes more time than the median of all successfully completed tasks, would Spark launch another task T3 on another executor?

If yes, is there any limit to this spawning of new tasks? If no, does Spark limit itself to one parallel job, and waits indefinitely for completion of either one?
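For context, speculation and its thresholds are governed by a handful of Spark configuration properties. A minimal PySpark sketch (the property names come from the standard Spark configuration; the values shown are just the defaults, used here for illustration):

```python
from pyspark.sql import SparkSession

# spark.speculation.quantile: fraction of tasks that must finish before
# speculation is considered for a stage.
# spark.speculation.multiplier: how many times slower than the median a
# running task must be before a speculative copy is launched.
spark = (
    SparkSession.builder
    .appName("speculation-demo")
    .config("spark.speculation", "true")
    .config("spark.speculation.quantile", "0.75")
    .config("spark.speculation.multiplier", "1.5")
    .getOrCreate()
)
```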

1 Answer


Spark's TaskSetManager is responsible for this logic. When trying to launch a speculatable task, it checks that at most one copy of the original task is currently running. So in your example it would never launch T3, because two copies (T1 and T2) would already be running.
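As a rough illustration of that check (this is not Spark's actual code, just a simplified Python model of the copiesRunning bookkeeping the TaskSetManager performs):

```python
# Simplified model of TaskSetManager's speculation check:
# a speculative copy may only be launched while at most one
# copy of the task is currently running.

copies_running = {}  # task index -> number of currently running copies

def launch(task_index):
    # Record that another copy of this task has started running.
    copies_running[task_index] = copies_running.get(task_index, 0) + 1

def can_speculate(task_index):
    # Mirrors the "at most one copy of the original task is running" condition.
    return copies_running.get(task_index, 0) <= 1

launch(0)                    # T1 starts
assert can_speculate(0)      # one copy running -> a speculative T2 may launch
launch(0)                    # T2 (the speculative copy) starts
assert not can_speculate(0)  # two copies running -> no T3 is launched
```

Once either copy finishes, Spark kills the other, so the count never needs to go above two.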

You can find the relevant part of the code here.


3 Comments

Thanks for linking to the relevant code. I found the comment // Speculatable task should only be launched when at most one copy of the original task is running. Doesn't this mean that T3 would be started (since T2 is the copy of the original task), but that it would stop there, with no T4 being launched?
No: copiesRunning(index) would be 1 once T1 is running, so if T2 were started as well, copiesRunning(index) would be 2 (assuming T1 is still running), and the dequeueTaskFromList function would then return nothing (namely None).
I am unlucky enough that my speculative task attempt also lands on a busy node and runs slowly. Is it possible to run more than one copy of a speculative task? For example, four copies running in parallel, where whichever completes first is taken as the result.
