
I've been reading the Spark source code, but I still can't understand how Spark Standalone implements resource isolation and allocation. For example, Mesos uses LXC or Docker containers to enforce resource limits. How does Spark Standalone implement this? For instance, if I run 10 threads in one executor but Spark only gave the executor one core, how does Spark guarantee those 10 threads run on only one CPU core?

After running the test code below, it seems that Spark Standalone's resource allocation is, in a sense, fake. I had just one Worker (executor) and gave the executor only one core (the machine has 6 cores in total), yet while the code was running I saw 5 cores at 100% usage. (My code kicked off 4 threads.)

import org.apache.spark.{SparkConf, SparkContext}

object CoreTest {
  // A busy-loop thread that should peg one CPU core at 100%.
  class MyThread extends Thread {
    override def run(): Unit = {
      while (true) {
        val i = 1 + 1
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("core test")
    val sc = new SparkContext(conf)
    // A single-partition RDD, so everything below runs in one task
    // on one executor that was granted a single core.
    val memRDD = sc.parallelize(Seq(1), 1)
    memRDD.foreachPartition { part =>
      part.foreach { x =>
        // Kick off 4 busy-loop threads inside the one-core executor.
        var hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        // Keep the task alive so the threads keep running.
        while (true) {
          val j = 1 + 2
          Thread.sleep(1000)
        }
      }
    }
    sc.stop()
  }
}

Follow-up question: I'm curious what would happen if I ran the above code on Spark + Mesos. Would Mesos limit the 4 threads to running on only one core?

2 Answers


but I still can't understand how Spark Standalone implements resource isolation and allocation.

With Spark, we have the notion of a Master node and Worker nodes. We can think of the latter as a resource pool: each Worker contributes CPU and RAM to the pool, and Spark jobs can use the resources in that pool to do their computation.

Spark Standalone has the notion of an Executor, which is the process that handles the computation and to which we hand resources from the resource pool. In any given executor, we run the different stages of a computation, each composed of tasks. We can control the amount of computation power (cores) a given task uses (via the spark.task.cpus configuration parameter), and we can also control the total amount of computation power a given job may have (via spark.cores.max, which tells the cluster manager how many resources in total we want to give the particular job we're running). Note that Standalone is greedy by default and will schedule an executor on every Worker node in the cluster. We can get finer-grained control over how many Executors we actually have by using Dynamic Allocation.
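For concreteness, here is a minimal sketch of how those two settings might be wired up. The application name and the numbers are illustrative assumptions, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

object AllocationExample {
  def main(args: Array[String]): Unit = {
    // Illustrative values only: 2 cores per task and 6 cores for the whole
    // job mean at most 6 / 2 = 3 tasks run concurrently for this application.
    val conf = new SparkConf()
      .setAppName("allocation settings example")
      .set("spark.task.cpus", "2")  // cores each task claims from its executor
      .set("spark.cores.max", "6")  // total cores this application may claim
    val sc = new SparkContext(conf)
    // ... submit work here ...
    sc.stop()
  }
}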

for example, if I run 10 threads in one executor but Spark only gave the executor one core, how does Spark guarantee those 10 threads run on only one CPU core?

Spark doesn't verify that execution happens on only a single core; Spark has no say in which CPU cycles it gets from the underlying operating system. What Spark Standalone does attempt is resource management: it tells you, "Look, you have X CPUs and Y RAM; I will not let you schedule jobs if you don't partition your resources properly."
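One way to see that this is bookkeeping rather than enforcement is to ask the JVM inside a task how many processors it can see. To my understanding, absent cgroup limits, it reports every core the OS exposes, regardless of what Spark allocated. A small sketch, assuming a standalone executor that was granted a single core:

import org.apache.spark.{SparkConf, SparkContext}

object CoreVisibility {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("core visibility"))
    // Runs as one task; even if the executor was granted only 1 core,
    // the JVM inside it still sees every core on the machine, and the
    // OS is free to schedule its threads on any of them.
    val seen = sc.parallelize(Seq(1), 1)
      .map(_ => Runtime.getRuntime.availableProcessors())
      .collect()
    println(s"Cores visible inside the executor: ${seen.mkString(", ")}")
    sc.stop()
  }
}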


2 Comments

Thank you Yuval, I added more details to my question.
@jack It is generally not advised to modify your question after you've asked it and people have made an effort to answer. Perhaps consider asking a new question instead.

Spark Standalone handles only resource allocation, which is a simple task. All that is required is keeping tabs on:

  • available resources.
  • assigned resources.

It doesn't take care of resource isolation at all. Even YARN and Mesos, which have a broader scope, don't implement resource isolation themselves; they depend on Linux Control Groups (cgroups).
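To illustrate how simple that bookkeeping is, here is a toy model of standalone-style allocation. This is a sketch of the idea, not Spark's actual scheduler code:

// Simplified model: track what each worker has and what has been handed
// out, and refuse requests that don't fit. Nothing here pins threads to
// CPUs -- it is accounting only.
case class Worker(id: String, totalCores: Int, var assignedCores: Int = 0) {
  def freeCores: Int = totalCores - assignedCores
}

object Allocator {
  // Try to reserve `cores` on the first worker with enough free capacity;
  // return None when no worker can satisfy the request.
  def allocate(workers: Seq[Worker], cores: Int): Option[Worker] =
    workers.find(_.freeCores >= cores).map { w =>
      w.assignedCores += cores
      w
    }
}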
