
I've been reading the Spark source code, but I still can't understand how Spark Standalone implements resource isolation and allocation. For example, Mesos uses LXC or Docker containers to enforce resource limits. How does Spark Standalone implement this? For instance, if I run 10 threads in one executor but Spark only gave the executor one core, how does Spark guarantee those 10 threads run on only one CPU core?

After running the test code below, it seems that Spark Standalone's resource allocation is, in a sense, fake. I had just one Worker (executor) and gave the executor only one core (the machine has 6 cores in total), yet while the code was running I saw 5 cores at 100% usage. (My code kicked off 4 threads.)

import org.apache.spark.{SparkConf, SparkContext}

object CoreTest {
  // A busy-loop thread that should peg one CPU core at 100%.
  class MyThread extends Thread {
    override def run(): Unit = {
      while (true) {
        val i = 1 + 1
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("core test")
    val sc = new SparkContext(conf)
    // A single-partition RDD, so everything below runs in one task
    // on one executor that was granted a single core.
    val memRDD = sc.parallelize(Seq(1), 1)
    memRDD.foreachPartition { part =>
      part.foreach { x =>
        // Kick off 4 busy-loop threads inside the one-core executor.
        var hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        hello = new MyThread()
        hello.start()
        // Keep the task alive so the threads keep running.
        while (true) {
          val j = 1 + 2
          Thread.sleep(1000)
        }
      }
    }
    sc.stop()
  }
}

Follow-up question: I'm curious what would happen if I ran the above code on Spark + Mesos. Would Mesos limit the 4 threads to running on only one core?

2 Answers


but I still can't understand how Spark Standalone implements resource isolation and allocation.

With Spark, we have the notion of a Master node and Worker nodes. We can think of the latter as a resource pool: each Worker contributes CPU and RAM to the pool, and Spark jobs can use the resources in that pool to do their computation.

Spark Standalone has the notion of an Executor, which is the process that handles the computation and to which we hand resources from the resource pool. In any given executor, we run the different stages of a computation, each composed of tasks. We can control the amount of computation power (cores) a given task uses (via the spark.task.cpus configuration parameter), and we can also control the total amount of computation power a given job may have (via spark.cores.max, which tells the cluster manager how many resources in total we want to give the particular job we're running). Note that Standalone is greedy by default and will schedule an executor on every Worker node in the cluster. We can get finer-grained control over how many Executors we actually have by using Dynamic Allocation.
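For concreteness, here is a minimal sketch of how those two settings might be wired up. The application name and the numbers are illustrative assumptions, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

object AllocationExample {
  def main(args: Array[String]): Unit = {
    // Illustrative values only: 2 cores per task and 6 cores for the whole
    // job mean at most 6 / 2 = 3 tasks run concurrently for this application.
    val conf = new SparkConf()
      .setAppName("allocation settings example")
      .set("spark.task.cpus", "2")  // cores each task claims from its executor
      .set("spark.cores.max", "6")  // total cores this application may claim
    val sc = new SparkContext(conf)
    // ... submit work here ...
    sc.stop()
  }
}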

for example, if I run 10 threads in one executor but Spark only gave the executor one core, how does Spark guarantee those 10 threads run on only one CPU core?

Spark doesn't verify that execution happens on only a single core; Spark has no say in which CPU cycles it gets from the underlying operating system. What Spark Standalone does attempt is resource management: it tells you, "Look, you have X CPUs and Y RAM; I will not let you schedule jobs if you don't partition your resources properly."
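One way to see that this is bookkeeping rather than enforcement is to ask the JVM inside a task how many processors it can see. To my understanding, absent cgroup limits, it reports every core the OS exposes, regardless of what Spark allocated. A small sketch, assuming a standalone executor that was granted a single core:

import org.apache.spark.{SparkConf, SparkContext}

object CoreVisibility {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("core visibility"))
    // Runs as one task; even if the executor was granted only 1 core,
    // the JVM inside it still sees every core on the machine, and the
    // OS is free to schedule its threads on any of them.
    val seen = sc.parallelize(Seq(1), 1)
      .map(_ => Runtime.getRuntime.availableProcessors())
      .collect()
    println(s"Cores visible inside the executor: ${seen.mkString(", ")}")
    sc.stop()
  }
}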


2 Comments

Thank you Yuval, I added more details to my question.
@jack It is generally not advised to modify your question after you've asked it and people have made an effort to answer. Perhaps consider asking a new question instead.

Spark Standalone handles only resource allocation, which is a simple task. All that is required is keeping tabs on:

  • available resources.
  • assigned resources.

It doesn't take care of resource isolation at all. Even YARN and Mesos, which have a broader scope, don't implement resource isolation themselves; they depend on Linux Control Groups (cgroups).
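To illustrate how simple that bookkeeping is, here is a toy model of standalone-style allocation. This is a sketch of the idea, not Spark's actual scheduler code:

// Simplified model: track what each worker has and what has been handed
// out, and refuse requests that don't fit. Nothing here pins threads to
// CPUs -- it is accounting only.
case class Worker(id: String, totalCores: Int, var assignedCores: Int = 0) {
  def freeCores: Int = totalCores - assignedCores
}

object Allocator {
  // Try to reserve `cores` on the first worker with enough free capacity;
  // return None when no worker can satisfy the request.
  def allocate(workers: Seq[Worker], cores: Int): Option[Worker] =
    workers.find(_.freeCores >= cores).map { w =>
      w.assignedCores += cores
      w
    }
}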
