Because of Python's GIL, threads cannot run CPU-bound Python code in parallel, so my question is: how does Apache Spark utilize Python in a multi-core environment?

1 Answer


Python's multi-threading limitations are separate from Apache Spark's internals. Parallelism in Spark is handled inside the JVM.

The reason is that, in the Python driver program, SparkContext uses Py4J to launch a JVM and create a JavaSparkContext.

Py4J is only used on the driver for local communication between the Python and Java SparkContext objects; large data transfers are performed through a different mechanism.

RDD transformations in Python are mapped to transformations on PythonRDD objects in Java. On remote worker machines, PythonRDD objects launch Python sub-processes and communicate with them using pipes, sending the user's code and the data to be processed.
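To illustrate the sub-process-and-pipes idea, here is a minimal sketch using only the standard library. It is not Spark's actual worker protocol: the names `WORKER_SOURCE` and `run_all` are hypothetical, the "user code" is hard-wired to squaring numbers, and plain `pickle` stands in for whatever serialization Spark really uses. The point it shows is that each partition is handled by a separate Python interpreter, each with its own GIL, so the partitions can run on different cores.

```python
import pickle
import subprocess
import sys

# Hypothetical worker program (an assumption for this sketch, not Spark code):
# read one pickled list from stdin, square each element, and write the
# pickled result to stdout.
WORKER_SOURCE = """
import pickle, sys
data = pickle.load(sys.stdin.buffer)
result = [x * x for x in data]
pickle.dump(result, sys.stdout.buffer)
sys.stdout.buffer.flush()
"""


def run_all(partitions):
    """Process every partition in its own Python sub-process via pipes."""
    procs = []
    # Start one interpreter per partition and send it its data.
    # All workers are launched before any result is read, so they
    # run concurrently as separate OS processes.
    for partition in partitions:
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
        proc.stdin.write(pickle.dumps(partition))
        proc.stdin.close()  # signals end of input to the worker
        procs.append(proc)

    # Collect the results in partition order.
    results = []
    for proc in procs:
        out = proc.stdout.read()
        proc.stdout.close()
        proc.wait()
        results.append(pickle.loads(out))
    return results
```

Calling `run_all([[1, 2], [3, 4]])` hands each inner list to its own interpreter. No threads are involved in the Python code, which is why the GIL does not limit this style of parallelism.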

PS: I'm not sure if this actually answers your question completely.

9 Comments

I think the main point here is that PySpark doesn't use multi-threading, so the GIL is simply not an issue.
@zero323 can you elaborate on your comment?
There is not much to elaborate. Excluding tests, there are only a few places where PySpark uses threads, to perform secondary tasks like starting an external process. Everything else is just good old single-threaded processing.
I concur with @zero323; that's why I said all the parallel processing is dealt with inside the JVM.
@eliasah To be fair, the JVM part is not that heavy on multithreading either, don't you think? Multiple threads are required for housekeeping, and JVM executors use threads, but in practice threading is not really required to achieve parallelism in Spark. One could start an equivalent number of workers on each machine and get the same parallelism, although at a higher price.
