In order to write a standalone script, I would like to start and configure a Spark context directly from Python. Using the bin/pyspark launcher script, I can set the driver's memory size with:
$ /opt/spark-1.6.1/bin/pyspark
... INFO MemoryStore: MemoryStore started with capacity 511.5 MB ...
$ /opt/spark-1.6.1/bin/pyspark --conf spark.driver.memory=10g
... INFO MemoryStore: MemoryStore started with capacity 7.0 GB ...
But when starting the context from the Python module directly, setting the driver's memory size has no effect:
$ export SPARK_HOME=/opt/spark-1.6.1
$ export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python
$ python
>>> from pyspark import SparkConf, SparkContext
>>> sc = SparkContext(conf=SparkConf().set('spark.driver.memory', '10g'))
... INFO MemoryStore: MemoryStore started with capacity 511.5 MB ...
The only solution I know of is to set spark.driver.memory in spark-defaults.conf, which is not satisfactory for a standalone script.
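For reference, that workaround amounts to a single line in the conf directory of the install above (path assumed from SPARK_HOME):

# /opt/spark-1.6.1/conf/spark-defaults.conf
spark.driver.memory    10g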
As explained in this post, it makes sense for Java/Scala not to be able to change the driver's memory size once the JVM has started.
Is there any way to configure it dynamically from Python, before or while the pyspark module is imported?
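For example, something along the following lines is what I am hoping for. This is only a guess on my part: I am assuming the bin/pyspark launcher forwards its options to spark-submit through the PYSPARK_SUBMIT_ARGS environment variable, so setting it from Python before the context is created might mimic the --conf flag above, but I have not verified that this actually takes effect.

import os

# Assumption: mimic what bin/pyspark seems to do by setting PYSPARK_SUBMIT_ARGS
# before the JVM backing the SparkContext is launched; "pyspark-shell" is the
# trailing token the launcher appears to expect.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--conf spark.driver.memory=10g pyspark-shell"

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf())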