The same PySpark code works on r7a instances but not on r7g or r8g on an EMR cluster (7.5).
I build the Python environment with conda and use it in PySpark:
conda create -n pyspark python=3.9 --show-channel-urls --channel=conda-forge --override-channels
conda init bash
python -m pip install conda-pack # installed separately from req.txt because no hash is given for it
conda run -n pyspark python -m pip install -r req.txt
conda pack -n pyspark --output ./pulse-spark-deployment.tar.gz
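One sanity check worth doing before upload (a sketch; the archive path comes from the `conda pack` command above, the extraction directory is my own choice) is to inspect the packed interpreter with `file`:

```shell
# Sketch: verify the packed interpreter's architecture before uploading.
# The archive path matches the conda-pack output above; /tmp/envcheck is
# an arbitrary scratch directory.
if [ -f ./pulse-spark-deployment.tar.gz ]; then
    mkdir -p /tmp/envcheck
    tar -xzf ./pulse-spark-deployment.tar.gz -C /tmp/envcheck
    file /tmp/envcheck/bin/python
    # An x86_64 build reports "x86-64"; a Graviton-compatible build
    # reports "ARM aarch64".
else
    echo "archive not found; run conda pack first"
fi
```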
The environment is used with the following command line (all on one line; split here for readability):
bash -c "
PYSPARK_PYTHON=./environment/bin/python
PYTHONPATH=./app
spark-submit
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python
--conf spark.yarn.appMasterEnv.PYTHONPATH=./app
--master yarn
--deploy-mode cluster
--packages
org.apache.spark:spark-avro_2.12:3.5.2,
org.apache.hadoop:hadoop-aws:3.4.0,
org.apache.spark:spark-hadoop-cloud_2.12:3.5.2
--archives
s3://<bucket>/spark/spark-deployment.tar.gz#environment,
s3://<bucket>/spark/spark.zip#app
s3://<bucket>/spark/script.py
"
It works perfectly if I use r7a instances; it fails if I use Graviton (r7g or r8g).
The errors I get from YARN are:
User application exited with 126
and
./environment/bin/python: ./environment/bin/python: cannot execute binary file
This is typical of an executable built for the wrong architecture, but adding --platform linux-aarch64 to the conda create line does not change anything.
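For reference, this is the cross-build variant I have in mind; the environment name is hypothetical, and the caveat in the comments is my own assumption rather than a verified fix:

```shell
# Sketch of a cross-architecture conda build (assumes a recent conda).
# CONDA_SUBDIR steers the solver to the linux-aarch64 package index;
# newer conda versions also accept the two-token flag
# "--platform linux-aarch64" (a space, not a hyphen).
CONDA_SUBDIR=linux-aarch64 conda create -n pyspark-arm python=3.9 \
    --show-channel-urls --channel=conda-forge --override-channels
# Caveat: this affects conda packages only. A "pip install -r req.txt"
# run inside the environment still downloads wheels for the machine pip
# runs on, so pip dependencies need a native aarch64 build host or a
# separate cross-install step.
```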
What could be going wrong here?