
The same PySpark code works on r7a instances but not on r7g or r8g instances in an EMR 7.5 cluster.

I build the Python environment with conda and use it in PySpark:

conda create -n pyspark python=3.9 --show-channel-urls --channel=conda-forge --override-channels
conda init bash
python -m pip install conda-pack # installed separately from req.txt because no hash is given for it.
conda run -n pyspark python -m pip install -r req.txt
conda pack -n pyspark --output ./pulse-spark-deployment.tar.gz

It is used with the following command line (all on one line; split here for readability):

bash -c "
PYSPARK_PYTHON=./environment/bin/python
PYTHONPATH=./app 
spark-submit 
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python
--conf spark.yarn.appMasterEnv.PYTHONPATH=./app
--master yarn 
--deploy-mode cluster 
--packages 
    org.apache.spark:spark-avro_2.12:3.5.2,
    org.apache.hadoop:hadoop-aws:3.4.0,
    org.apache.spark:spark-hadoop-cloud_2.12:3.5.2
--archives
  s3://<bucket>/spark/spark-deployment.tar.gz#environment,
  s3://<bucket>/spark/spark.zip#app 
s3://<bucket>/spark/script.py
"

It works perfectly on r7a instances but fails on Graviton instances (r7g or r8g).

The errors I get from YARN are:

User application exited with 126

and

./environment/bin/python: ./environment/bin/python: cannot execute binary file

This is typical of an executable built for the wrong architecture, but adding --platform-linux-aarch64 to the conda create line does not change anything.

What could go wrong here?

1 Answer

Make sure you use --platform=linux-aarch64 (with an equals sign), not --platform-linux-aarch64, as described in the conda create documentation.
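To avoid the typo-prone flag altogether, the `conda create` line from the question can be generated per target architecture. This is an illustrative sketch only: `conda_create_args` and the `CONDA_SUBDIRS` table are hypothetical helpers, and the table assumes conda's standard Linux subdir names (`linux-64` for x86-64 nodes such as r7a, `linux-aarch64` for Graviton nodes such as r7g and r8g).

```python
import shlex

# Assumed mapping from `uname -m` spelling to conda "subdir" (platform) names.
CONDA_SUBDIRS = {"x86_64": "linux-64", "aarch64": "linux-aarch64"}


def conda_create_args(env_name: str, target_machine: str) -> list:
    """Build the question's `conda create` command for a target CPU (hypothetical helper)."""
    try:
        subdir = CONDA_SUBDIRS[target_machine]
    except KeyError:
        raise ValueError(f"unsupported target: {target_machine!r}") from None
    return [
        "conda", "create", "-n", env_name, "python=3.9",
        "--show-channel-urls", "--channel=conda-forge", "--override-channels",
        # The option is spelled --platform=<subdir>, with an equals sign.
        f"--platform={subdir}",
    ]


# prints: conda create -n pyspark python=3.9 --show-channel-urls --channel=conda-forge --override-channels --platform=linux-aarch64
print(shlex.join(conda_create_args("pyspark", "aarch64")))
```

Passing the flag explicitly even when building on a matching host is harmless and makes the intended target visible in build scripts.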

Running on an Ubuntu 24 x86 host:

~$ conda create -n pyspark_graviton python=3.9 --show-channel-urls --channel=conda-forge --override-channels --platform=linux-aarch64
[...]

~$ miniconda3/envs/pyspark_graviton/bin/python3.9 --version
-bash: miniconda3/envs/pyspark_graviton/bin/python3.9: cannot execute binary file: Exec format error

~$ ls -l miniconda3/envs/pyspark_graviton/bin/python3.9                                             
 -rwxrwxr-x 1 ubuntu ubuntu 4221904 Dec 30 21:50 miniconda3/envs/pyspark_graviton/bin/python3.9

~$ file miniconda3/envs/pyspark_graviton/bin/python3.9
miniconda3/envs/pyspark_graviton/bin/python3.9: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped

It correctly produces an arm64 (aarch64) Python binary, which, as expected, cannot run on the x86 host.
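The same check can be run on the conda-pack archive itself before uploading it to S3, without extracting it or relying on `file`. Below is a minimal Python sketch, assuming the interpreter is stored in the archive as `bin/python` (pass a different `member` path if your layout differs); `packed_python_arch` and `elf_machine` are hypothetical helper names, not part of conda-pack. It reads the ELF `e_machine` field, where 62 means x86-64 and 183 means AArch64.

```python
import posixpath
import struct
import tarfile

# ELF e_machine values (from Linux <elf.h>) for the architectures of interest.
ELF_MACHINES = {3: "i386", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the CPU architecture encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF files.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def packed_python_arch(tar_path: str, member: str = "bin/python") -> str:
    """Report the architecture of the interpreter inside a conda-pack archive.

    Assumption: the interpreter is stored under `member` (a leading "./" is ignored).
    """
    with tarfile.open(tar_path, "r:*") as tar:
        for info in tar.getmembers():
            if posixpath.normpath(info.name) == member:
                # extractfile() follows symlinks whose target is inside the archive.
                fobj = tar.extractfile(info)
                if fobj is None:
                    raise ValueError(f"{info.name} is not a file or link")
                return elf_machine(fobj.read(20))
    raise FileNotFoundError(f"{member!r} not found in {tar_path}")
```

For an archive meant for r7g or r8g nodes, `packed_python_arch("pulse-spark-deployment.tar.gz")` should return `"aarch64"`; an archive built for r7a should give `"x86_64"`.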

Alternatively, you can prepare the environment on a Graviton host, in which case you don't need --platform at all.
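Building on a matching host may matter for more than `conda create`: the output above shows that an aarch64 interpreter cannot run on an x86 host, so steps that execute the environment's Python, such as `conda run -n pyspark python -m pip install -r req.txt`, presumably need an ARM host as well. A minimal fail-fast guard could compare `platform.machine()` with the intended target before those steps run; `require_native_build` is a hypothetical helper, and it deliberately ignores emulation setups such as qemu-user.

```python
import platform
from typing import Optional


def require_native_build(target_machine: str, host_machine: Optional[str] = None) -> None:
    """Raise if this host is not the architecture the environment is built for.

    Hypothetical helper; `target_machine` uses `uname -m` spelling
    ("aarch64" for r7g/r8g, "x86_64" for r7a).
    """
    host = host_machine or platform.machine()
    if host != target_machine:
        raise RuntimeError(
            f"build host is {host}, target is {target_machine}: "
            "run the pip/conda-pack steps on a matching host"
        )
```

Calling `require_native_build("aarch64")` at the top of a build script would stop an x86 build before it produces an archive the Graviton nodes cannot execute.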
