
I'm trying to run some jobs on AWS EMR Serverless via the AWS CLI, using a virtual environment where I installed some libraries. I followed this guide (the same steps are here), but when I run the job I get this error:

Job execution failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: Caused by: java.io.IOException: error=2, No such file or directory Exception in thread "main" java.io.IOException: Cannot run program "./environment/bin/python"

I also tried /home/hadoop/environment/bin/python as the path, but I get the same result. My job configuration is:

--conf spark.archives=s3://mybucket/dependencies/myenv.tar.gz#environment
--conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python
--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python
--conf spark.emr-serverless.executorEnv.PYSPARK_PYTHON=./environment/bin/python
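
For context, the job is submitted with something like this (the application ID, role ARN, and entry point are placeholders):

aws emr-serverless start-job-run \
    --application-id <application-id> \
    --execution-role-arn <execution-role-arn> \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://mybucket/scripts/my_job.py",
            "sparkSubmitParameters": "--conf spark.archives=s3://mybucket/dependencies/myenv.tar.gz#environment --conf spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON=./environment/bin/python --conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=./environment/bin/python --conf spark.emr-serverless.executorEnv.PYSPARK_PYTHON=./environment/bin/python"
        }
    }'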

If I run the following in the job:

os.listdir("./environment/bin/")

the result is:

['python3.9', 'pip', 'pip3.9', 'rst2xetex.py', 'rstpep2html.py', 'f2py3', 'rst2latex.py', 'f2py', 'rst2odt.py', 'rst2html4.py', 'pip3', 'aws', 'python3', 'jp.py', 'rst2odt_prepstyles.py', 'pyrsa-encrypt', 'activate', 'rst2man.py', 'pyrsa-priv2pub', 'python', 'pyrsa-keygen', 'pyrsa-verify', 'rst2html.py', 'aws_completer', 'f2py3.9', 'venv-pack', 'rst2pseudoxml.py', 'aws_bash_completer', 'aws_zsh_completer.sh', 'aws.cmd', 'rst2s5.py', 'rst2xml.py', 'pyrsa-decrypt', 'rst2html5.py', 'Activate.ps1', '__pycache__', 'pyrsa-sign']

So the path should be correct. I also tried setting PYSPARK_DRIVER_PYTHON inside the script:

import os

os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"
os.environ['PYSPARK_DRIVER_PYTHON'] = "./environment/bin/python"

But in this case the error occurs when I import the libraries I installed in the virtualenv, so the script is still being run with the standard Python.

Can you help me?

  • I have the same problem. Even the logs say the archive is being unpacked, yet it still fails: Unpacking an archive s3://space/spark-env/pyspark_venv.tar.gz#environment from /tmp/spark-02908b0e-9b64-469d-b094-edee291a2426/pyspark_venv.tar.gz to /home/hadoop/./environment Commented Jul 20, 2022 at 13:29

1 Answer


The problem is probably that you haven't used Amazon Linux 2 to create the venv. Using Amazon Linux 2 and Python 3.7.10 did it for me. (The "No such file or directory" error can show up even though the file exists, likely because the packed interpreter was built for a different platform and references paths that only exist on the build machine.)

As detailed here, you can use a Dockerfile similar to this one to generate such a venv. You're better off using a requirements.txt to make it more reusable, but this gives you the idea.

# Build the venv on Amazon Linux 2 so the binaries match the EMR Serverless runtime
FROM --platform=linux/amd64 amazonlinux:2 AS base

RUN yum install -y python3

# Create a virtual environment and put it on PATH
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install the job's dependencies plus venv-pack to archive the environment
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install \
    great_expectations==0.15.6 \
    venv-pack==0.2.0

# Pack the venv into a relocatable tarball
RUN mkdir /output && venv-pack -o /output/pyspark_ge.tar.gz

# Export stage: makes it easy to copy the tarball out of the image
FROM scratch AS export
COPY --from=base /output/pyspark_ge.tar.gz /
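
To get the archive out of the image and into S3, something like this should work (the bucket and key are placeholders):

# Build the image; the export stage writes the tarball to the current directory
docker buildx build --output type=local,dest=. .

# Upload it where spark.archives can pick it up
aws s3 cp pyspark_ge.tar.gz s3://mybucket/dependencies/pyspark_ge.tar.gz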

1 Comment

To copy the file out: `docker buildx build --output type=local,dest=. .`
