I'm trying to submit a Python script that imports numpy on AWS EMR, but I get:

ImportError: No module named numpy 

I tried using one of the answers here: No module named numpy when spark-submitting. I created a bootstrap_actions.sh script that includes

 sudo yum install python-numpy python-scipy -y

and I run the script when I create the cluster, but I still get the import error. How can I get import numpy to work?

1 Answer

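For Amazon EMR, you need to use bootstrap actions. Installing a package manually from the console only changes the master node, not the task nodes, whereas a bootstrap action runs on every node as it is provisioned. In mrjob's configuration format, that looks like: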

runners:
  emr:
    bootstrap:
    - sudo yum install -y python27-numpy

I am assuming that you are using Python 2.7. If you are using Python 3.x, the link below has examples of installing with pip in a bootstrap action. I am also assuming that you are using a recent EMR AMI.

EMR Bootstrapping Cookbook
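
To confirm that the bootstrap action took effect, it can help to check which interpreter is running and whether it can import the module. The following is a minimal sketch, not part of the original setup: `check_module` is a hypothetical helper name, and it only reports on the machine where it is executed.

```python
import importlib
import socket
import sys


def check_module(name="numpy"):
    """Report whether `name` can be imported by the interpreter running this code.

    Hypothetical helper for illustration; works on Python 2.7 and 3.x.
    """
    try:
        module = importlib.import_module(name)
        found = True
        # Not every module defines __version__, so fall back to None.
        version = getattr(module, "__version__", None)
    except ImportError:
        found = False
        version = None
    return {
        "host": socket.gethostname(),   # which node ran the check
        "python": sys.executable,       # which interpreter was used
        "module": name,
        "found": found,
        "version": version,
    }


if __name__ == "__main__":
    print(check_module("numpy"))
```

Run on the master node, this only tells you about the master. Assuming an existing SparkContext named `sc`, something like `sc.parallelize(range(8), 8).map(lambda _: check_module("numpy")).collect()` should report what the executors see; the partition count of 8 is arbitrary and does not guarantee that every node is sampled.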

1 Comment

It works! I just changed my bootstrap script to include the line you had: sudo yum install -y python27-numpy - Thanks!
