
Here's the code I'm trying to execute:

from pyspark.mllib.recommendation import ALS

iterations = 5
lambdaALS = 0.1
seed = 5L
rank = 8
model = ALS.train(trainingRDD, rank, iterations, lambda_=lambdaALS, seed=seed)

When I run the ALS.train(...) line, which depends on NumPy, the Py4J library that Spark uses throws the following error:

Py4JJavaError: An error occurred while calling o587.trainALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 67.0 failed 4 times, most recent failure: Lost task 0.3 in stage 67.0 (TID 195, 192.168.161.55): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/home/platform/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/home/platform/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 421, in loads
    return pickle.loads(obj)
  File "/home/platform/spark/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 27, in <module>
Exception: MLlib requires NumPy 1.4+

NumPy 1.10 is installed on the machine named in the error message. Moreover, I get version 1.9.2 when executing the following directly in my Jupyter notebook:

import numpy
numpy.version.version

A version of NumPy older than 1.4 is apparently being loaded somewhere, but I don't know where. How can I tell on which machine I need to update my version of NumPy?
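For reference, here is a sketch of how each executor could be probed from the same session (assuming sc is the active SparkContext; using many small partitions should reach most executors, though full coverage is not guaranteed):

def numpy_info(_):
    # Import on the worker so we see the worker's NumPy, not the driver's
    import numpy, socket
    yield (socket.gethostname(), numpy.version.version, numpy.__path__[0])

for host, version, path in sc.parallelize(range(100), 100) \
                             .mapPartitions(numpy_info).distinct().collect():
    print host, version, path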

2 Answers

It is a bug in the MLlib init code:

import numpy
if numpy.version.version < '1.4':
    raise Exception("MLlib requires NumPy 1.4+")

Because this check compares version strings character by character, '1.10' is less than '1.4'. As a workaround, you can downgrade to NumPy 1.9.2.
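A quick demonstration of the faulty comparison, together with a version-aware alternative (LooseVersion from the standard library is one option; this is a sketch, not necessarily the fix Spark itself shipped):

# String comparison goes character by character, so '1' < '4' at the third position
print '1.10' < '1.4'    # prints True

# A version-aware comparison gets it right
from distutils.version import LooseVersion
print LooseVersion('1.10') < LooseVersion('1.4')    # prints False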

If you have to use NumPy 1.10 and don't want to upgrade to Spark 1.5.1, apply a manual fix to the check in https://github.com/apache/spark/blob/master/python/pyspark/mllib/__init__.py
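A sketch of what the patched check could look like (mirroring the version-aware comparison above; the exact patch that landed in Spark may differ):

# Replacement for the version check in pyspark/mllib/__init__.py (sketch)
import numpy
from distutils.version import LooseVersion

if LooseVersion(numpy.version.version) < LooseVersion('1.4'):
    raise Exception("MLlib requires NumPy 1.4+")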


It looks like you have two versions of NumPy installed and PySpark is importing the older one. To confirm this, run the following:

import numpy
print numpy.__version__
print numpy.__path__

This will probably give you 1.9.2 and its path. Now do this:

import pyspark
print pyspark.numpy.__version__
print pyspark.numpy.__path__

Is it loading a different NumPy from another path? If so, removing that copy should most likely solve the issue.
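To hunt down stray copies, one option is to scan every entry on sys.path (a sketch using Python 2's imp module):

# Print the location of every numpy package visible on sys.path
import imp, sys

for entry in sys.path:
    try:
        print imp.find_module('numpy', [entry])[1]
    except ImportError:
        pass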

1 Comment

I get the following error when I execute the second set of commands (import pyspark, then the two print statements):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-92586fa783d5> in <module>()
      1 import pyspark
----> 2 print pyspark.numpy.__version__
      3 print pyspark.numpy.__path__

AttributeError: 'module' object has no attribute 'numpy'
