I am using Python 2.7 with PySpark.

I use a user-defined function, and it works well when I call it like this:

def func(x):
    pass

RDD.map(lambda x: func(x))

But when I define the function in another script called utils and use:

from utils import func
RDD.map(lambda x: func(x))

I get an error:

ImportError: No module named utils
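For reference, utils.py sits in the same directory as the driver script and looks roughly like this (the body of func is just a placeholder):

# utils.py
def func(x):
    # placeholder; the real function does something useful with x
    return x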

How can I import a function from a user-defined module and use it with RDD.map?

Thanks

1 Answer


From the command line:

spark-submit --py-files utils.py ...
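For example, if the main script is called driver.py (the name here is illustrative):

spark-submit --py-files utils.py driver.py

--py-files ships utils.py to the executors so that the worker processes can resolve from utils import func.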

Or inside the script itself:

sc.addPyFile('file:///path/to/utils.py')
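A minimal driver sketch putting this together (the master, app name, and path are illustrative assumptions):

from pyspark import SparkContext

sc = SparkContext('local[*]', 'addpyfile-example')

# Ship utils.py to every executor; addPyFile also puts it on the
# driver's sys.path, so the import below resolves.
sc.addPyFile('file:///path/to/utils.py')

from utils import func

rdd = sc.parallelize([1, 2, 3])
print(rdd.map(lambda x: func(x)).collect())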

3 Comments

Please don't forget to add some text explaining your answer: why it works and how it solves the original problem.
How do you integrate this inside a Python script using the SparkContext?
You can pass the files when the context is created, via the pyFiles argument of the SparkContext constructor. Note that SparkContext().getConf() returns a copy of the configuration, so setting values on it after the context exists has no effect.
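A sketch of that route (again, the master, app name, and path are illustrative; pyFiles must be supplied at construction time):

from pyspark import SparkContext

# Entries in pyFiles are distributed at startup, effectively the same
# as calling sc.addPyFile() for each one after the context is created.
sc = SparkContext('local[*]', 'pyfiles-example',
                  pyFiles=['file:///path/to/utils.py'])

from utils import func
print(sc.parallelize([1, 2, 3]).map(lambda x: func(x)).collect())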
