Problems running Spark with Python

Question

Two questions:

1. How do I run Python 3 in Spark? I run ./bin/pyspark and it automatically starts Python 2.7. How do I make it use Python 3?
2. After I run pyspark, it prints a warning like this: 16/12/29 17:33:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Does it mean I downloaded the wrong Spark build?

I am using a MacBook Pro. Thanks.

Community · Accepted Answer · 2017-05-23 12:10:12Z

3

To use Python 3 for a single run:

PYSPARK_PYTHON=python3 ./bin/pyspark

To use it every time:

$ cd
$ vim .bashrc

Add these two lines at the end of the file and save it (run which python3 to find the right path on your machine):

export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=python3

After exiting the editor, source .bashrc so the changes take effect in the current shell.

$ source .bashrc

Now when you start Spark, it will use Python 3.
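
To confirm the change took effect, one option is to check the interpreter from inside the pyspark shell. The sketch below is plain Python and does not call any Spark API; the helper name describe_interpreter is made up for illustration. It reports the running Python version alongside the two environment variables set in .bashrc.

```python
import os
import sys


def describe_interpreter():
    """Return the running Python version and the PySpark-related env vars."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # These are None if the variable is not set in the current environment.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }


info = describe_interpreter()
for key, value in info.items():
    print("{}: {}".format(key, value))
```

Pasted at the pyspark prompt, the version line should start with 3 once the exports have been picked up. Note that this inspects the driver's interpreter only.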

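As for your second question: that message is a warning, not an error, and it does not mean you downloaded the wrong Spark build. It has to do with 32-bit vs 64-bit compilation of the native Hadoop library; as the message itself says, Spark falls back to the built-in Java classes.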

answered Dec 30, 2016 at 1:57

Mohammad Yusuf

majdouline · Accepted Answer · 2017-03-24 16:49:41Z

0

Add this to your ~/.bashrc:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"

export HADOOP_COMMON_LIB_NATIVE_DIR="/usr/local/hadoop/lib/native/"

Or:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"
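
Before pointing java.library.path at a directory, it may help to check that the directory actually contains a native Hadoop library. This is a minimal sketch, assuming the library file name starts with libhadoop and using the /usr/local/hadoop/lib/native path from this answer as an example; find_native_hadoop_libs is a hypothetical helper, not part of Hadoop or Spark.

```python
import os


def find_native_hadoop_libs(directory):
    """List files in `directory` whose names start with 'libhadoop'.

    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(directory):
        return []
    return sorted(
        name for name in os.listdir(directory) if name.startswith("libhadoop")
    )


# Path taken from the answer above; adjust it to your own installation.
native_dir = "/usr/local/hadoop/lib/native"
libs = find_native_hadoop_libs(native_dir)
if libs:
    print("Found in {}: {}".format(native_dir, ", ".join(libs)))
else:
    print("No libhadoop files found in {}".format(native_dir))
```

An empty result suggests the exports would point at a directory without a native library, in which case the warning is likely to persist.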

answered Mar 24, 2017 at 16:49

majdouline
