
Now I am doing a project for my course and ran into a problem converting a pandas DataFrame to a PySpark DataFrame. I have produced a pandas DataFrame named data_org (screenshot omitted).

And I want to convert it into a PySpark DataFrame so I can put it into libsvm format. So my code is

from pyspark.sql import SQLContext  
spark_df = SQLContext.createDataFrame(data_org)

However, it fails with:

TypeError: createDataFrame() missing 1 required positional argument: 'data'

I really do not know what to do. My Python version is 3.5.2 and my PySpark version is 2.0.1. I am looking forward to your reply.

1 Answer


First, pass a SparkContext to SQLContext (you are calling createDataFrame on the class itself rather than on an instance, which is why the 'data' argument appears to be missing):

from pyspark import SparkContext
from pyspark.sql import SQLContext 
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)

then call createDataFrame on that instance:

spark_df = sql.createDataFrame(data_org)

7 Comments

What does sc mean? I first obtain my data as a pandas DataFrame.
sc means SparkContext. If you run the script with spark-submit, it is initialized by Spark.
I use Anaconda Spyder to run the PySpark code. How do I solve the problem in that situation?
Thanks a million!
Can you help me solve another question of mine? stackoverflow.com/questions/49559331/…
