I am reading a CSV file with pandas into a two-column DataFrame, and then trying to convert it to a Spark DataFrame. The code for this is:
from pyspark.sql import SQLContext
sqlCtx = SQLContext(sc)
sdf = sqlCtx.createDataFrame(df)
The dataframe:
print(df)
gives this:
    Name                                    Category
0   EDSJOBLIST apply at www.edsjoblist.com  ['biotechnology', 'clinical', 'diagnostic', 'd...
1   Power Direct Marketing                  ['advertising', 'analytics', 'brand positionin...
2   CHA Hollywood Medical Center, L.P.      ['general medical and surgical hospital', 'hea...
3   JING JING GOURMET                       [nan]
4   TRUE LIFE KINGDOM MINISTRIES            ['religious organization']
5   fasterproms                             ['microsoft .net']
6   STEREO ZONE                             ['accessory', 'audio', 'car audio', 'chrome', ...
7   SAN FRANCISCO NEUROLOGICAL SOCIETY      [nan]
8   Fl Advisors                             ['comprehensive financial planning', 'financia...
9   Fortunatus LLC                          ['bottle', 'bottling', 'charitable', 'dna', 'f...
10  TREADS LLC                              ['retail', 'wholesaling']
Can anyone help me with this?
Please post the output of print(df.dtypes) and a small sample of your data.
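Since the question does not say what goes wrong, the following is only a sketch based on a guess: the Category column may hold a mix of lists, list-like strings, and NaN, which Spark's schema inference may not handle. The helper names normalize_category and to_spark are hypothetical, and the schema assumes Name contains only strings and Category should become an array of strings. One possible approach is to clean each cell in plain Python first and then pass an explicit schema to createDataFrame:

```python
import ast
import math


def normalize_category(cell):
    """Return a plain list of strings for one Category cell.

    Assumes a cell is a list, a string that looks like a list,
    or a missing value (None / NaN).
    """
    if cell is None:
        return []
    if isinstance(cell, str):
        try:
            cell = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            # "[nan]" is not a valid Python literal; treat bracketed text
            # that fails to parse as empty, anything else as a single label.
            # Limitation: "['a', nan]" also ends up empty here.
            return [] if cell.strip().startswith("[") else [cell]
    if isinstance(cell, float) and math.isnan(cell):
        return []
    if not isinstance(cell, (list, tuple)):
        return [str(cell)]
    # Drop NaN entries inside a real list and force every item to str.
    return [str(x) for x in cell
            if not (isinstance(x, float) and math.isnan(x))]


def to_spark(df, sql_ctx):
    """Hypothetical helper: df is the pandas DataFrame from the question,
    sql_ctx is the SQLContext (a SparkSession should work the same way)."""
    # Imported here so the cleaning helper above runs without Spark installed.
    from pyspark.sql.types import (ArrayType, StringType,
                                   StructField, StructType)

    schema = StructType([
        StructField("Name", StringType(), True),
        StructField("Category", ArrayType(StringType()), True),
    ])
    clean = df.copy()
    clean["Category"] = clean["Category"].map(normalize_category)
    return sql_ctx.createDataFrame(clean, schema=schema)
```

With the names from the question, this would be called as sdf = to_spark(df, sqlCtx). Running df["Category"].map(type).value_counts() beforehand should show which Python types the column actually contains, which would confirm or rule out the guess above.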