Problem:
a) Read a local file into a Pandas DataFrame, say PD_DF.
b) Manipulate/massage PD_DF and add columns to the DataFrame.
c) Write PD_DF to HDFS using Spark.
How do I do it?
You can use the SQLContext object and call its createDataFrame method, which accepts a Pandas DataFrame as its input data.
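A minimal sketch of that flow, assuming a CSV input, a SQLContext built from an existing SparkContext, and hypothetical file paths and column names:

```python
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="pandas-to-hdfs")
sqlContext = SQLContext(sc)

# a) read a local file into a Pandas DataFrame (path is a placeholder)
pd_df = pd.read_csv("/path/to/local_file.csv")

# b) massage the Pandas DataFrame, e.g. add a derived column
pd_df["first_col_as_str"] = pd_df[pd_df.columns[0]].astype(str)

# c) convert the Pandas DataFrame into a Spark DataFrame
spark_df = sqlContext.createDataFrame(pd_df)
spark_df.printSchema()
```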
SchemaRDD has been replaced by DataFrame in Spark 1.3. Call df.rdd.map(lambda x: [e for e in x]) if you don't want the RDD's elements to be Row instances, although I don't really see why you'd want that. What format do you want to save to?
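For the actual write to HDFS, here is a hedged sketch that continues from the spark_df above, assumes Parquet as the target format, and uses placeholder HDFS paths. The df.write API is Spark 1.4+; on 1.3 the equivalent is spark_df.saveAsParquetFile(path).

```python
# Keep the Row objects and write the DataFrame to HDFS as Parquet
spark_df.write.parquet("hdfs:///user/me/pd_df_parquet")  # path is a placeholder

# Or, if you really want plain lists instead of Row instances,
# strip the Rows and save delimited text files on HDFS
plain_rdd = spark_df.rdd.map(lambda row: [e for e in row])
plain_rdd.map(lambda fields: ",".join(str(f) for f in fields)) \
         .saveAsTextFile("hdfs:///user/me/pd_df_text")  # path is a placeholder
```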