
I have the following Row in pyspark, and I want to merge it into a pandas DataFrame.

Row(Banked_Date_Calc__c=0   NaN
Name: Banked_Date_Calc__c, dtype: float64, CloseDate=0    2018-06-13T00:00:00.000Z
Name: CloseDate, dtype: object, CourseGEV__c=0    2990
Name: CourseGEV__c, dtype: int64, Id=0    0060h0000169NWLAA2
Name: Id, dtype: object, OwnerId=0    0050L000008Z30mQAC
Name: OwnerId, dtype: object, timestamp=0   2018-06-13 17:02:30.017566
Name: timestamp, dtype: datetime64[ns])

Right now I am getting an error that the DataFrame constructor is not properly called when I pass the above Row to pd.DataFrame(msg):

msg = Row(.....)  # the Row shown above
pd.DataFrame(msg)
  • Can you elaborate on how your code "doesn't work"? What were you expecting, and what actually happened? If you got an exception/error, post the full exception details. Please edit these details in or we may not be able to help. Commented Jun 13, 2018 at 7:48
  • I am getting ValueError: DataFrame constructor not properly called! when I use pd.DataFrame(msg), where msg is just the row I mentioned above. Commented Jun 13, 2018 at 8:05
  • Downvoting without specifying the reason is rude. But so is the world. Commented Jun 14, 2018 at 0:17

1 Answer


You can't pass a pyspark Row directly to the pandas DataFrame constructor, but you can go through an intermediate dict via Row.asDict():

row_d = Row(...).asDict()              # convert the pyspark Row to a plain dict
pd_df = pd.DataFrame.from_dict(row_d)  # build the pandas DataFrame from it
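As a minimal, self-contained sketch of this dict-based path: pyspark's Row.asDict() returns a plain dict of column name → value, so an ordinary dict (with values from the question, used here purely for illustration) stands in for it below without needing a Spark session. Note that when the dict values are scalars, from_dict needs them wrapped in array-likes to form a one-row frame:

```python
import pandas as pd

# Stand-in for Row(...).asDict() -- a plain mapping of column -> value,
# using a few of the fields from the question as sample data.
row_d = {
    "Id": "0060h0000169NWLAA2",
    "CourseGEV__c": 2990,
    "CloseDate": "2018-06-13T00:00:00.000Z",
}

# Wrap each scalar in a single-element list so from_dict produces one row.
pd_df = pd.DataFrame.from_dict({k: [v] for k, v in row_d.items()})
print(pd_df.shape)  # (1, 3)
```

In the question's case the Row values are already pandas Series, so pd.DataFrame.from_dict(row_d) should work on the asDict() result directly, without the list-wrapping step.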