
I have the following Row in pyspark, and I want to merge it into a pandas DataFrame.

Row(Banked_Date_Calc__c=0   NaN
Name: Banked_Date_Calc__c, dtype: float64, CloseDate=0    2018-06-13T00:00:00.000Z
Name: CloseDate, dtype: object, CourseGEV__c=0    2990
Name: CourseGEV__c, dtype: int64, Id=0    0060h0000169NWLAA2
Name: Id, dtype: object, OwnerId=0    0050L000008Z30mQAC
Name: OwnerId, dtype: object, timestamp=0   2018-06-13 17:02:30.017566
Name: timestamp, dtype: datetime64[ns])

Right now I am getting an error that the DataFrame constructor is not properly called when I pass the above Row to pd.DataFrame(msg):

msg = Row(.....)  # the Row shown above
pd.DataFrame(msg)
  • Can you elaborate on how your code "doesn't work"? What were you expecting, and what actually happened? If you got an exception/error, post the full exception details. Please edit these details in or we may not be able to help. Commented Jun 13, 2018 at 7:48
  • I am getting ValueError: DataFrame constructor not properly called! when I use pd.DataFrame(msg), where msg is just the row I mentioned above. Commented Jun 13, 2018 at 8:05
  • Downvoting without specifying the reason is rude. But so is the world. Commented Jun 14, 2018 at 0:17

1 Answer


You can't pass a pyspark Row directly to the pandas DataFrame constructor, but you can go through an intermediate dict via Row.asDict():

row_d = Row(...).asDict()              # convert the pyspark Row to a plain dict
pd_df = pd.DataFrame.from_dict(row_d)  # build the pandas DataFrame from it
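As a minimal, self-contained sketch of this dict-based path: pyspark's Row.asDict() returns a plain dict of column name → value, so an ordinary dict (with values from the question, used here purely for illustration) stands in for it below without needing a Spark session. Note that when the dict values are scalars, from_dict needs them wrapped in array-likes to form a one-row frame:

```python
import pandas as pd

# Stand-in for Row(...).asDict() -- a plain mapping of column -> value,
# using a few of the fields from the question as sample data.
row_d = {
    "Id": "0060h0000169NWLAA2",
    "CourseGEV__c": 2990,
    "CloseDate": "2018-06-13T00:00:00.000Z",
}

# Wrap each scalar in a single-element list so from_dict produces one row.
pd_df = pd.DataFrame.from_dict({k: [v] for k, v in row_d.items()})
print(pd_df.shape)  # (1, 3)
```

In the question's case the Row values are already pandas Series, so pd.DataFrame.from_dict(row_d) should work on the asDict() result directly, without the list-wrapping step.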