
I am trying to run the line of code:

pd.get_dummies(pd_df, columns = ['ethnicity'])

However, I keep getting the error 'DataFrame' object has no attribute '_internal'. The traceback points to the ...pyspark/pandas/namespace.py file, so I am not sure how to fix it.

Unfortunately, the DataFrame itself is private, so I can't show or describe it on Stack Overflow. Any information about why this could be happening would be greatly appreciated!

The example below works perfectly, but the same call fails in my own code. The only difference is that my DataFrame was converted from PySpark to pandas:

import numpy as np
import pandas as pd

sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000, 49000, 55000, 67000, 65000, 67000],
    "region": ["East", "North", "East", "South", "West", "West", "South", "West", "West", "East", np.nan],
})
pd.get_dummies(sales_data, columns=["region"])

  • Is pd_df a PySpark DataFrame or a pandas DataFrame? Commented Nov 21, 2022 at 21:41
  • Pandas dataframe :) @Ben.T Commented Nov 21, 2022 at 21:42
  • Do you build it from a PySpark DataFrame? I'm asking because you say the error comes from the file ...pyspark/pandas/namespace.py, and you also mention show, which is not in pandas (as far as I know). If yes, it may be related to this Q&A, even if it is not strictly the same error. Commented Nov 21, 2022 at 21:49
  • Yes, it is a PySpark DataFrame on which I then call .toPandas(). Thank you, I will have a look! Commented Nov 21, 2022 at 21:52
  • @Ben.T I don't think it has to do with the version, as I am able to use it perfectly with the example I have included in the question. Thank you though. Commented Nov 21, 2022 at 22:13

1 Answer


I had this same error. I was confusing the execution by using ps (pyspark.pandas) where I meant pd (pandas).

Make sure your aliases are correct and that you're not accidentally binding the name pd to pyspark.pandas instead of pandas:

Ex.

import pyspark.pandas as pd  # pd now refers to pyspark.pandas, not pandas
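If it is not obvious which library a name is bound to, a quick check like the one below might help. This is only a rough sketch: defining_module and is_plain_pandas are hypothetical helper names of my own, not part of pandas or PySpark, and the check assumes that pandas-on-Spark objects live under the pyspark.pandas package while plain pandas objects live under pandas.

```python
import types


def defining_module(obj):
    """Return the module name for a module object, or for the class of any other object."""
    if isinstance(obj, types.ModuleType):
        return obj.__name__
    return type(obj).__module__


def is_plain_pandas(obj):
    """Return True if obj (an alias such as pd, or a DataFrame) belongs to the pandas package."""
    name = defining_module(obj)
    # Assumption: plain pandas lives under "pandas", pandas-on-Spark under "pyspark.pandas".
    return name == "pandas" or name.startswith("pandas.")
```

Calling is_plain_pandas(pd) and is_plain_pandas(pd_df) right before the get_dummies call should show whether the alias or the DataFrame is the one coming from pyspark.pandas; printing defining_module(...) for each gives the full module name.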