
I would like to drop columns that contain only null values using dropna(). In pandas you can do this by setting the keyword argument axis='columns' in dropna(). Here is an example in a GitHub post.
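For reference, the pandas behaviour described above looks like this (a minimal sketch; how='all' restricts the drop to columns where every value is null):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'furniture': [np.nan, np.nan, np.nan],    # all null -> dropped
    'clothing': ['pants', 'shoes', 'socks'],  # has values -> kept
})

# axis='columns' drops columns instead of rows;
# how='all' only drops a column if every value in it is null
result = df.dropna(axis='columns', how='all')
print(result.columns.tolist())  # ['clothing']
```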

How do I do this in PySpark? dropna() is available as a transformation in PySpark; however, axis is not an available keyword argument there.

Note: I do not want to transpose my dataframe for this to work.

How would I drop the furniture column from this DataFrame?

import numpy as np
import pandas as pd

data_2 = {
    'furniture': [np.nan, np.nan, np.nan],
    'myid': ['1-12', '0-11', '2-12'],
    'clothing': ['pants', 'shoes', 'socks'],
}

df_1 = pd.DataFrame(data_2)
ddf_1 = spark.createDataFrame(df_1)  # spark is an existing SparkSession
ddf_1.show()