Removing Null records in pyspark

Question

I have a Spark DataFrame like the one below:

Id value
1   \N
2   \N
3    a
4    b
5   \N

I want to remove the \N records, which represent nulls, from the DataFrame. How can I do this?

samkart · Accepted Answer · 2022-12-02 07:31:45Z


A simple filter should work. Note the r prefix, which makes '\N' a raw string so the backslash is matched literally.

data_sdf.filter(data_sdf.value != r'\N').show()

# +---+-----+
# | id|value|
# +---+-----+
# |  3|    a|
# |  4|    b|
# +---+-----+



1 Comment

@NamithaJanardhanan - Did you use the r prefix before the quoted string? It shouldn't give you an error if used correctly.
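As a quick illustration of why the r prefix matters, here is a plain-Python sketch that does not need Spark. The variable names are mine, and I am assuming the error mentioned above came from writing the literal without the prefix, which the thread does not confirm.

```python
# With the r prefix the backslash is kept as-is, so the string has two
# characters: a backslash followed by N.
marker = r"\N"

# Without the prefix, Python 3 reads \N as the start of a \N{NAME} Unicode
# escape, so the bare literal fails to compile. compile() is used here so
# the failure can be caught instead of breaking the whole script.
try:
    compile("'\\N'", "<example>", "eval")  # the compiled source text is: '\N'
    unprefixed_error = None
except SyntaxError as exc:
    unprefixed_error = exc

print(len(marker), marker == "\\N")
print(type(unprefixed_error).__name__)
```

Writing the marker as "\\N" with an escaped backslash is equivalent to the raw string, if you prefer to avoid the prefix.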