Fill missing value in Spark dataframe

Question

I'm trying to fill missing values in a Spark DataFrame using PySpark, but I can't find a proper way to do it. My task is to fill the missing values of some rows based on their previous or following rows. Concretely, I want to replace a 0.0 value in a row with the value from the previous row, while leaving non-zero rows unchanged. I did look at Spark's Window functions, but they seem to support only simple operations such as max, min, and mean, which are not suitable for my case. Ideally, I could slide a user-defined function over a given Window. Does anybody have a good idea?

Please share example data, the code you tried, and the expected output.
– mtoto, Commented Jul 17, 2016 at 12:01

Milad Khajavi · Accepted Answer · 2016-07-17 11:15:25Z


Use the Spark window API to access the previous row's data. If you are working with time series data, see also this package for missing data imputation.


1 Comment

Milad Khajavi Over a year ago

@wayag If the answer works for you, accept the answer :)
