I have a PySpark DataFrame like this:
+--------------------+--------------------+
| name| value|
+--------------------+--------------------+
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| id| null|
| name| null|
| age| null|
| food| null|
| null| 1|
| null| Joe|
| null| 47|
| null| pizza|
+--------------------+--------------------+
I want to remove the null values from each individual column so the non-null data lines up.
The desired output is:
+--------------------+--------------------+
| name| value|
+--------------------+--------------------+
| id| 1|
| name| Joe|
| age| 47|
| food| pizza|
+--------------------+--------------------+
I have tried removing nulls with df.dropna(how='any') and df.dropna(how='all'), and also by separating out the columns and dropping the nulls from each one, but then it becomes difficult to join them back together.
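
For reference, this is roughly what I tried. The DataFrame construction below is just a sketch to reproduce the example above (I'm assuming both columns are strings):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reconstruction of the example data shown above (types assumed to be strings)
rows = [(None, None)] * 10 + [
    ("id", None), ("name", None), ("age", None), ("food", None),
    (None, "1"), (None, "Joe"), (None, "47"), (None, "pizza"),
]
df = spark.createDataFrame(rows, ["name", "value"])

# Dropping rows by null-ness doesn't help:
df.dropna(how="any").show()   # drops every row, since no row has both columns non-null
df.dropna(how="all").show()   # only drops the fully-null rows, data is still misaligned

# Splitting the columns apart does isolate the non-null values...
names = df.select("name").dropna()
values = df.select("value").dropna()
# ...but there is no common key left to join `names` and `values` back together row by row.
```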