I want to set every value in a given list of columns to null. These columns may or may not already contain values; ideally I want to wipe them clean of data.

This is what I have tried so far:

    null_cols = ['a', 'b', 'c']
    for cols in null_cols:
        df = (
            df.withColumn(cols, F.when(F.col(cols).isNotNull(), F.lit(None)))
        )

Any ideas?

1 Answer

You don't need the `when` here: you don't care whether the column already contains data, so just overwrite it with None unconditionally:

    from pyspark.sql import functions as F

    null_cols = ['a', 'b', 'c']
    for col in null_cols:
        # Overwrite the column unconditionally with a null literal.
        df = df.withColumn(col, F.lit(None))

Of course these columns must be nullable, which I assume here.

7 Comments

If these columns aren't nullable? Why would that even be a thing? Constraints?
Well, if the columns aren't nullable you can't set the values to null. So the code above would result in an error. But nullable columns are the default. You can check with df.printSchema() what the constraints on your columns are. Having nullable columns can make sense since null is always a special case and some functions might not work on columns containing nulls.
I meant: having not-nullable columns can make sense. Consider calculating the mean, for example. How should a null be handled? Should it be ignored? Should the result be null too? If you don't want to think about these special cases and don't consider nulls valid in your data, you can just add the not-nullable constraint to your schema.
Hmm thanks! How would I iterate through printSchema() for only the columns above?
I am only interested in certain properties and thus want to narrow down the printSchema return. The json method worked nicely!