
I have a DataFrame whose columns have nullable set to True, and I want to change it to False in PySpark.

I can do it the way shown below, but I don't want to go through an RDD: I'm reading with Structured Streaming, and converting to an RDD is not recommended there.

def set_df_columns_nullable(spark, df, column_list, nullable=True):
    # Grab the schema once and flip the nullable flag on the matching fields.
    schema = df.schema
    for struct_field in schema:
        if struct_field.name in column_list:
            struct_field.nullable = nullable
    # Rebuilding the DataFrame from its RDD is what forces the new schema on.
    return spark.createDataFrame(df.rdd, schema)
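
For reference, a hypothetical call (the column names here are placeholders):

    df_fixed = set_df_columns_nullable(spark, df, ["key", "value"], nullable=False)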

Thanks in advance.

  • Not possible, in fact. Why are you doing this? Commented Jun 27, 2020 at 17:49
  • Actually, I'm using Abris to convert plain data to Confluent Avro format before writing it to Kafka. When I use the to_confluent_avro function it throws a "Not a Union" exception, and it works if I change the column's nullability to False. Commented Jun 27, 2020 at 18:08
  • That's different, then, but I meant that RDDs are not supported. Commented Jun 27, 2020 at 18:11
  • Actually, I'm using Structured Streaming; converting to an RDD and back to a DataFrame is overhead, and because of it I may lose some features. Commented Jun 27, 2020 at 18:15
  • I always learnt that was not supported. Interesting. Commented Jun 27, 2020 at 18:16

1 Answer


You can actually update a column's nullability without converting to an RDD, by wrapping its Catalyst expression in AssertNotNull (this is Scala, using Spark's internal API):

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
import org.apache.spark.sql.functions.col

dataFrame
  .withColumn(columnName, new Column(AssertNotNull(col(columnName).expr)))

source

Note that the above will fail at execution time if the column actually contains null values, since AssertNotNull raises an error for any null it encounters.
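
If you are on PySpark and would rather avoid both the RDD round trip and Spark's internal Catalyst API, one alternative sketch is to coalesce the column with a non-null default: since the literal fallback can never be null, Spark marks the resulting column as nullable = False. Note the semantic difference from AssertNotNull: nulls are silently replaced by the default instead of failing the job. The helper name and default value below are illustrative, not part of any library:

    from pyspark.sql import functions as F

    def make_non_nullable(df, column_name, default):
        # coalesce(col, lit(default)) is non-nullable because the literal
        # fallback can never be null, so the schema gets nullable=False.
        return df.withColumn(column_name, F.coalesce(F.col(column_name), F.lit(default)))

    df = make_non_nullable(df, "value", "")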
