Currently I am using spark to read data from bigqiery tables and write it to storage bucket as csv. One issue that i am facing is that the null string values are not being read properly by spark from bq. It reads the null string values but in the csv it writes that value as an empty string with double quotes (i.e. like this "").
# Load data from BigQuery.
bqdf = spark.read.format('bigquery') \
.option('table', <bq_dataset> + <bq_table>) \
.load()
bqdf.createOrReplaceTempView('bqdf')
# Select required data into another df
bqdf2 = spark.sql(
'SELECT * FROM bqdf')
# write to GCS
bqdf2.write.csv(<gcs_data_path> + <bq_table> + '/' , mode='overwrite', sep= '|')
I have tried emptyValue='' and nullValue options with df.write.csv() while writing to csv but dosen't work.
I needed a solution for this problem, if anyone else faced this issue and could help. Thanks!
