
I'm reading from a source that has descriptions longer than 256 characters, and I want to write them to Redshift.

According to https://github.com/databricks/spark-redshift#configuring-the-maximum-size-of-string-columns, this is only possible in Scala.

According to https://github.com/databricks/spark-redshift/issues/137#issuecomment-165904691, a workaround is to specify the schema when creating the DataFrame. However, I'm not able to get it to work.

How can I specify the schema with varchar(max)?

from pyspark.sql.types import StructType, StructField, StringType

df = ...from source

schema = StructType([
    StructField('field1', StringType(), True),
    StructField('description', StringType(), True)
])

df = sqlContext.createDataFrame(df.rdd, schema)

1 Answer


Redshift maxlength annotations are passed in the format

{"maxlength":2048}

so this is the structure you should pass to the StructField constructor:

from pyspark.sql.types import StructField, StringType

StructField("description", StringType(), metadata={"maxlength":2048})

or use the alias method:

from pyspark.sql.functions import col

col("description").alias("description", metadata={"maxlength":2048})

If you use PySpark 2.2 or earlier, please check How to change column metadata in pyspark? for a workaround.


1 Comment

Setting this as the correct answer; even though I haven't gotten it to work yet, it answers my question. It should also work in Python now, according to docs.databricks.com/spark/latest/data-sources/aws/… (Databricks has recently closed-sourced the spark-redshift project).
