I have a dataframe called 'df', structured as follows:

ID name lv1 lv2
abb name1 40.34 21.56
bab name2 21.30 67.45
bba name3 32.45 45.44

In pandas, I can use the following code to create a new column containing a list of the lv1 and lv2 values for each row:

cols = ['lv1', 'lv2']
df['new_col'] = df[cols].values.tolist()

Because the data is too large to fit in memory, I am now using Databricks instead (which I have never used before) and need to replicate the above. I've successfully created a Spark dataframe by mounting the location of my data and then loading it:

file_location = 'dbfs:/mnt/<mountname>/filename.csv'
file_type = "csv"
   
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# Wrap the chained calls in parentheses so Python accepts the line breaks
df = (spark.read.format(file_type)
      .option("inferSchema", infer_schema)
      .option("header", first_row_is_header)
      .option("sep", delimiter)
      .load(file_location))

display(df)

This loads the data, but I'm stuck on the next step. I've found a function called struct in Spark, but I can't seem to find the corresponding function in PySpark. Any suggestions?

1 Answer

It's probably the array function (pyspark.sql.functions.array) that you're looking for.

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('abb', 'name1', 40.34, 21.56),
     ('bab', 'name2', 21.30, 67.45),
     ('bba', 'name3', 32.45, 45.44)],
    ['ID', 'name', 'lv1', 'lv2'])

df = df.withColumn('new_col', F.array('lv1', 'lv2'))

df.show()
# +---+-----+-----+-----+--------------+
# | ID| name|  lv1|  lv2|       new_col|
# +---+-----+-----+-----+--------------+
# |abb|name1|40.34|21.56|[40.34, 21.56]|
# |bab|name2| 21.3|67.45| [21.3, 67.45]|
# |bba|name3|32.45|45.44|[32.45, 45.44]|
# +---+-----+-----+-----+--------------+