
I have a PySpark DataFrame in Python:

from pyspark.sql.functions import col
dataset = sqlContext.range(0, 100).select((col("id") % 3).alias("key"))

The column name is key, and I would like to select this column using a variable.

myvar = "key"

Now I want to select this column using the myvar variable, perhaps in a select statement.

I tried this:

dataset.createOrReplaceTempView("dataset")
spark.sql(" select $myvar from dataset ").show()

but it returns an error:

no viable alternative at input 'select $'(line 1, pos 8)

How do I achieve this in PySpark?

Note that I may have different columns in future, and I want to pass more than one variable, or perhaps a list, into the SELECT clause.

  • The only thing I can suggest is to collect the data from the DataFrame and store it in your variable. Commented Sep 13, 2019 at 4:32
  • Just use select. You can also pass in lists. Read more here. Commented Sep 13, 2019 at 14:18

2 Answers


dataset.select(myvar) will select a single column based on the variable.

.select can also take a list: dataset.select([myvar, mySecondVar])




If your variable is a python list, you can also do this:

columns = ['column_a', 'column_b', 'column_c']

# select the list of columns
df_pyspark.select(*columns).show()

