I have a PySpark DataFrame in Python, created as follows:
from pyspark.sql.functions import col
dataset = sqlContext.range(0, 100).select((col("id") % 3).alias("key"))
The column name is key, and I would like to select this column using a variable.
myvar = "key"
Now I want to select this column through the myvar variable, perhaps in a select statement.
I tried this:
dataset.createOrReplaceTempView("dataset")
spark.sql(" select $myvar from dataset ").show
but it returns an error:
no viable alternative at input 'select $'(line 1, pos 8)
How do I achieve this in PySpark?
Note that I may have different columns in the future, and I want to pass more than one variable, or perhaps a list, into the SELECT clause.
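Use select on the DataFrame itself, for example dataset.select(myvar). You can also pass in a list of column names. Read more in the PySpark documentation for DataFrame.select.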
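To make this concrete, here is a minimal sketch of both routes: the DataFrame API with a variable or a list, and the SQL route with the query string built in Python (Python does not expand $myvar inside a string, so Spark's parser receives the literal $ and rejects it). The names build_select_query, demo and mycols are hypothetical, not part of PySpark, and the extra id column is kept only so there is more than one column to select. The sketch assumes you already have a SparkSession to pass in.

```python
def build_select_query(columns, table):
    """Build a SELECT statement from a list of column names (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column name is required")
    # Backtick-quote each identifier so names containing spaces or keywords still parse.
    quoted = ", ".join("`{}`".format(c.replace("`", "``")) for c in columns)
    return "SELECT {} FROM `{}`".format(quoted, table.replace("`", "``"))


def demo(spark):
    """Run both approaches against an existing SparkSession (assumed to be passed in)."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql.functions import col

    # Same data as in the question, but keeping "id" as a second column (assumption).
    dataset = spark.range(0, 100).select(col("id"), (col("id") % 3).alias("key"))

    myvar = "key"
    mycols = ["id", "key"]

    # DataFrame API: select accepts a column name held in a variable, or a list of names.
    dataset.select(myvar).show(3)
    dataset.select(mycols).show(3)

    # SQL route: build the query string in Python before handing it to spark.sql.
    dataset.createOrReplaceTempView("dataset")
    spark.sql(build_select_query([myvar], "dataset")).show(3)
    spark.sql(build_select_query(mycols, "dataset")).show(3)
```

In a pyspark shell you would call demo(spark). If the column names come from untrusted input, the DataFrame API is probably the safer choice, since no SQL text is assembled at all.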