
I need to query a parquet file where the column names are completely inconsistent. To remedy this and ensure that my model gets exactly the data it expects, I need to 'prefetch' the column list and then apply some regex patterns to qualify which columns I should retrieve. In pseudocode:

PrefetchList = sqlContext.read.parquet(my_parquet_file).schema.fields
# Insert variable statements to check/qualify the columns against rules here
dfQualified = SELECT [PrefetchList] from parquet;

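Concretely, here is roughly what I'm picturing in PySpark (the file path and the regex pattern are just placeholders, and I'm assuming a standard SparkSession; I'm not sure whether passing the filtered list back into select like this is valid):

import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# prefetch the column names from the parquet file
df = spark.read.parquet("my_parquet_file")          # placeholder path
prefetch_list = [f.name for f in df.schema.fields]  # or simply df.columns

# qualify the columns against whatever regex rules apply (placeholder pattern)
qualified = [name for name in prefetch_list if re.match(r"^user.+", name)]

# select only the qualified columns
df_qualified = df.select(*qualified)
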
I've searched around to see if this is achievable but not had any success. If this is syntactically correct (or close) or if someone has other suggestions I am open to it.

Thanks

  • I see the pyspark tag and I am assuming you are using Python, but if you are fine with using Scala, a case class can solve your problem effectively. Commented Nov 20, 2017 at 22:37

1 Answer


You can use the schema method, but the .columns method works as well. Notice that the select method in Spark is a little odd: it's defined as def select(col: String, cols: String*), so you can't pass the list back with select(fields:_*); you'd have to use df.select(fields.head, fields.tail:_*), which is kind of ugly. Luckily there's selectExpr(exprs: String*) as an alternative. So the snippet below will work; it takes only the columns that begin with 'user':

val fields = df.columns.filter(_.matches("^user.+")) // BYO regexp: keep only columns starting with "user"
df.selectExpr(fields: _*)

This of course assumes that df contains your dataframe, loaded with sqlContext.read.parquet().
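Since the question is tagged pyspark, a rough Python equivalent might look like the sketch below (untested here, and "^user.+" is the same placeholder pattern). Note that PySpark's select already accepts column names directly, so the selectExpr workaround is only needed on the Scala side:

import re

# keep only the columns whose names match the pattern -- BYO regexp
fields = [c for c in df.columns if re.match(r"^user.+", c)]
df_qualified = df.select(*fields)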
