I am trying to read data from one DataFrame and use it to query other tables. How can I do this gracefully?

val query = s"select distinct p_id, lower(regexp_replace(p_id,'[^a-zA-Z0-9]+','_')) as p_id_formatted, lower(regexp_extract(f_id,'^([^\\.]+)\\.?',1)) as f_id_formatted, column_name from default.rc_pcoders"
val run_query = sql(query)
val table_name = run_query.select(concat(lit("nepp"), lit("_"), $"p_id_formatted", lit("_"), $"f_id_formatted").alias("tablename"), $"column_name")

This gives me the output below, where each row holds a table name and a column name:

+------------------+-----------+
|tablename         |column_name|
+------------------+-----------+
|nepp_148hl16011_cm|cmtrt      |
|nepp_148hl16011_mh|mhaspe     |
|nepp_148hl16011_ae|aeputt     |
+------------------+-----------+

How can I get the column names from each of these tables? Something like this (the query below doesn't work, though):

val whole_query = sql("show columns in "table_name.tablename"")

1 Answer

First, collect the distinct table names from the table_name DataFrame built in the question:

val tableNames = table_name.select("tablename").distinct().collect().map(row => row.getAs[String]("tablename")).toSeq

Second, look up each table as a DataFrame and pair its name with its column names:

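import org.apache.spark.sql.SQLContext
val sqlCtx: SQLContext = spark.sqlContext // assumes a SparkSession named spark; otherwise use your own SQLContext reference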
val dfToColumns = tableNames.map(table => {
  val columnNames = sqlCtx.table(table).schema.fieldNames.toSeq
  (table, columnNames)
}).toMap

dfToColumns is a Map[String, Seq[String]] with table names as keys and Seqs of their column names as values.
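
If you would rather stay close to the "show columns" query from the question, the part that fails there is the string building: the table name has to be interpolated into the SQL text. Below is a minimal sketch of that step only. The helper name showColumnsQuery is made up for illustration, the table names are copied from the sample output in the question, and the commented-out Spark call assumes a SparkSession named spark (as in spark-shell).

```scala
// Hypothetical helper (not part of Spark): builds the SHOW COLUMNS
// statement for one table. Backticks quote the identifier; this sketch
// assumes the table name itself contains no backticks.
def showColumnsQuery(tableName: String): String =
  s"SHOW COLUMNS IN `${tableName}`"

// Table names taken from the sample output in the question.
val tableNames = Seq("nepp_148hl16011_cm", "nepp_148hl16011_mh", "nepp_148hl16011_ae")

// One SQL statement per table.
val queries = tableNames.map(showColumnsQuery)

// Assumption: with a SparkSession named `spark`, each statement could be
// run and its single result column collected, for example:
//   val columns = spark.sql(queries.head).collect().map(_.getString(0)).toSeq
```

This should end up with the same information as dfToColumns, just obtained through SQL instead of the schema. Reading the schema as shown above is probably the simpler route, since it avoids building SQL strings by hand.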
