I am using the following data:

Data for df1:

c1,c2,c3,c4
k1,i,aa,k
k5,j,ee,l

Data for df2:

c1,avc2,c3,avc4
k1,a,aa,e
k2,b,bb,f
k3,c,cc,g
k4,d,dd,h

I am trying to create a dynamic query string based on the conditions using the code below:

import scala.collection.mutable.ListBuffer

val PRIM_CHECK = "c1,c3".split(",").toList
val COLUMN_UNCHANGE = "c4".split(",").toList
val qb = new ListBuffer[String]()
val df3 = df1.join(df2, Seq("c1", "c3"), "outer")

for (i <- avro_inp.columns) {
  if (PRIM_CHECK.contains(i)) {
    // key columns: nothing to append
  } else if (COLUMN_UNCHANGE.contains(i)) {
    // keep the df1 value when present, otherwise fall back to the "av" column
    qb += s""".withColumn("$i", when('$i.isNotNull, '$i).otherwise('av$i))"""
  } else {
    // use the df1 value only when the "av" column is null
    qb += s""".withColumn("$i", when('av$i.isNull, '$i).otherwise('av$i))"""
  }
}

val check = qb.mkString

I then want to append the generated string to df3 and run it, along these lines:

df3.+""+check+""+.show()

This does not compile, because check is a String and not code. Is there any way to execute it?

  • jesus, this is ugly :) Commented Aug 9, 2017 at 19:33
  • @RaphaelRoth I need to create a dynamic query with a lot of conditions. I did not find a better way to do it. Commented Aug 9, 2017 at 19:37
  • but I cannot imagine this working: you are writing Scala code in a string, and that will not execute Commented Aug 9, 2017 at 19:50
  • Did my answer solve your problem? Then please accept it Commented Aug 10, 2017 at 13:26

1 Answer

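You cannot write Scala code in a string and then "execute" that string (something like eval). There may be hacks to achieve this, but it is definitely not how Spark/Scala code should be written. Instead, apply the withColumn calls to the DataFrame directly while iterating over the column names.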

I would suggest something like this:

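import org.apache.spark.sql.functions._

// Thread the DataFrame through every column name, adding one withColumn per non-key column
val df_result = avro_inp.columns.foldLeft(df3) { case (df, i) =>
  if (PRIM_CHECK.contains(i)) {
    // join key: leave the DataFrame unchanged
    df
  }
  else if (COLUMN_UNCHANGE.contains(i)) {
    // keep the original value when present, otherwise take the "av" column
    df.withColumn(i, when(col(i).isNotNull, col(i)).otherwise(col("av" + i)))
  }
  else {
    // take the original value only when the "av" column is null
    df.withColumn(i, when(col("av" + i).isNull, col(i)).otherwise(col("av" + i)))
  }
}

df_result.show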

Alternatively, use a for-loop with df_result defined as a var:

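var df_result = df3

for (i <- avro_inp.columns) {
  if (PRIM_CHECK.contains(i)) {
    // join key: nothing to do
  }
  else if (COLUMN_UNCHANGE.contains(i)) {
    // keep the original value when present, otherwise take the "av" column
    df_result = df_result.withColumn(i, when(col(i).isNotNull, col(i)).otherwise(col("av" + i)))
  }
  else {
    // take the original value only when the "av" column is null
    df_result = df_result.withColumn(i, when(col("av" + i).isNull, col(i)).otherwise(col("av" + i)))
  }
}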
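
A third option, sketched below, builds all the column expressions first and applies them in a single select. It relies on when(x.isNotNull, x).otherwise(y) returning the same value as coalesce(x, y). This is only a sketch under assumptions: the object name MergeColumnsSketch and the helpers mergedColumns and merge are made up for illustration, the data is the sample from the question, and df1.columns stands in for avro_inp.columns, which the question does not define.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

// Hypothetical helper object, not part of the original code.
object MergeColumnsSketch {
  // Join keys: passed through unchanged.
  val PRIM_CHECK: List[String] = List("c1", "c3")
  // Columns where the df1 value wins over the "av" value.
  val COLUMN_UNCHANGE: List[String] = List("c4")

  // Build one Column expression per output column name.
  def mergedColumns(names: Seq[String]): Seq[Column] = names.map { name =>
    if (PRIM_CHECK.contains(name)) col(name)
    else if (COLUMN_UNCHANGE.contains(name))
      coalesce(col(name), col("av" + name)).as(name) // df1 value first
    else
      coalesce(col("av" + name), col(name)).as(name) // "av" value first
  }

  // Outer-join on the keys, then project the merged columns in one select.
  // Assumption: df1.columns plays the role of avro_inp.columns.
  def merge(df1: DataFrame, df2: DataFrame): DataFrame = {
    val df3 = df1.join(df2, PRIM_CHECK, "outer")
    df3.select(mergedColumns(df1.columns.toSeq): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("merge-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq(
      ("k1", "i", "aa", "k"),
      ("k5", "j", "ee", "l")
    ).toDF("c1", "c2", "c3", "c4")

    val df2 = Seq(
      ("k1", "a", "aa", "e"),
      ("k2", "b", "bb", "f"),
      ("k3", "c", "cc", "g"),
      ("k4", "d", "dd", "h")
    ).toDF("c1", "avc2", "c3", "avc4")

    merge(df1, df2).show()
    spark.stop()
  }
}
```

Because the result is produced by one projection, the helper "av" columns are dropped automatically. With withColumn they stay in the DataFrame unless you drop them yourself.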
