
I have a DataFrame DF as below. Based on the Issue and Datatype columns, I want to create a dynamic query. If Issue is YES, check the Datatype: if it is StringType, add trim(DiffColumnName) to the query; if it is IntegerType, apply some other operation such as round(COUNT,2). For columns whose Issue is NO, do nothing and select the column itself.

The query should look like this:

select DEST_COUNTRY_NAME, trim(ORIGIN_COUNTRY_NAME), round(COUNT,2)

+-------------------+-----------+-----+
|     DiffColumnName|   Datatype|Issue|
+-------------------+-----------+-----+
|  DEST_COUNTRY_NAME| StringType|   NO|
|ORIGIN_COUNTRY_NAME| StringType|  YES|
|              COUNT|IntegerType|  YES|
+-------------------+-----------+-----+

I am not sure whether I should use an if/else condition, a case statement, or a UDF here. Also, my DataFrame is dynamic: the set of columns it describes will change every time.
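For comparison, the if/else-or-case part can also be handled on the driver. The metadata DataFrame has only one row per column, so it is small enough to collect, and an ordinary Scala `match` (a case statement) then builds the query without any UDF. This is a minimal sketch, assuming the rows have already been collected into a `Seq[(String, String, String)]` (for example with `origDF.as[(String, String, String)].collect().toSeq`, which needs `spark.implicits._` in scope); the names `QueryBuilder`, `toExpr`, and `buildQuery` are hypothetical.

```scala
object QueryBuilder {
  // Map one metadata row (column name, data type, issue flag) to a select expression.
  def toExpr(name: String, dataType: String, issue: String): String =
    (issue, dataType) match {
      case ("YES", "StringType")  => s"trim($name)"
      case ("YES", "IntegerType") => s"round($name,2)"
      case _                      => name // "NO" rows (and unhandled types) are selected as-is
    }

  // Join the expressions, in row order, into a single select statement.
  def buildQuery(rows: Seq[(String, String, String)]): String =
    rows.map { case (name, dataType, issue) => toExpr(name, dataType, issue) }
      .mkString("select ", ",", "")

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ("DEST_COUNTRY_NAME", "StringType", "NO"),
      ("ORIGIN_COUNTRY_NAME", "StringType", "YES"),
      ("COUNT", "IntegerType", "YES")
    )
    println(buildQuery(rows))
  }
}
```

For the three rows in the table this prints `select DEST_COUNTRY_NAME,trim(ORIGIN_COUNTRY_NAME),round(COUNT,2)`. Adding a new data type is one more `case` line, and because the input is a plain `Seq`, the output keeps the row order.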

I need some suggestions on how to proceed. Thanks.

1 Answer

This can be accomplished with the following code, which does three things:

  • Derive a new column holding the SQL expression for each row (trim, round, or the bare column name)
  • Use collect_list to aggregate these expressions into an array
  • Join them into the query string using concat_ws and concat
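import org.apache.spark.sql.functions._
import spark.implicits._ // for .toDF; already in scope in spark-shell

val origDF = Seq(
  ("DEST_COUNTRY_NAME", "StringType", "NO"),
  ("ORIGIN_COUNTRY_NAME", "StringType", "YES"),
  ("COUNT", "IntegerType", "YES"),
  ("TESTCOL", "StringType", "NO")
).toDF("DiffColumnName", "Datatype", "Issue")

// Build the select expression for each row as a string
val finalDF = origDF.withColumn("newCol",
  when(col("Issue") === "YES" && col("Datatype") === "StringType", concat(lit("trim("), col("DiffColumnName"), lit(")")))
    .when(col("Issue") === "YES" && col("Datatype") === "IntegerType", concat(lit("round("), col("DiffColumnName"), lit(",2)")))
    .when(col("Issue") === "NO", col("DiffColumnName"))
)

// Collect the expressions into one array and join them into the query
// (note: collect_list does not guarantee row order after a shuffle)
finalDF.agg(collect_list("newCol").alias("queryout")).select(concat(lit("select "), concat_ws(",", col("queryout")))).show(false)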

I added an extra row (TESTCOL) to the data for testing, and the code gives the desired output:

+-------------------------------------------------------------------------+
|concat(select , concat_ws(,, queryout))                                  |
+-------------------------------------------------------------------------+
|select DEST_COUNTRY_NAME,trim(ORIGIN_COUNTRY_NAME),round(COUNT,2),TESTCOL|
+-------------------------------------------------------------------------+
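The generated string has no FROM clause, so to actually run it against the data it may be simpler to pass the individual expressions to `selectExpr`. The following is a sketch under these assumptions: the data lives in a DataFrame (here a hypothetical `data` with made-up sample rows); each expression is aliased back to its original column name so the result keeps the same column names; names are wrapped in backticks so unusual names still parse; and `otherwise` replaces the explicit `Issue === "NO"` branch. `ApplyDynamicSelect`, `buildExprs`, and `applyRules` are hypothetical names.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object ApplyDynamicSelect {
  // Wrap a column name in backticks so names with spaces or keywords still parse.
  private def quoted(c: Column): Column = concat(lit("`"), c, lit("`"))

  // Turn each metadata row into a SQL expression aliased back to its column name.
  def buildExprs(meta: DataFrame): Seq[String] = {
    val name = col("DiffColumnName")
    val isYes = col("Issue") === "YES"
    val expr =
      when(isYes && col("Datatype") === "StringType",
        concat(lit("trim("), quoted(name), lit(") AS "), quoted(name)))
        .when(isYes && col("Datatype") === "IntegerType",
          concat(lit("round("), quoted(name), lit(",2) AS "), quoted(name)))
        .otherwise(quoted(name)) // "NO" rows (and unhandled types) pass through unchanged
    // The metadata is small (one row per column), so collecting it is cheap.
    meta.select(expr.as("expr")).collect().map(_.getString(0)).toSeq
  }

  // Apply the generated expressions to the data DataFrame.
  def applyRules(meta: DataFrame, data: DataFrame): DataFrame =
    data.selectExpr(buildExprs(meta): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("dynamic-select").getOrCreate()
    import spark.implicits._

    val meta = Seq(
      ("DEST_COUNTRY_NAME", "StringType", "NO"),
      ("ORIGIN_COUNTRY_NAME", "StringType", "YES"),
      ("COUNT", "IntegerType", "YES")
    ).toDF("DiffColumnName", "Datatype", "Issue")

    // Hypothetical sample data matching the metadata above.
    val data = Seq(
      ("United States", "  Romania ", 15),
      ("United States", " Croatia", 1)
    ).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "COUNT")

    applyRules(meta, data).show(false)
    spark.stop()
  }
}
```

Because of `otherwise`, a YES row with a type other than StringType or IntegerType is selected unchanged instead of disappearing. In a `when` chain without `otherwise`, such a row yields null, and `collect_list` skips nulls.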