I am working on Spark code where I have to combine the values of multiple columns into a single object and save the result to MongoDB.

Given Dataset


|---|-----|------|----------|
|A  |A_SRC|Past_A|Past_A_SRC|
|---|-----|------|----------|
|a1 | s1  | a2   | s2       |

What I have tried

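import org.apache.spark.sql.functions.{struct, to_json}
import spark.implicits._ // assumes a SparkSession named `spark`; needed for toDF
val ds1 = Seq(("1", "2", "3", "4")).toDF("a", "src", "p_a", "p_src")
// to_json takes a single struct column; the struct field names become the JSON keys
val recordCol = to_json(struct("a", "src", "p_a", "p_src")) as "A"
ds1.select(recordCol).show(truncate = false)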

This gives me a result like:

+-----------------------------------------+
|A                                        |
+-----------------------------------------+
|{"a":"1","src":"2","p_a":"3","p_src":"4"}|
+-----------------------------------------+

I am expecting something like:

+-------------------------------------------------------+
|A                                                      |
+-------------------------------------------------------+
|{"source":"1","value":"2","p_source":"4","p_value":"3"}|
+-------------------------------------------------------+

How can I give the keys in the object names that differ from the column names? Can this be done with maps in Java?

1 Answer

You can alias each column with as(...) inside the struct, so that each field is saved under the name you passed.

Dataset<Row> tstDS = spark.read().format("csv").option("header", "true").load("/home/exa9/Documents/SparkLogs/y.csv");

tstDS.show();

/****
+---+-----+------+----------+
|  A|A_SRC|Past_A|Past_A_SRC|
+---+-----+------+----------+
| a1|   s1|    a2|        s2|
+---+-----+------+----------+

****/
tstDS.withColumn("A",
        functions.to_json(
                functions.struct(
                        // the alias passed to as(...) becomes the JSON key
                        functions.col("A").as("source"),
                        functions.col("A_SRC").as("value"),
                        functions.col("Past_A").as("p_source"),
                        functions.col("Past_A_SRC").as("p_value"))))
        .select("A")
        .show(false);

/****

+-----------------------------------------------------------+
|A                                                          |
+-----------------------------------------------------------+
|{"source":"a1","value":"s1","p_source":"a2","p_value":"s2"}|
+-----------------------------------------------------------+

****/
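Since the question asks about maps in Java, the same aliasing can be driven from a map of column name to JSON key. The following is only a sketch under assumptions: the class name RenameStructKeys and the helpers structWithKeys and exampleMapping are hypothetical, the mapping simply mirrors the aliases used above, and a local SparkSession with an inline one-row query stands in for the CSV file. A LinkedHashMap is used so the keys keep their insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class RenameStructKeys {

    // Hypothetical helper: builds a struct column whose field names are the
    // map values; the map keys are the existing column names.
    static Column structWithKeys(Map<String, String> columnToKey) {
        Column[] fields = columnToKey.entrySet().stream()
                .map(e -> functions.col(e.getKey()).as(e.getValue()))
                .toArray(Column[]::new);
        return functions.struct(fields);
    }

    // Hypothetical mapping, copied from the aliases in the answer above.
    static Map<String, String> exampleMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("A", "source");
        mapping.put("A_SRC", "value");
        mapping.put("Past_A", "p_source");
        mapping.put("Past_A_SRC", "p_value");
        return mapping;
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for this illustration.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("rename-struct-keys")
                .getOrCreate();

        // Inline one-row dataset standing in for the CSV from the answer.
        Dataset<Row> ds = spark.sql(
                "SELECT 'a1' AS A, 's1' AS A_SRC, 'a2' AS Past_A, 's2' AS Past_A_SRC");

        // Serialize the renamed struct to a JSON string column named "A".
        ds.select(functions.to_json(structWithKeys(exampleMapping())).as("A"))
                .show(false);

        spark.stop();
    }
}
```

Note that to_json produces a string column. If the aim is to store a nested object rather than a JSON string, keeping the struct column from structWithKeys and skipping to_json may be the better fit, though how it is written depends on the MongoDB connector in use.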

