0

I've the Hive table data, i would need some help on transforming the data into a below shown "Expected data form".

Two things on the note:

  1. omit the columns which are showing null, like omit the columns abs, ada, adw on the first row.
  2. for those columns which are of type array(ex: abs, ada, adw, alt) & value is not null, include the column name within the array as shown in the Expected data form and keep an outer column name called EVENTS.

Is there a way i can achieve this using Spark-sql, or do i need write some scala UDF. I need the solution in spark-scala. Any help would be much appreciated.

Hive Table data:

     _______________________________________________________________________________________________________
   | vin               |  tt |msg_type |  abs  |ada                     |  adw  | alt                      |
   |___________________|_____|_________|_______|________________________|_______|__________________________|
   | FU7XXXXXXXXXXXXXX |  0  |SIGNAL   | (null)|(null)                  | (null)|[{"E":15XXXXXXXX,"V":0.0}]|
   |__________________ |_____|_________|______ |________________________|_______|__________________________|
   | FSXXXXXXXXXXXXXXX |  0  |SIGNAL   | (null)|[{"E":15XXXXXXXX,"V":1}]| (null)|  (null)                  |
   |___________________|_____|_________|_______|________________________|_______|__________________________|

Expected data:

       _______________________________________________________________________
   | vin               |  tt |msg_type |  EVENTS                              |
   |___________________|_____|_________|______________________________________|
   | FU7XXXXXXXXXXXXXX |  0  |SIGNAL   | [{"SN":"alt","E":15XXXXXXXX,"V":0.0}]|
   |__________________ |_____|_________|______ _______________________________|
   | FSXXXXXXXXXXXXXXX |  0  |SIGNAL   | [{"SN":"ada","E":15XXXXXXXX,"V":1}]  |                
   |___________________|_____|_________|______________________________________|
2
  • can you provide you data in correct tabular form Commented Sep 30, 2019 at 10:47
  • Hello @Nikk, i've updated the sample data in a readable format. Commented Oct 1, 2019 at 5:01

1 Answer 1

1

I have updated your input data and also declaring one String variable to make UDF generic.

 scala> df.show(false)
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+
    |vin              |tt |msg_type|abs                       |ada                       |adw                       |alt                       |
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |null                      |[{"E":15XXXXXXXX,"V":0.0}]|
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |[{"E":15XXXXXXXX,"V":1}]  |null                      |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":2}]  |null                      |null                      |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |[{"E":15XXXXXXXX,"V":3}]  |null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |null                      |null                      |[{"E":15XXXXXXXX,"V":4.1}]|[{"E":15XXXXXXXX,"V":4.2}]|
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":5.1}]|null                      |[{"E":15XXXXXXXX,"V":5.2}]|null                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"E":15XXXXXXXX,"V":6.1}]|[{"E":15XXXXXXXX,"V":6.2}]|[{"E":15XXXXXXXX,"V":6.3}]|null                      |
    +-----------------+---+--------+--------------------------+--------------------------+--------------------------+--------------------------+

    //String of column name that we need to check for null in Event
    scala> val SingalColumns = "abs,ada, adw,alt"
    SingColumns: String = abs,ada, adw,alt

    //UDF declaration 
    scala> def EventUDF:UserDefinedFunction = udf((flagCol:String, r:Row) => {var signal = ""
         |             val flagColList:List[String] = flagCol.reverse.split(s""",""").map(x => x.trim).mkString(",").reverse.split(s",").toList
         |             flagColList.foreach{ x => 
         |                   if (r.getAs(x) != null){
         |                     signal= signal + "," + """{"SN":""" + x.toString +"," + r.getAs(x).toString.replaceFirst("\\[\\{","").replaceFirst("\\]","")
         |                   }
         |             }
         |             signal.replaceFirst(s""",""", """[""").concat("]")
         |             })

    //final DataFrame
    scala> df.withColumn("Event", EventUDF(lit(SingalColumns),struct(df.columns map col: _*))).select("vin","tt","msg_type","Event").show(false)
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+
    |vin              |tt |msg_type|Event                                                                                                  |
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":alt,"E":15XXXXXXXX,"V":0.0}]                                                                    |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":ada,"E":15XXXXXXXX,"V":1}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":2}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":adw,"E":15XXXXXXXX,"V":3}]                                                                      |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":adw,"E":15XXXXXXXX,"V":4.1},{"SN":alt,"E":15XXXXXXXX,"V":4.2}]                                  |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":5.1},{"SN":adw,"E":15XXXXXXXX,"V":5.2}]                                  |
    |FU7XXXXXXXXXXXXXX|0  |SIGNAL  |[{"SN":abs,"E":15XXXXXXXX,"V":6.1},{"SN":ada,"E":15XXXXXXXX,"V":6.2},{"SN":adw,"E":15XXXXXXXX,"V":6.3}]|
    +-----------------+---+--------+-------------------------------------------------------------------------------------------------------+
Sign up to request clarification or add additional context in comments.

3 Comments

Many thanks for the solution @Nikk, it worked fine.
Hello @Nikk, i'm facing one issue, on a single row, if SN name remains the same then i'm getting as: [{"SN":ALT,"E":157XXXXXXX,"V":20.3XXXXX}, {"E":157XXXXXXXX,"V":20.3XXXXXXXX}] Expected to be: [{"SN":ALT,"E":157XXXXXX,"V":20.3XXXXXXX}, {"SN":ALT,"E":157XXXXXXX,"V":20.3XXXXXXX}]
can you provide that row?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.