I have a table whose description is as follows:

# col_name              data_type               comment             

id                      string                                      
persona_model           map<string,struct<score:double,tag:string>>                     

# Partition Information      
# col_name              data_type               comment             

process_date            string          

A sample row looks like this (tab-separated):

000000E91010441BB122402A45D439E7        {"Tech":{"score":0.21678,"tag":"OTHERS"}}    2018-05-16-01              

Now I want to create another table with only two columns: id and its respective score.
How can I do this in Scala Spark?

Moreover, what's really bugging me is how to access only one particular score and store it in an integer variable, let's say temp.

  • Explode the map, then select score from the struct. Commented May 22, 2018 at 9:44
  • Can you provide sample input, the expected output, and what you've tried? Commented May 22, 2018 at 9:59
  • @RameshMaharjan I have edited the question and added an example. I want to store that score, 0.21678, in an integer variable temp. How do I do that? I also want to create a new table that contains all the ids and scores. Please help. Commented May 22, 2018 at 10:07
  • Can you format the input data according to the table format? Commented May 22, 2018 at 10:10
  • @RameshMaharjan It is actually formatted according to the table format: the first field of the sample row is the id, the second is persona_model, and the third is process_date. Commented May 22, 2018 at 10:12

1 Answer

You can do this:

import org.apache.spark.sql.functions.col
val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))

The resulting temp column holds the score stored under the "Tech" key, so the values can then be extracted easily.
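For the second part of the question (storing one score in a variable), here is a minimal sketch. It assumes a local SparkSession and a small stand-in oldDF built from the sample row in the question; the session settings and the way the stand-in is built are illustrative assumptions, not part of the original setup. Note that the score is a double, so temp is declared as Double here; converting it to an Int would truncate 0.21678 to 0.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[1]").appName("score-demo").getOrCreate()
import spark.implicits._

// Stand-in for the real table, built from the sample row in the question.
// The case class gives the struct its field names (score, tag).
case class Score(score: Double, tag: String)
val oldDF = Seq(
  ("000000E91010441BB122402A45D439E7", Map("Tech" -> Score(0.21678, "OTHERS")), "2018-05-16-01")
).toDF("id", "persona_model", "process_date")

// Same selection as in the answer: map lookup by key, then struct field access.
val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))

// Bring the score of one particular id back to the driver as a plain Scala value.
val temp: Double = newDF
  .filter(col("id") === "000000E91010441BB122402A45D439E7")
  .select("temp")
  .as[Double]
  .head()
```

To persist the two-column result as a table, something like newDF.write.saveAsTable with a table name of your choosing should work, assuming the session is configured with a catalog you can write to.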

Update: if there is more than one key, the procedure is a little more complex.

First, create a class for the struct (necessary for the type cast):

case class Score(score: Double, tag: String)

Then extract all the keys from the data:

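// Column 1 is persona_model. Only the map keys are read here;
// the distinct keys are then collected to the driver as a List[String].
val keys = oldDF.rdd
    .flatMap(r => r.getMap(1).asInstanceOf[Map[String, Score]].keys)
    .distinct.collect.toList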

Finally, you can extract the scores like this:

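import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

// Builds a column holding the score under the first key in `keys` that is present in the row's map.
def condition(keys: List[String]): Column = keys match {
  case k :: ks =>
    val score = col("persona_model")(k)("score")
    when(score.isNotNull, score).otherwise(condition(ks))
  case Nil => lit(null) // none of the keys is present in this row
}

val newDF = oldDF.select(col("id"), condition(keys).as("score"))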
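The first comment on the question suggests a different route: explode the map and select score from the struct. The following is a minimal sketch of that idea, not the approach used above. It assumes a local SparkSession and a two-row stand-in oldDF; the ids "id1" and "id2", the "Sports" key, and its score are made up for illustration. Unlike the key-collecting approach, it yields one output row per map entry, so it also covers maps with several entries.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[1]").appName("explode-demo").getOrCreate()
import spark.implicits._

// Stand-in data with made-up ids and a second, made-up key ("Sports").
case class Score(score: Double, tag: String)
val oldDF = Seq(
  ("id1", Map("Tech" -> Score(0.21678, "OTHERS")), "2018-05-16-01"),
  ("id2", Map("Sports" -> Score(0.5, "OTHERS")), "2018-05-16-01")
).toDF("id", "persona_model", "process_date")

// explode on a map column produces one row per entry, in columns named "key" and "value".
val exploded = oldDF.select(col("id"), explode(col("persona_model")))

// "value" is the struct, so its score field can be selected by path.
val scores = exploded.select(col("id"), col("key"), col("value.score").as("score"))
```

If only id and score are wanted, the key column can simply be left out of the final select.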

2 Comments

Holy shit, it worked, I love you illak. Just one more thing: it creates a table, but it shows the score value only for rows whose map key is "Tech" and null for the others. Can you fix it, please? :)
Updated the answer. It works only if each map has a single element (one key-value pair).
