
In the code snippet below I am reading a JSON file with a structure similar to this one:

{ "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
{ "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}
{ "34a71a88-ae2d-4304-a1db-01c54fc6e4d8": "72119c87-7fce-4e17-9770-fcfab04328f5"}

Each line contains a key-value pair that should then be added to a map in Scala. This is the Scala code I used for this purpose:

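import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, Path}
import scala.collection.mutable
import scala.io.Source
import scala.util.parsing.json.JSON // assumed: the standard-library parser

val fs = org.apache.hadoop.fs.FileSystem.get(new Configuration())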

def readFile(location: String): mutable.HashMap[String, String] = {
  val path: Path = new Path(location)
  val dataInputStream: FSDataInputStream = fs.open(path)
  val m = new mutable.HashMap[String, String]()
  for (line <- Source.fromInputStream(dataInputStream).getLines) {
    val parsed: Option[Any] = JSON.parseFull(line)
    m ++= parsed.get.asInstanceOf[Map[String, String]]
  }
  m
}

There must surely be a more elegant way to do this in Scala. In particular, it should be possible to get rid of the mutable map and read the lines directly from a stream into a map. How can this be done?

2 Answers

val r: Map[String, String] = Source.fromInputStream(dataInputStream).getLines
    .map(line => JSON.parseFull(line).get)
    .flatMap { case m: Map[String, String] => m.map { case (k, v) => k -> v } }
    .toMap

Keep in mind that JSON (you mean scala.util.parsing.json.JSON, right?) is itself marked @deprecated as of Scala 2.11.

EDIT: as per suggestions of @SergGr and @Dima, this can be further simplified as

val r: Map[String, String] = Source.fromInputStream(dataInputStream).getLines
    .flatMap(line => JSON.parseFull(line))
    .collect { case m: Map[String, String] => m }
    .flatten.toMap

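This last version also handles unexpected input better (e.g., if a line contains an array): lines that fail to parse are dropped by the flatMap, and parsed values that are not maps are skipped by collect instead of throwing a scala.MatchError.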
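To illustrate that behaviour, here is a minimal self-contained sketch. It assumes a hypothetical stand-in, parseLine, in place of JSON.parseFull so that it runs without the deprecated parser on the classpath; the stand-in only understands the one-pair-per-line shape from the question and treats any line starting with [ as an array. The sample input is made up for illustration.

```scala
import scala.io.Source

// Hypothetical stand-in for JSON.parseFull (not a real JSON parser):
// Some(Map) for a flat one-pair object, Some(List) for an array line,
// None for anything it cannot read.
val Pair = """\s*\{\s*"([^"]*)"\s*:\s*"([^"]*)"\s*\}\s*""".r

def parseLine(line: String): Option[Any] = line match {
  case Pair(k, v)                  => Some(Map(k -> v))
  case l if l.trim.startsWith("[") => Some(List.empty[Any])
  case _                           => None
}

// Illustrative input: two good lines, one array, one non-JSON line.
val input =
  """{ "a": "1"}
    |[ "not", "an", "object" ]
    |this is not JSON
    |{ "b": "2"}""".stripMargin

val r: Map[String, String] = Source.fromString(input).getLines()
  .flatMap(line => parseLine(line))             // None (unparseable) is dropped
  // Type arguments are erased, so this only checks "is a Map" at runtime.
  .collect { case m: Map[String, String] => m } // arrays etc. are skipped
  .flatten
  .toMap
```

Only the two object lines end up in r; the array line and the non-JSON line are silently ignored. Whether silently ignoring bad lines is desirable depends on the use case.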


10 Comments

Thank you for pointing out the deprecation in Scala 2.11. Which would be the alternative?
Why do you need such a complicated flatMap with an inner map? Map is already Iterable, so json.getLines.flatMap(line => JSON.parseFull(line).get.asInstanceOf[Map[String, String]]).toMap should be enough.
@AlexSavitsky, sorry for being a nitpicker, but how is a pattern match that matches only Map[String, Any] better than asInstanceOf[Map[String, Any]]? If the JSON.parseFull call actually returned List[Any], your pattern match would fail as well, with scala.MatchError. IMHO a pattern match is better only if you are a bit paranoid and cover all the cases there.
You don't need .get (it's kinda code smell) ... just replace the first .map with .flatMap
You can also replace the second .flatMap with .collect and .flatten to address @SergGr's concern about the result having the wrong type
val json = scala.io.Source.fromString("""
 { "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 { "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 { "34a71a88-ae2d-4304-a1db-01c54fc6e4d8": "72119c87-7fce-4e17-9770-fcfab04328f5"}
 """)

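Split each line at the colon, map the resulting Array to a key and a value, and then convert to a Map. This returns a scala.collection.immutable.Map[String,String]. Note that the braces and quotes stay in the keys and values, as the output below shows.

scala> json.getLines.filter(_.contains(":")).map(x => x.split(":")).map(x => x(0) -> x(1)).toMap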

res35: scala.collection.immutable.Map[String,String] = Map(
 { "c7254865-87b5-4d34-a7bd-6ba6c9dbab14" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}",
 { "36c18403-1707-48c4-8f19-3b2e705007d4" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}",
 { "34a71a88-ae2d-4304-a1db-01c54fc6e4d8" -> " "72119c87-7fce-4e17-9770-fcfab04328f5"}")
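The braces and quotes in the output above are left over from splitting the raw line. A minimal sketch of one way to strip them, assuming every line holds exactly one string-to-string pair and neither string contains an escaped quote; the Entry regex and the names source and pairs are illustrative, and this is not a general JSON parser:

```scala
import scala.io.Source

// Matches one flat object per line: { "key": "value" }
// Assumption: no escaped quotes inside the key or the value.
val Entry = """\s*\{\s*"([^"]*)"\s*:\s*"([^"]*)"\s*\}\s*""".r

val source = Source.fromString(
  """{ "c7254865-87b5-4d34-a7bd-6ba6c9dbab14": "72119c87-7fce-4e17-9770-fcfab04328f5"}
    |{ "36c18403-1707-48c4-8f19-3b2e705007d4": "72119c87-7fce-4e17-9770-fcfab04328f5"}""".stripMargin)

// collect keeps only the lines that match Entry and extracts the two
// capture groups, so blank or malformed lines are skipped.
val pairs: Map[String, String] = source.getLines()
  .collect { case Entry(k, v) => k -> v }
  .toMap
```

Because the regex must match the whole line, anything that deviates from the expected shape is dropped rather than producing a half-parsed entry.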

3 Comments

I cannot read the JSON from a string. I need to read it from a file. And the file will be around 100KB or more.
@gil.fernandes, although I prefer Alex's answer, to be fair, json is just a Source, and it doesn't matter whether it came from a file or from a string used just for testing.
I understand that. getLines from a DataInputStream and from a String return the same thing. It's easier to work with your example from a String. The for statement and return value m can be replaced with the line I have above.
