I'm trying to read an in-memory JSON string into a Spark DataFrame on the fly:
val someJSON: String = getJSONSomehow()
val someDF: DataFrame = magic.convert(someJSON)
I've spent quite a bit of time looking at the Spark API, and the best I can find is to use an SQLContext like so:
import java.util.UUID
import scalax.io.{Codec, Output, Resource}

val someJSON: String = getJSONSomehow()
val tmpPath = s"/tmp/json/${UUID.randomUUID().toString}"
val tmpFile: Output = Resource.fromFile(tmpPath)
tmpFile.write(someJSON)(Codec.UTF8) // dump the payload to the temp file
val someDF: DataFrame = sqlContext.read.json(tmpPath) // read.json takes a path string
But this feels kind of awkward/wonky and imposes the following constraints:
- It requires me to format my JSON as one object per line (per the documentation; see the example after this list); and
- It forces me to write the JSON to a temp file, which is slow and awkward; and
- It forces me to clean up temp files over time, which is cumbersome and feels "wrong" to me
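For concreteness, here is my understanding of that first constraint: when reading from a file, the JSON datasource parses each physical line as a standalone object, so a pretty-printed document comes back as corrupt records rather than rows (the sample payloads below are made up):

```scala
// Pretty-printed JSON spans several lines; each line is parsed on its
// own, so the record surfaces in the _corrupt_record column:
val prettyJSON = """{
  "name": "foo",
  "value": 1
}"""

// The one-object-per-line (JSON Lines) form loads cleanly:
val lineDelimitedJSON =
  """{"name": "foo", "value": 1}
    |{"name": "bar", "value": 2}""".stripMargin
```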
So I ask: is there a more direct and efficient way to convert a JSON string into a Spark DataFrame?
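For illustration, something along these lines is what I'm hoping for. This is only a sketch: it leans on the `json(jsonRDD: RDD[String])` overload that I believe Spark 1.x's `DataFrameReader` exposes, and it assumes `sc` and `sqlContext` are in scope, as in spark-shell:

```scala
import org.apache.spark.sql.DataFrame

val someJSON: String = getJSONSomehow()

// Each RDD element is parsed as one complete JSON document, so a
// single in-memory object never touches the filesystem. A payload
// carrying many records would need splitting into one element each.
val jsonRDD = sc.parallelize(Seq(someJSON))
val someDF: DataFrame = sqlContext.read.json(jsonRDD)
```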