I'm getting a stream of tweets in JSON format using a Python notebook on Apache Spark on IBM Bluemix. I want to turn the JSON object into a DataFrame. The closest thing I have found is:

json_obj = {"tweet": "hey man", "name": "Alan"} #mock example

tweetsDF = sqlContext.read.json("file path")

Now, I don't want to load a file. I just want to take that JSON variable and convert it into a DataFrame, so that I can do the same with the tweets variable. Any idea how to do this? Thank you!

1 Answer

First convert the JSON object into an RDD[String], then pass it to `sqlContext.read.json`. Here is example code in Scala:

val json_obj = sc.parallelize(Array("""{"tweet": "hey man", "name": "Alan"}""", """{"tweet": "what's up", "name": "Bertha"}"""))
val tweetsDF = sqlContext.read.json(json_obj)
tweetsDF.show()
//+------+---------+
//|  name|    tweet|
//+------+---------+
//|  Alan|  hey man|
//|Bertha|what's up|
//+------+---------+
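
Since the question uses a Python notebook, here is a rough PySpark equivalent of the Scala code above. It is a sketch, not a tested Bluemix recipe: it assumes `sc` and `sqlContext` are the SparkContext and SQLContext the notebook already provides, and the helper names `to_json_lines` and `json_to_dataframe` are made up for illustration. The idea is the same: serialize each dict to a JSON string, parallelize the strings into an RDD, and hand that RDD to `sqlContext.read.json`.

```python
import json


def to_json_lines(records):
    """Serialize each dict to one JSON string (one record per string)."""
    return [json.dumps(record) for record in records]


def json_to_dataframe(sc, sqlContext, records):
    """Build a DataFrame from in-memory dicts (hypothetical helper).

    Assumes `sc` is a SparkContext and `sqlContext` is a SQLContext,
    as provided by the notebook environment.
    """
    rdd = sc.parallelize(to_json_lines(records))  # RDD of JSON strings
    return sqlContext.read.json(rdd)


json_obj = {"tweet": "hey man", "name": "Alan"}  # mock example from the question
json_lines = to_json_lines([json_obj])
```

In the notebook, this would be called as `tweetsDF = json_to_dataframe(sc, sqlContext, [json_obj])`, followed by `tweetsDF.show()`.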