I have a web server that returns JSON data, which I would like to load into an Apache Spark DataFrame. Right now I have a shell script that uses wget to write the JSON data to a file and then runs a Java program that looks something like this:

DataFrame df = sqlContext.read().json("example.json");

I have looked at the Apache Spark documentation and there doesn't seem to be a way to combine these two steps automatically. There must be a way of requesting JSON data in Java, storing it as an object and then converting it to a DataFrame, but I haven't been able to figure it out. Can anyone help?

1 Answer

You could store the JSON data in a list of Strings, like this:

final String JSON_STR0 = "{\"name\":\"0\",\"address\":{\"city\":\"0\",\"region\":\"0\"}}";
final String JSON_STR1 = "{\"name\":\"1\",\"address\":{\"city\":\"1\",\"region\":\"1\"}}";
List<String> jsons = Arrays.asList(JSON_STR0, JSON_STR1);

where each String represents a JSON object.

Then you could transform the list into an RDD (here sc is assumed to be a JavaSparkContext, whose parallelize accepts a java.util.List):

JavaRDD<String> jsonRDD = sc.parallelize(jsons);

Once you have the RDD, it's easy to create a DataFrame:

DataFrame data = sqlContext.read().json(jsonRDD);
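
To skip the intermediate file, the list could be filled from the HTTP response instead of from string literals. The following is only a minimal sketch using plain JDK classes: the class name JsonLineFetcher and its two methods are made up for illustration, and it assumes the server returns one complete JSON object per line. A pretty-printed, multi-line response would have to be read into a single String instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark.
class JsonLineFetcher {

    // Collects every non-blank line of the stream as one JSON record.
    // Assumption: each line holds one complete JSON object.
    static List<String> readJsonLines(InputStream in) {
        List<String> jsons = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    jsons.add(trimmed);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jsons;
    }

    // Issues a GET request to the endpoint and returns the body as JSON lines.
    static List<String> fetchJsonLines(String endpoint) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/json");
            return readJsonLines(conn.getInputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java JsonLineFetcher <url>");
            return;
        }
        List<String> jsons = fetchJsonLines(args[0]);
        System.out.println("Fetched " + jsons.size() + " JSON record(s)");
        // With Spark on the classpath and sc being a JavaSparkContext,
        // the list plugs into the same two calls as above:
        //   JavaRDD<String> jsonRDD = sc.parallelize(jsons);
        //   DataFrame data = sqlContext.read().json(jsonRDD);
    }
}
```

Note that this still holds the whole response in driver memory before Spark sees it, so it suits modest payloads rather than very large ones.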

2 Comments

OK, this works (apologies for my previous comment, which I have deleted). I used this answer in combination with stackoverflow.com/questions/2586975/how-to-use-curl-in-java. What I find a bit confusing is how this works: I would have expected the json method to accept only a file path. Also, this method seems memory-heavy for very large JSON files, as you are repeatedly copying the data (http->Java String->RDD->DataFrame) instead of just loading it from a file. I'm wondering whether Spark has some sort of JSON-over-REST way of talking to a data source instead.
You are right, loading the data into objects consumes a lot of memory. I'm looking for a JSON-over-REST solution too. I'll be back once I find one.
