
I am looking for a code snippet showing the best practice for reading multiple nested JSON files under subdirectories in Hadoop using Scala.

If we could also write the above JSON files into one single file in another directory in Hadoop, that would be even better.

Any help is appreciated.

Thanks PG

  • Are you using Spark with the Scala API, or how are you using Scala in Hadoop? Commented Sep 29, 2016 at 6:44
  • Thanks for your response. I am using Spark with the Scala API. Commented Sep 29, 2016 at 10:36
  • You can use sqlContext.read.json("json file path") to read a JSON file; it returns a DataFrame. But you mentioned nested directories: do the JSON files have different schemas? (A sketch of the wildcard-path read appears after this comment list.) Commented Sep 29, 2016 at 14:38
  • Thanks Shankar. The files will have similar schemas, and reading them worked. The next step is: can I write all the files into one single JSON file, ideally in 1-2 steps, to be performance efficient? Commented Sep 29, 2016 at 20:20
  • Take a look here. I think the top answer may help: stackoverflow.com/questions/28203217/… Commented Sep 29, 2016 at 23:31
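A minimal sketch of the read discussed in the comments, assuming a Spark 1.6-style SQLContext and a hypothetical directory layout where the JSON files sit one level down under a common root; the root path and wildcard pattern are assumptions, not taken from the original thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("ReadNestedJson"))
val sqlContext = new SQLContext(sc)

// Wildcards let Spark pick up JSON files in every subdirectory of the root
// (hypothetical layout: /data/json/<subdir>/<file>.json).
val df = sqlContext.read.json("hdfs:///data/json/*/*.json")

// The files are expected to share a similar schema, so they merge into one DataFrame.
df.printSchema()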

1 Answer


You can use sqlContext.read.json("input file path") to read a JSON file; it returns a DataFrame.

Once you have the DataFrame, just use df.write.json("output file path") to write it back out as JSON.

Code example, if you use Spark 2.0:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL JSON example")
  .getOrCreate()

// Read the JSON input into a DataFrame, then write it back out as JSON.
val df = spark.read.json("input/file/path")

df.write.json("output/file/path")
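The question also asks for the output to land in one single file. A hedged follow-up to the example above, assuming the data is small enough to pass through a single task; the output path is a placeholder, and Spark still writes it as a directory containing one part file:

// Coalescing to a single partition yields one part file inside the output directory.
// This funnels all data through one task, so it is only reasonable for modest volumes.
df.coalesce(1).write.json("output/single/path")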