Reading JSON data through Apache Spark

Question

I am trying to read a sample JSON file through Apache Spark. While doing so, I noticed that each JSON object has to be kept on a single line. If I keep the entire JSON object on a single line, the code works; otherwise I get an exception.

This is my JSON data:

    [
    {
        "id": 2,
        "name": "An ice sculpture",
        "price": 12.50,
        "tags": ["cold", "ice"],
        "dimensions": {
            "length": 7.0,
            "width": 12.0,
            "height": 9.5
        },
        "warehouseLocation": {
            "latitude": -78.75,
            "longitude": 20.4
        }
    },
    {
        "id": 3,
        "name": "A blue mouse",
        "price": 25.50,
        "dimensions": {
            "length": 3.1,
            "width": 1.0,
            "height": 1.0
        },
        "warehouseLocation": {
            "latitude": 54.4,
            "longitude": -32.7
        }
    }
    ]

This is my code:

    SparkSession session = new SparkSession.Builder().appName("JsonRead").master("local").getOrCreate();
    Dataset<Row> json = session.read().json("/Users/mac/Desktop/a.json");
    json.select("tags").show();

Keeping each object on one line is manageable for small datasets. Is there any other way to process large JSON datasets?

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'tags' given input columns: [_corrupt_record];
– ROOT, commented Nov 14, 2016 at 11:14

Yaron · Accepted Answer · 2016-11-14 11:15:52Z

See the documentation: http://spark.apache.org/docs/2.0.1/sql-programming-guide.html#json-datasets

JSON Datasets

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
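One way to satisfy the one-object-per-line requirement without editing the file by hand is to convert it first. The following is only a minimal sketch, not something from the Spark documentation: `JsonLinesConverter` and `toJsonLines` are hypothetical names, and the code assumes the input is well-formed JSON whose top level is a single array of objects, as in the question. It streams character by character, so the whole file never has to fit in memory. For production use, a real JSON library would be a safer choice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper: rewrites a top-level JSON array of objects
// as one compact JSON object per line.
class JsonLinesConverter {

    static void toJsonLines(Reader in, Writer out) throws IOException {
        int depth = 0;            // current nesting depth of {} and []
        boolean inString = false; // true while inside a "..." literal
        boolean escaped = false;  // true right after a backslash inside a string

        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            if (inString) {
                // Copy string contents verbatim, watching for the closing quote.
                out.write(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue; // drop line breaks and indentation outside strings
            }
            if (c == '"') {
                inString = true;
                out.write(c);
            } else if (c == '{' || c == '[') {
                depth++;
                if (depth > 1) {
                    out.write(c); // depth 1 is the outer '[', which is skipped
                }
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth > 0) {
                    out.write(c); // depth 0 is the outer ']', which is skipped
                }
                if (depth == 1) {
                    out.write('\n'); // a top-level element has just closed
                }
            } else if (c == ',' && depth == 1) {
                // separator between top-level elements: replaced by the newline
            } else {
                out.write(c);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Small inline sample; a real run would pass file-backed streams instead.
        String sample = "[\n  {\"id\": 2, \"tags\": [\"cold\", \"ice\"]},\n  {\"id\": 3}\n]";
        StringWriter result = new StringWriter();
        toJsonLines(new StringReader(sample), result);
        System.out.print(result);
    }
}
```

For a real file, wrap a `FileReader` and `FileWriter` in `BufferedReader` and `BufferedWriter`, then point `session.read().json(...)` at the converted file. Later Spark releases (2.2 and newer, as far as I know) also accept a `multiLine` option on the JSON reader, for example `session.read().option("multiLine", true).json(path)`; check the documentation for your version before relying on it.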

