
I am trying to read all the JSON files stored in a subfolder of a single container in blob storage. I have set up the environment in Databricks and have the connection linked. Currently I am using this code:

df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")

but I am getting just the first file, not all the JSON files present in the subfolder, even though I included the wildcard `/*.json`.

I want to load all the files from the subfolder into a single DataFrame and store it as a table in a SQL database.

Can someone point out what I am missing?
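For the "store as a table in a SQL database" part, a minimal JDBC write sketch might look like the following. All connection details (server, database, table name, credentials) are placeholders I've assumed for illustration, not values from the question:

```python
# Hedged sketch: write the combined DataFrame to a SQL database over JDBC.
# Every value in angle brackets is a placeholder assumption.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.my_table")   # hypothetical target table
   .option("user", "<username>")
   .option("password", "<password>")
   .mode("overwrite")                   # or "append" to keep existing rows
   .save())
```

This assumes the appropriate JDBC driver is available on the cluster (Databricks runtimes ship with the SQL Server driver).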

4
  • It looks good to me. How do you know it read only one file? Commented Nov 3, 2021 at 2:17
  • @pltc because it shows only the first file's data when I use df.display(). Is there a better way to check whether I have the data from all the files? Commented Nov 3, 2021 at 2:21
  • Huh, display only shows a limited amount of data. Did you try querying the data? Commented Nov 3, 2021 at 2:29
  • Databricks only displays the first 1000 records. You should count the rows instead Commented Nov 3, 2021 at 6:15
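The counting advice in the comments above can be illustrated even without Spark: write a few JSON-lines files into a temporary folder, match them with the same `*.json` wildcard, and total the records across every file. This is a plain-Python sketch using local files (not blob storage), purely to show that the wildcard matches all files and that a count covers more than the displayed preview:

```python
import glob
import json
import os
import tempfile

# Create three JSON-lines files, mimicking three blobs in a subfolder.
tmp = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp, f"part{i}.json"), "w") as f:
        for record in range(2):
            f.write(json.dumps({"file": i, "record": record}) + "\n")

# The same *.json wildcard Spark uses matches all three files...
files = glob.glob(os.path.join(tmp, "*.json"))
print(len(files))   # 3 files matched

# ...so the total record count is a sum over every file, not just the first.
total = sum(1 for path in files for _ in open(path))
print(total)        # 6 records in total
```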

1 Answer


I have tested this in my environment.

I have 3 JSON blob files inside the subfolder of my container in the storage account, and I am able to read all of them into a single DataFrame.


You can use the code below to read all the JSON files from the subfolder into a single DataFrame and display them:

df = spark.read.json("wasbs://container_name@blob_storage_account.blob.core.windows.net/sub_folder/*.json")
df.show()
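To confirm that every file actually landed in the DataFrame (rather than trusting the truncated display() output), you can tag each row with its source path via input_file_name() and count the distinct paths. A sketch against the same df as above, assuming it was loaded as shown:

```python
from pyspark.sql.functions import input_file_name

# Tag each row with the path of the blob it was read from.
tagged = df.withColumn("source_file", input_file_name())

# The number of distinct source paths should equal the number of
# JSON files in the subfolder.
print(tagged.select("source_file").distinct().count())

# Total row count across all files, not just what display() shows.
print(df.count())
```

If the distinct-path count matches the number of blobs in the subfolder, all files were read.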


