
Thanks for your time.

I need to read several file paths, which are organized by month and day (/mm/dd/*.json).

I've been trying to loop over the day part of the path, but my loop always ends up with only a single read:

for i_dia in range(1, 9):
  df_json = spark.read.json('/mnt/datalake/'+Year+'/'+ Month +'/'+ str(0) + str(i_dia) +'/'+ '*', mode="PERMISSIVE",multiLine = "true")
  return df_json
 
display(df_json)

How should the reading be done correctly? I want to read all the files into one big dataframe, please.

Thanks in advance.

Regards

  • but my loop always sticks with the last read — Can you clarify this part? What's going wrong? PS: Python's range excludes the end value, so range(1, 9) gives 1 through 8. This may be the cause of your problem. Commented Mar 21, 2022 at 18:46
  • You are returning after the first file, given the way this code is indented. So you only read one JSON file. That's clearly not what you want, but what do you want? Read all files and then what? Append them all to one big dataframe? Please provide more information. Commented Mar 21, 2022 at 18:50
  • Thank you for responding. I want to read all the files into one big dataframe, please. Commented Mar 21, 2022 at 19:03
  •
    You need to use pd.concat() in order to achieve that. Commented Mar 21, 2022 at 19:33
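Following the last comment's pd.concat() suggestion: collect one dataframe per day and concatenate them once at the end, instead of returning inside the loop. A minimal sketch using synthetic frames (read_day is a hypothetical stand-in for reading one day's JSON files):

    import pandas as pd

    # Hypothetical stand-in for reading one day's worth of JSON files
    def read_day(i_dia):
        return pd.DataFrame({'day': [i_dia], 'value': [i_dia * 10]})

    # range(1, 9) covers days 1 through 8
    frames = [read_day(i_dia) for i_dia in range(1, 9)]
    df_json = pd.concat(frames, ignore_index=True)  # one big dataframe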

1 Answer

import pandas as pd
import glob

df_json = pd.DataFrame()
for i_dia in range(1, 9):
    # Build the day folder ('01' .. '08') and expand the glob, since
    # pd.read_json takes one file path at a time
    day_path = '/mnt/datalake/' + Year + '/' + Month + '/' + str(i_dia).zfill(2) + '/*.json'
    for file in glob.glob(day_path):
        df_json = pd.concat([df_json, pd.read_json(file)], ignore_index=True)
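Since the question used Spark rather than pandas, it may be worth noting that spark.read.json accepts a list of paths, so a single call can replace the loop entirely. A sketch building the path list (Year and Month come from the question; sample values shown here, and the Spark call itself is left commented):

    # Sample values standing in for the question's Year / Month variables
    Year, Month = '2022', '03'

    # str(i).zfill(2) pads the day to two digits: '01' .. '08'
    paths = ['/mnt/datalake/{}/{}/{}/*.json'.format(Year, Month, str(i).zfill(2))
             for i in range(1, 9)]

    # One read over all days at once:
    # df_json = spark.read.json(paths, mode="PERMISSIVE", multiLine=True)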

1 Comment

Sorry for the late response; I was traveling. I managed to understand your logic and applied it to my need, thank you very much!!
