1

I have a code that read multiple files inside the directory and every excel file have more than 10 sheet's. After that I need to exclude some sheet's every file's and the others extracted.

I got all data needed, but the problem is every sheet's from the excel created new Dataframe even I used concat so when I save it to json only the last dataframe per file saved instead of whole data.

Here's my code:

excluded_sheet = ['Sheet 2','Sheet 6']
for index, xls_path in enumerate(file_paths):                                                         
      data_file = pd.ExcelFile(xls_path)
      sheets = [ sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet ]         
      for sheet_name in sheets: 
            file = xls_path.rfind(".")
            head, tail = os.path.split(xls_path[1:file])
            df =pd.concat([pd.read_excel(xls_path, sheet_name=sheet_name, header=None)],ignore_index=True)
            df.insert(loc=0, column='sheet name', value=sheet_name)

            pd.DataFrame(df.to_json(f"{json_folder_path}{tail}.json", orient='records',indent=4))

I didn't used sheet_name=None because I need to read sheet name and add to column values.

Data status of my dataFrame:

enter image description here

I got many DF because every sheet create new DF, instead of 2 DF only since I have 2 files inside the directory. Thanks guys for your help.

1 Answer 1

1

You can use list comprehension for join all sheetnames to one DataFrame:

...
...
sheets = [ sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet ]

file = xls_path.rfind(".")
head, tail = os.path.split(xls_path[1:file])
dfs = [pd.read_excel(xls_path,sheet_name=sheet_name,header=None) for sheet_name in sheets]
df =pd.concat(dfs,keys=sheets)
df = df.reset_index(level=0, drop=True).rename_axis('sheet name').reset_index()

pd.DataFrame(df.to_json(f"{json_folder_path}{tail}.json", orient='records',indent=4))

Or create helper list dfs with append DataFrames per loop, outside loop use concat:

...
...
sheets = [ sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet ]  

dfs = []
for sheet_name in sheets: 
    file = xls_path.rfind(".")
    head, tail = os.path.split(xls_path[1:file])
    df = pd.read_excel(xls_path, sheet_name=sheet_name, header=None)
    df.insert(loc=0, column='sheet name', value=sheet_name)
    dfs.append(df)

df1 = pd.concat(dfs,ignore_index=True)
pd.DataFrame(df1.to_json(f"{json_folder_path}{tail}.json", orient='records',indent=4))
Sign up to request clarification or add additional context in comments.

1 Comment

1st solution works for me. Thank you! you save my life.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.