I have 4 excel files - 'a1.xlsx','a2.xlsx','a3.xlsx','a4.xlsx' The format of the files are same
for eg a1.xlsx looks like:
id code name
1 100 abc
2 200 zxc
... ... ...
i have to read this files in pandas dataframe and check whether the same value of code column exists in multiple excel files or not.
something like this.
if code=100 exists in 'a1.xlsx','a3.xlsx' , and code=200 exists only in 'a1.xlsx'
final dataframe should look like:
code filename
100 a1.xlsx,a3.xlsx
200 a1.xlsx
... ....
and so on
I have all the files in a directory and tried to iterate them through loop
import pandas as pd
import os
x = next(os.walk('path/to/files/'))[2] #list all files in directory
os.chdir('path/to/files/')
for i in range (0,len(x)):
df = pd.read_excel(x[i])
How to proceed? any leads?