Find if a column value exists in multiple dataframes

Question

I have 4 Excel files: 'a1.xlsx', 'a2.xlsx', 'a3.xlsx', 'a4.xlsx'. All the files have the same format.

For example, a1.xlsx looks like:

id    code    name
1      100    abc
2      200    zxc
...    ...    ...

I have to read these files into pandas DataFrames and check whether the same value of the code column exists in multiple Excel files.

Something like this:

If code=100 exists in 'a1.xlsx' and 'a3.xlsx', and code=200 exists only in 'a1.xlsx',

the final dataframe should look like:

code    filename
100   a1.xlsx,a3.xlsx
200   a1.xlsx
...   ....
and so on

I have all the files in a directory and tried to iterate over them in a loop:

import pandas as pd
import os
x = next(os.walk('path/to/files/'))[2]  #list all files in directory
os.chdir('path/to/files/')

for i in range (0,len(x)):
    df = pd.read_excel(x[i])

How should I proceed? Any leads?

jezrael · Accepted Answer · 2017-10-06 06:00:51Z

Use:

import glob
import os

import pandas as pd

# get all Excel filenames in the directory
files = glob.glob('path/to/files/*.xlsx')
# read each file and add a column holding its name without the extension
dfs = [pd.read_excel(fp).assign(filename=os.path.basename(fp).split('.')[0]) for fp in files]
# one big df from the list of dfs
df = pd.concat(dfs, ignore_index=True)
# join the filenames of all rows that share the same code
df1 = df.groupby('code')['filename'].apply(', '.join).reset_index()
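
Two details may matter depending on the data: the answer's split('.')[0] drops the extension, while the desired output in the question shows 'a1.xlsx', and a code that appears more than once in the same file would have that filename joined more than once. The following is a minimal sketch of a variant that keeps the full filename and removes duplicate (code, filename) pairs before joining. The frames dict and the codes_to_files helper are hypothetical names used only for illustration; the in-memory DataFrames stand in for what pd.read_excel would return for each file.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the sheets that pd.read_excel would return,
# keyed by the full filename (extension kept).
frames = {
    "a1.xlsx": pd.DataFrame(
        {"id": [1, 2, 3], "code": [100, 200, 100], "name": ["abc", "zxc", "abc"]}
    ),
    "a2.xlsx": pd.DataFrame({"id": [1], "code": [300], "name": ["qwe"]}),
    "a3.xlsx": pd.DataFrame({"id": [1], "code": [100], "name": ["abc"]}),
}


def codes_to_files(frames):
    """Return one row per code with a comma-separated list of the files containing it."""
    # keep only the code column and tag every row with the file it came from
    tagged = [df[["code"]].assign(filename=name) for name, df in frames.items()]
    # stack all files, then drop repeated (code, filename) pairs
    combined = pd.concat(tagged, ignore_index=True).drop_duplicates()
    # sort so the joined filenames come out in a stable order
    combined = combined.sort_values(["code", "filename"])
    return combined.groupby("code")["filename"].agg(",".join).reset_index()


result = codes_to_files(frames)
print(result)
```

With real files, the dict could be built from the glob results instead, for example {os.path.basename(fp): pd.read_excel(fp) for fp in files}.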

edited Oct 6, 2017 at 6:00

answered Oct 6, 2017 at 5:58
