I have a list of CSV files. Each file has 5 columns, with ‘id’ as the only common column (the primary key); the remaining 4 columns are all different across files.
My column of interest is the 5th (last) column, which is different in each file. I want to merge the files on ‘id’.
I have tried the following code, but it concatenates row-wise, giving me many duplicate ‘id’ values as well as ‘NaN’ values:
import glob
import pandas as pd

filelist = glob.glob(path + "/*.csv")
li = []
for filename in filelist:
    # keep only the 'id' column and the column of interest
    df = pd.read_csv(filename, index_col=None, header=0, usecols=[0, 5])
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
I want to concatenate them column-wise, keeping my column of interest (the 5th column) from each file.
For example:
My list of files: ['df1.csv', 'df2.csv', 'df3.csv', 'df4.csv']
df1.csv has the following structure:
   ID  No1  AA
0   1    0   4
1   2    1   5
2   3    0   6
df2.csv has this structure:
   ID  No2  BB
0   2    0   5
1   3    1   6
2   4    0   7
The list goes on (df3.csv and df4.csv contribute the CC and DD columns). My desired output would be:
   ID   AA   BB  CC  DD
0   1  4.0  NaN   0   1
1   2  5.0  5.0   1   0
2   3  6.0  6.0   1   0
3   4  NaN  7.0   1   1
Any suggestions would be appreciated. Thank you.
Read each csv with ‘id’ as the index column, then run a DataFrame.join across all the dataframes. Or use the first dataframe as the left dataframe, take the other dataframes as the right, and merge them on the ‘id’ index.
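For example, a minimal sketch of the join approach, assuming the files live in a directory path, that ‘id’ is the first of the five columns, and that the column of interest is the last one (adjust usecols and path if your layout differs):

import glob
from functools import reduce

import pandas as pd

path = "."  # hypothetical directory containing the csv files
filelist = glob.glob(path + "/*.csv")

# Read each file with 'id' as the index and keep only the last column.
frames = [
    pd.read_csv(filename, index_col=0, usecols=[0, 4])
    for filename in filelist
]

# Chain DataFrame.join over the list; how="outer" keeps ids that are
# missing from some files and fills the gaps with NaN, as in the
# desired output above.
merged = reduce(lambda left, right: left.join(right, how="outer"), frames)
merged = merged.sort_index().reset_index()
print(merged)

Since each frame is indexed by ‘id’, pd.concat(frames, axis=1) would also work here: with axis=1 it aligns the frames on the shared index and performs an outer join by default, which is the column-wise concatenation you were after.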