I am trying to concatenate two CSV files loaded as DataFrames, df1b (2214, 4) and df2b (2262, 4). Many of the index labels are the same in both files, so I want rows with a shared label to line up in a single row. Where a label appears in only one file, the other file's columns should be filled with NaN. Example:
df1b:
Index, Col1, Col2, Col3
A, data, data, data
B, data, data, data
D, data, data, data
E, data, data, data

df2b:
Index, ColX, ColY, ColZ
A, data, data, data
B, data, data, data
C, data, data, data
E, data, data, data
Desired result of the concatenation:
Index, Col1, Col2, Col3, ColX, ColY, ColZ
A, data, data, data, data, data, data
B, data, data, data, data, data, data
C, NaN, NaN, NaN, data, data, data
D, data, data, data, NaN, NaN, NaN
E, data, data, data, data, data, data
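For reference, pd.concat with axis=1 places the frames side by side and aligns rows by index label; with the default join="outer" it keeps the union of labels and fills missing cells with NaN, which is the desired output above. A minimal sketch on toy frames mirroring the example (the numbers are made up, and sort=True is added only so the rows come out in A to E order):

```python
import pandas as pd

# Hypothetical toy versions of df1b and df2b; the index plays the role of the gene name
df1b = pd.DataFrame(
    {"Col1": [1, 2, 3, 4], "Col2": [5, 6, 7, 8], "Col3": [9, 10, 11, 12]},
    index=pd.Index(["A", "B", "D", "E"], name="Gene"),
)
df2b = pd.DataFrame(
    {"ColX": [13, 14, 15, 16], "ColY": [17, 18, 19, 20], "ColZ": [21, 22, 23, 24]},
    index=pd.Index(["A", "B", "C", "E"], name="Gene"),
)

# axis=1 aligns rows on index labels; join="outer" keeps labels found in either frame
# and fills the columns of the frame that lacks a label with NaN
df3 = pd.concat([df1b, df2b], axis=1, join="outer", sort=True)
print(df3)
```

Row C gets NaN in Col1 to Col3 and row D gets NaN in ColX to ColZ, while A, B and E each appear once with all six columns filled.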
When I concatenate them with df3 = pd.concat([df1b, df2b], axis=1), the result has shape (4800, 4), and concat does not recognize that many of the index labels are the same in both files. Has anyone encountered this, or does anyone know why it might happen? Here is my code:
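import pandas as pd

df = pd.read_csv('XX.csv')
# Young quartiles, keyed by the 'Gene' column
df1 = df[['Gene', 'Young_Q1', 'Young_Q2', 'Young_Q3']]
df1.to_csv('Young_Q.csv', index=False)  # to_csv returns None when writing to a path
df1b = pd.read_csv('Young_Q.csv', index_col='Gene', encoding='utf-8')
# Old quartiles, keyed by the 'OldQ_Gene' column
df2 = df[['OldQ_Gene', 'Old_Q1', 'Old_Q2', 'Old_Q3']]
df2.to_csv('Old_Q.csv', index=False)
df2b = pd.read_csv('Old_Q.csv', index_col='OldQ_Gene', encoding='utf-8')
# Place both halves side by side, aligned on the gene index
df3 = pd.concat([df1b, df2b], axis=1)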
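For comparison, a self-contained sketch of the same pipeline that skips the temporary CSV files and uses set_index on each gene column instead. The inline CSV text and its values are hypothetical stand-ins for XX.csv. For plain string gene names this should align the same way as the round trip through Young_Q.csv and Old_Q.csv, and it shows that the differing index names (Gene vs OldQ_Gene) do not matter, because rows are matched by label, not by column name:

```python
import io

import pandas as pd

# Hypothetical stand-in for 'XX.csv': two gene columns side by side
csv_text = """Gene,Young_Q1,Young_Q2,Young_Q3,OldQ_Gene,Old_Q1,Old_Q2,Old_Q3
A,1,2,3,A,10,20,30
B,4,5,6,B,40,50,60
D,7,8,9,C,70,80,90
E,1,1,1,E,2,2,2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each half uses its own gene column as the index; no intermediate files needed
df1b = df[["Gene", "Young_Q1", "Young_Q2", "Young_Q3"]].set_index("Gene")
df2b = df[["OldQ_Gene", "Old_Q1", "Old_Q2", "Old_Q3"]].set_index("OldQ_Gene")

# Rows are matched by label value, so A, B and E each end up on a single row
df3 = pd.concat([df1b, df2b], axis=1, sort=True)
print(df3)
```

With matching labels this yields five rows (A to E) and six columns, so if the real data still produces duplicated labels, the gene strings in the two columns are probably not identical.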
The result instead looks like this:
df3:
Index, Col1, Col2, Col3, ColX, ColY, ColZ
A, NaN, NaN, NaN, data, data, data
B, NaN, NaN, NaN, data, data, data
D, NaN, NaN, NaN, data, data, data
E, NaN, NaN, NaN, data, data, data
A, data, data, data, NaN, NaN, NaN
B, data, data, data, NaN, NaN, NaN
C, data, data, data, NaN, NaN, NaN
E, data, data, data, NaN, NaN, NaN
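pd.concat keeps rows from the two frames separate whenever their labels do not compare equal, even if they print identically, which would produce output like the one above. A minimal diagnostic sketch, assuming the mismatch comes from surrounding whitespace, letter case, or a different index dtype (common causes, not confirmed for this data). The helper diagnose_index_overlap and the toy frames are hypothetical:

```python
import pandas as pd


def diagnose_index_overlap(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Count shared index labels before and after a simple normalisation."""
    shared_raw = left.index.intersection(right.index)
    # Assumed normalisation: compare as strings, ignoring surrounding whitespace and case
    norm_left = left.index.astype(str).str.strip().str.upper()
    norm_right = right.index.astype(str).str.strip().str.upper()
    shared_clean = norm_left.intersection(norm_right)
    return {
        "left_dtype": str(left.index.dtype),
        "right_dtype": str(right.index.dtype),
        "shared_raw": len(shared_raw),
        "shared_after_cleaning": len(shared_clean),
    }


# Hypothetical frames: the right-hand labels carry a trailing space
left = pd.DataFrame({"Col1": [1, 2, 3, 4]}, index=["A", "B", "D", "E"])
right = pd.DataFrame({"ColX": [5, 6, 7, 8]}, index=["A ", "B ", "C ", "E "])
report = diagnose_index_overlap(left, right)
print(report)  # shared_raw is 0 even though A, B and E look shared

# Strip whitespace from the offending index; concat then aligns as intended
right.index = right.index.str.strip()
fixed = pd.concat([left, right], axis=1, sort=True)
print(fixed)
```

Running the helper on df1b and df2b would show whether the labels already match: a low shared_raw but a high shared_after_cleaning suggests cleaning both indexes before calling pd.concat, while two different dtypes suggest one index was parsed as numbers and the other as strings.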