Concat does not recognize shared indices across columns being concatenated

Question

I am trying to concatenate two CSV files, loaded as df1b (shape (2214, 4)) and df2b (shape (2262, 4)). A large portion of the indices in the two files are the same, so I want those rows to overlap. Where an index appears in only one file, the columns coming from the other file should be filled with NaN. Example below:

df1b

Index, Col1, Col2, Col3

A . Data in all columns

B . Data in all columns

D . Data in all columns

E . Data in all columns

df2b

Index, ColX, ColY, ColZ

A . Data in all columns

B . Data in all columns

C . Data in all columns

E . Data in all columns

Desired final concat:

Index, Col1, Col2, Col3, ColX, ColY, ColZ

A . Data in all columns

B . Data in all columns

C . NaN, NaN, NaN, Data, Data, Data

D . Data, Data, Data, NaN, NaN, NaN

E . Data in all columns

When I concatenate using df3 = pd.concat([df1b, df2b], axis=1), the result has dimension (4800, 4), and concat does not recognize that a large portion of the indices are the same between the two files. Does anyone know why this might occur? My code:

import pandas as pd

df = pd.read_csv('XX.csv')

# Young columns: write to CSV, then read back with Gene as the index
df1 = df[['Gene', 'Young_Q1', 'Young_Q2', 'Young_Q3']]
df1a = df1.to_csv('Young_Q.csv', index=False)
df1b = pd.read_csv('Young_Q.csv', index_col='Gene', encoding='utf-8')

# Old columns: same steps, with OldQ_Gene as the index
df2 = df[['OldQ_Gene', 'Old_Q1', 'Old_Q2', 'Old_Q3']]
df2a = df2.to_csv('Old_Q.csv', index=False)
df2b = pd.read_csv('Old_Q.csv', index_col='OldQ_Gene', encoding='utf-8')

# Column-wise concat, expected to align rows on the index
df3 = pd.concat([df1b, df2b], axis=1)

Result example looks like:

Df3

A .  NaN, NaN, NaN,  Data, Data, Data

B .  NaN, NaN, NaN,  Data, Data, Data 

D .  NaN, NaN, NaN,  Data, Data, Data 

E .  NaN, NaN, NaN,  Data, Data, Data 

A .  Data, Data, Data, NaN, NaN, NaN 

B .  Data, Data, Data, NaN, NaN, NaN  

C .  Data, Data, Data, NaN, NaN, NaN  

E .  Data, Data, Data, NaN, NaN, NaN

vctrd · Accepted Answer · 2019-02-14 20:57:24Z


You could use merging:

df3 = df1b.merge(df2b, on='Gene', how='outer')

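For this to work, Gene has to be a regular column in both DataFrames rather than the index (for example, skip index_col in read_csv or call reset_index()), and the key column must have the same name in both, so OldQ_Gene would need to be renamed to Gene.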
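As a rough illustration of the merge above, here is a minimal sketch on toy data. The frames and their values are made up, with one column each instead of three; only the names Gene, OldQ_Gene, Young_Q1 and Old_Q1 come from the question. It assumes the gene labels are currently the index, as in the question's read_csv calls.

```python
import pandas as pd

# Toy stand-ins for df1b and df2b (values are invented for illustration).
df1b = pd.DataFrame(
    {"Young_Q1": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index(["A", "B", "D", "E"], name="Gene"),
)
df2b = pd.DataFrame(
    {"Old_Q1": [10.0, 20.0, 30.0, 40.0]},
    index=pd.Index(["A", "B", "C", "E"], name="OldQ_Gene"),
)

# Turn each index into an ordinary column and give both the same key name.
left = df1b.reset_index()
right = df2b.reset_index().rename(columns={"OldQ_Gene": "Gene"})

# An outer merge keeps every gene found in either frame;
# values missing on one side become NaN.
df3 = (
    left.merge(right, on="Gene", how="outer")
    .sort_values("Gene")
    .set_index("Gene")
)
print(df3)
```

Genes present in only one frame (C and D here) get NaN in the other frame's columns, which matches the layout the question asks for.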

More information here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
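It may also be worth checking whether the two indices really contain identical labels. One possible cause, which the question does not confirm, is labels that look the same but differ by stray whitespace. The sketch below uses hypothetical toy data to show how to count the shared labels and normalize them before concatenating.

```python
import pandas as pd

# Hypothetical frames: the labels look alike, but df2b's carry a trailing space.
df1b = pd.DataFrame(
    {"Young_Q1": [1.0, 2.0]},
    index=pd.Index(["A", "B"], name="Gene"),
)
df2b = pd.DataFrame(
    {"Old_Q1": [10.0, 20.0]},
    index=pd.Index(["A ", "B "], name="OldQ_Gene"),
)

# Count the labels the two indices actually share (none in this toy case).
shared_before = df1b.index.intersection(df2b.index)
print(len(shared_before))

# Normalize the labels: cast to str and strip surrounding whitespace.
df1b.index = df1b.index.astype(str).str.strip()
df2b.index = df2b.index.astype(str).str.strip()
shared_after = df1b.index.intersection(df2b.index)
print(len(shared_after))

# With matching, unique labels, concat along axis=1 aligns the rows.
df3 = pd.concat([df1b, df2b], axis=1)
print(df3)
```

If the intersection is already large on the real data, whitespace is not the issue and the cause lies elsewhere.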
