
I am puzzled by the following behavior of pd.concat().

For a reproducible example:

print(dff1)
         R1        R2        R3        R4        R5        R6        R7  \
0  0.471973  0.520696  0.002054  0.003792  0.468673  0.689152  0.832767   
1  0.692504  0.681589  0.006030  0.018702  0.329242  0.493711  0.510571   
2  0.456881  0.554278  0.003793  0.003721  0.539799  0.630306  0.649009   
3  0.622189  0.597208  0.001313  0.001932  0.242561  0.529820  0.467826   
4  0.605414  0.678142  0.041302  0.037944  0.386975  0.856687  0.419201   

         R8  
0  0.155941  
1  0.863021  
2  0.790688  
3  0.872659  
4  0.037814  

print(dff2)
          Id
22275  29668
11689  15503
54894  73108
19839  26429
55252  73574

dff3 = pd.concat([dff1, dff2], axis = 1, ignore_index = True)
print(dff3)
              0         1         2         3         4         5         6  \
0      0.471973  0.520696  0.002054  0.003792  0.468673  0.689152  0.832767   
1      0.692504  0.681589  0.006030  0.018702  0.329242  0.493711  0.510571   
2      0.456881  0.554278  0.003793  0.003721  0.539799  0.630306  0.649009   
3      0.622189  0.597208  0.001313  0.001932  0.242561  0.529820  0.467826   
4      0.605414  0.678142  0.041302  0.037944  0.386975  0.856687  0.419201   
11689       NaN       NaN       NaN       NaN       NaN       NaN       NaN   
19839       NaN       NaN       NaN       NaN       NaN       NaN       NaN   
22275       NaN       NaN       NaN       NaN       NaN       NaN       NaN   
54894       NaN       NaN       NaN       NaN       NaN       NaN       NaN   
55252       NaN       NaN       NaN       NaN       NaN       NaN       NaN   

              7        8  
0      0.155941      NaN  
1      0.863021      NaN  
2      0.790688      NaN  
3      0.872659      NaN  
4      0.037814      NaN  
11689       NaN  15503.0  
19839       NaN  26429.0  
22275       NaN  29668.0  
54894       NaN  73108.0  
55252       NaN  73574.0  

So it seems that the ignore_index argument is ignored, but I do not understand why.
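A smaller, self-contained reproduction of the same effect might look like this. The frames are made-up stand-ins for dff1 and dff2 (three rows each, and only the second one has non-default index labels):

```python
import pandas as pd

# Hypothetical miniature versions of dff1 and dff2: dff1 has the default
# RangeIndex 0..2, while dff2 carries arbitrary labels like the question's.
dff1 = pd.DataFrame({"R1": [0.47, 0.69, 0.46], "R2": [0.52, 0.68, 0.55]})
dff2 = pd.DataFrame({"Id": [29668, 15503, 73108]}, index=[22275, 11689, 54894])

dff3 = pd.concat([dff1, dff2], axis=1, ignore_index=True)

print(dff3.columns.tolist())  # [0, 1, 2]: the column labels were reset
print(dff3.shape)             # (6, 3): rows were aligned by label, not matched by position
```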

1 Answer

No, it is not ignored. You need reset_index with drop=True so that both input DataFrames get the default index:

dff3 = pd.concat([dff1.reset_index(drop=True),
                  dff2.reset_index(drop=True)], axis = 1, ignore_index = True)
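If more than two frames need to be glued together by row position, the same fix could be wrapped in a small helper. This is only a sketch: concat_by_position is a hypothetical name (not a pandas function), and the frames are made-up miniatures of dff1 and dff2:

```python
import pandas as pd

def concat_by_position(frames):
    # Hypothetical helper: drop every frame's index so that all of them share
    # the default RangeIndex 0..n-1, then glue them side by side.
    # ignore_index=True relabels the resulting columns 0..k-1.
    reset = [frame.reset_index(drop=True) for frame in frames]
    return pd.concat(reset, axis=1, ignore_index=True)

dff1 = pd.DataFrame({"R1": [0.47, 0.69, 0.46], "R2": [0.52, 0.68, 0.55]})
dff2 = pd.DataFrame({"Id": [29668, 15503, 73108]}, index=[22275, 11689, 54894])

dff3 = concat_by_position([dff1, dff2])
print(dff3)
```

This assumes all frames have the same number of rows; if they do not, the shorter ones are padded with NaN at the bottom.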

The ignore_index parameter only replaces the labels along the concatenation axis of the output DataFrame with 0, 1, ..., n-1: the column labels if axis=1, or the index (row labels) if axis=0.
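To see which axis ignore_index touches, here is a minimal sketch with two tiny made-up frames:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"y": [3, 4]}, index=["r1", "r2"])

# axis=1: the concatenation axis is the columns, so only the column labels
# are replaced by 0..n-1; the row labels "r1", "r2" are kept.
side = pd.concat([a, b], axis=1, ignore_index=True)
print(side.columns.tolist(), side.index.tolist())  # [0, 1] ['r1', 'r2']

# axis=0: the concatenation axis is the index, so only the row labels are
# replaced; the column label "x" is kept.
stacked = pd.concat([a, a], axis=0, ignore_index=True)
print(stacked.index.tolist(), stacked.columns.tolist())  # [0, 1, 2, 3] ['x']
```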

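The labels on the other axis are still aligned. With axis=1 the rows are aligned by index label, so rows whose label exists in only one DataFrame get NaNs (which is also why the integer Id column turned into floats such as 15503.0). With axis=0 the columns are aligned instead.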
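The alignment on the other axis can be observed directly with a few toy frames (the names a, b and c are only for illustration):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
b = pd.DataFrame({"y": [3, 4]}, index=[1, 2])

# axis=1: rows are matched by index label. Label 0 exists only in a and
# label 2 only in b, so each of those rows gets one NaN, and the integer
# columns are upcast to float64 so that they can hold NaN.
side = pd.concat([a, b], axis=1)
print(side)
print(side.dtypes)

# axis=0: now the columns are matched by label. "x" and "z" differ, so the
# result has both columns, each padded with NaN for the other frame's rows.
c = pd.DataFrame({"z": [5]})
stacked = pd.concat([a, c], axis=0)
print(stacked)
```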

Sample:

dff3 = pd.concat([dff1.reset_index(drop=True),
                  dff2.reset_index(drop=True)], axis = 1, ignore_index = True)
print (dff3)
          0         1         2         3         4         5         6  \
0  0.471973  0.520696  0.002054  0.003792  0.468673  0.689152  0.832767   
1  0.692504  0.681589  0.006030  0.018702  0.329242  0.493711  0.510571   
2  0.456881  0.554278  0.003793  0.003721  0.539799  0.630306  0.649009   
3  0.622189  0.597208  0.001313  0.001932  0.242561  0.529820  0.467826   
4  0.605414  0.678142  0.041302  0.037944  0.386975  0.856687  0.419201   

          7      8  
0  0.155941  29668  
1  0.863021  15503  
2  0.790688  73108  
3  0.872659  26429  
4  0.037814  73574  
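Resetting the index also explains why Id is printed as integers here (29668) but as floats in the question (29668.0). A rough check with made-up miniature frames:

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

dff1 = pd.DataFrame({"R1": [0.47, 0.69, 0.46]})
dff2 = pd.DataFrame({"Id": [29668, 15503, 73108]}, index=[22275, 11689, 54894])

# Misaligned: the Id column picks up NaN and is stored as float64.
misaligned = pd.concat([dff1, dff2], axis=1, ignore_index=True)

# Aligned by position: no NaN is introduced, so Id keeps an integer dtype.
aligned = pd.concat([dff1.reset_index(drop=True),
                     dff2.reset_index(drop=True)], axis=1, ignore_index=True)

print(misaligned[1].dtype, aligned[1].dtype)
```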

If you omit ignore_index = True, the original column names are kept:

dff3 = pd.concat([dff1.reset_index(drop=True),
                  dff2.reset_index(drop=True)], axis = 1)
print (dff3)
         R1        R2        R3        R4        R5        R6        R7  \
0  0.471973  0.520696  0.002054  0.003792  0.468673  0.689152  0.832767   
1  0.692504  0.681589  0.006030  0.018702  0.329242  0.493711  0.510571   
2  0.456881  0.554278  0.003793  0.003721  0.539799  0.630306  0.649009   
3  0.622189  0.597208  0.001313  0.001932  0.242561  0.529820  0.467826   
4  0.605414  0.678142  0.041302  0.037944  0.386975  0.856687  0.419201   

         R8     Id  
0  0.155941  29668  
1  0.863021  15503  
2  0.790688  73108  
3  0.872659  26429  
4  0.037814  73574  
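Keeping the original names is usually preferable. One situation where ignore_index=True is handy with axis=1 is when the inputs share column names, as in this illustrative sketch with made-up frames:

```python
import pandas as pd

a = pd.DataFrame({"value": [1, 2]})
b = pd.DataFrame({"value": [3, 4]})

# Without ignore_index the original column names are kept, which can
# produce duplicate labels when the inputs share a name...
kept = pd.concat([a, b], axis=1)
print(kept.columns.tolist())  # ['value', 'value']

# ...whereas ignore_index=True gives unique positional labels instead.
renumbered = pd.concat([a, b], axis=1, ignore_index=True)
print(renumbered.columns.tolist())  # [0, 1]
```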

If you concatenate along the index (axis=0 is the default, so it is omitted):

dff3 = pd.concat([dff1, dff2], ignore_index = True)
print (dff3)
        Id        R1        R2        R3        R4        R5        R6  \
0      NaN  0.471973  0.520696  0.002054  0.003792  0.468673  0.689152   
1      NaN  0.692504  0.681589  0.006030  0.018702  0.329242  0.493711   
2      NaN  0.456881  0.554278  0.003793  0.003721  0.539799  0.630306   
3      NaN  0.622189  0.597208  0.001313  0.001932  0.242561  0.529820   
4      NaN  0.605414  0.678142  0.041302  0.037944  0.386975  0.856687   
5  29668.0       NaN       NaN       NaN       NaN       NaN       NaN   
6  15503.0       NaN       NaN       NaN       NaN       NaN       NaN   
7  73108.0       NaN       NaN       NaN       NaN       NaN       NaN   
8  26429.0       NaN       NaN       NaN       NaN       NaN       NaN   
9  73574.0       NaN       NaN       NaN       NaN       NaN       NaN   

         R7        R8  
0  0.832767  0.155941  
1  0.510571  0.863021  
2  0.649009  0.790688  
3  0.467826  0.872659  
4  0.419201  0.037814  
5       NaN       NaN  
6       NaN       NaN  
7       NaN       NaN  
8       NaN       NaN  
9       NaN       NaN  
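For axis=0, passing ignore_index=True should give the same result as concatenating first and then calling reset_index(drop=True). A quick sketch to check that, with made-up miniature frames:

```python
import pandas as pd

dff1 = pd.DataFrame({"R1": [0.47, 0.69]})
dff2 = pd.DataFrame({"Id": [29668, 15503]}, index=[22275, 11689])

# ignore_index=True on axis=0 relabels the rows 0..n-1...
direct = pd.concat([dff1, dff2], ignore_index=True)
# ...which matches concatenating first and then dropping the old index.
afterwards = pd.concat([dff1, dff2]).reset_index(drop=True)

print(direct.index.tolist())      # [0, 1, 2, 3]
print(direct.equals(afterwards))  # True
```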

Without ignore_index = True the columns are still aligned, but the original index labels are kept:

dff3 = pd.concat([dff1, dff2])
print (dff3)
            Id        R1        R2        R3        R4        R5        R6  \
0          NaN  0.471973  0.520696  0.002054  0.003792  0.468673  0.689152   
1          NaN  0.692504  0.681589  0.006030  0.018702  0.329242  0.493711   
2          NaN  0.456881  0.554278  0.003793  0.003721  0.539799  0.630306   
3          NaN  0.622189  0.597208  0.001313  0.001932  0.242561  0.529820   
4          NaN  0.605414  0.678142  0.041302  0.037944  0.386975  0.856687   
22275  29668.0       NaN       NaN       NaN       NaN       NaN       NaN   
11689  15503.0       NaN       NaN       NaN       NaN       NaN       NaN   
54894  73108.0       NaN       NaN       NaN       NaN       NaN       NaN   
19839  26429.0       NaN       NaN       NaN       NaN       NaN       NaN   
55252  73574.0       NaN       NaN       NaN       NaN       NaN       NaN   

             R7        R8  
0      0.832767  0.155941  
1      0.510571  0.863021  
2      0.649009  0.790688  
3      0.467826  0.872659  
4      0.419201  0.037814  
22275       NaN       NaN  
11689       NaN       NaN  
54894       NaN       NaN  
19839       NaN       NaN  
55252       NaN       NaN  
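In the question's data the original labels happen not to overlap (0 to 4 versus five-digit Ids). If they did, keeping them would yield a duplicated index; the verify_integrity parameter of pd.concat can turn that into an error. A sketch with toy frames:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})  # index 0, 1
b = pd.DataFrame({"x": [3, 4]})  # index 0, 1 again

# Without ignore_index the original labels are kept, so they can repeat.
stacked = pd.concat([a, b])
print(stacked.index.tolist())   # [0, 1, 0, 1]
print(stacked.index.is_unique)  # False

# verify_integrity=True makes pd.concat raise ValueError instead of
# returning a frame with a duplicated index.
try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```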