Combine Multiple Data Columns in Pandas

Question

I have the following pandas dataframe -

df = 
    1.0         2.0         3.0             4.0         5.0
(1083, 596)                             (1050, 164)   (1050, 164)   
(1081, 595)                             (1050, 164)   (1080, 162)
(1081, 594)                             (1049, 163)   (1070, 164)
(1082, 593) 
            (1050, 164)     
            (1050, 164)     
            (1049, 163)     
            (1049, 163)     

                        (1052, 463)
                        (1051, 468)
                        (1054, 465)
                        (1057, 463)

I need a completely new dataframe, df2, with 3 columns: 1.0, 2.0 (combines 2.0 and 4.0) and 3.0 (combines 3.0 and 5.0).

The result will be -

df2 = 
    1.0         2.0         3.0     
(1083, 596) (1050, 164)   (1050, 164)   
(1081, 595) (1050, 164)   (1080, 162)
(1081, 594) (1049, 163)   (1070, 164)
(1082, 593) 
            (1050, 164)     
            (1050, 164)     
            (1049, 163)     
            (1049, 163)     

                        (1052, 463)
                        (1051, 468)
                        (1054, 465)
                        (1057, 463)

You can expect that there will be no overlapping values in the merged columns; if one column has valid value in a row then others will have NaN value.

I tried -

df.fillna(0)
df2['2.0']=df['2.0']+df['4.0']

and it does not work as intended. Is there any simple and efficient method of doing this?

Akaisteph7 · Accepted Answer · 2019-07-09 18:11:45Z

1

You can use DataFrame.where() and DataFrame.isnull() to mix the values the way you are trying to:

df2 = pd.DataFrame(df["1.0"], columns=["1.0"])
df2["2.0"] = df["2.0"].where(~df2["2.0"].isnull(), df2["4.0"])
df2["3.0"] = df["3.0"].where(~df2["3.0"].isnull(), df2["5.0"])

answered Jul 9, 2019 at 18:11

Akaisteph7

6,9222 gold badges39 silver badges53 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

SCool · Accepted Answer · 2019-07-09 18:32:06Z

Just basically copying and pasting. I think this works.

# copy values over to your other columns
# note: [0:3,'2.0'] gets the first 4 rows (index 0 to 3) of column '2.0'
# then you set it equal to the first 4 rows of column '4.0'

df.loc[0:3,'2.0'] = df.loc[0:3,'4.0'] 
df.loc[0:3,'3.0'] = df.loc[0:3,'5.0'] 


# just get the three columns you need


df2 = df[['1.0','2.0','3.0']]


           1.0          2.0          3.0
0   (1083, 596)  (1050, 164)  (1050, 164)
1   (1081, 595)  (1050, 164)  (1080, 162)
2   (1081, 594)  (1049, 163)  (1070, 164)
3   (1082, 593)          NaN          NaN
4           NaN  (1050, 164)          NaN
5           NaN  (1050, 164)          NaN
6           NaN  (1049, 163)          NaN
7           NaN  (1049, 163)          NaN
8           NaN          NaN          NaN
9           NaN          NaN  (1052, 463)
10          NaN          NaN  (1051, 468)
11          NaN          NaN  (1054, 465)
12          NaN          NaN  (1057, 463)

If your column names are actually floats, remove the quotation marks from these sections: df.loc[0:3,'2.0'] e.g. change to df.loc[0:3,2.0] like:

df.loc[0:3,2.0] = df.loc[0:3,4.0] 
df.loc[0:3,3.0] = df.loc[0:3,5.0]

Andy L. · Accepted Answer · 2019-07-09 19:24:14Z

Assume whitespaces in df are NaNs. You only need shift columns '2.0, 3.0, 4.0, 5.0' left 2 positions and do combine_first with df. Finally, pick first 3 columns by using iloc

df2 = df.combine_first(df.drop('1.0',1).shift(-2, axis=1)).iloc[:,:3]

Out[297]:
           1.0         2.0         3.0
0   (1083, 596)  (1050, 164)  (1050, 164)
1   (1081, 595)  (1050, 164)  (1080, 162)
2   (1081, 594)  (1049, 163)  (1070, 164)
3   (1082, 593)         NaN         NaN
4          NaN  (1050, 164)         NaN
5          NaN  (1050, 164)         NaN
6          NaN  (1049, 163)         NaN
7          NaN  (1049, 163)         NaN
8          NaN         NaN  (1052, 463)
9          NaN         NaN  (1051, 468)
10         NaN         NaN  (1054, 465)
11         NaN         NaN  (1057, 463)

Collectives™ on Stack Overflow

Combine Multiple Data Columns in Pandas

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related