How to combine multiple columns to single column

Question

This is the sample data. I need to combine the fruit columns into a single column that contains all the fruit names without any NaN values. The Total column shouldn't be changed.

|Fruit1  | Fruit2| Fruit3| Total|
|:------ | :----:| :----:|-----:|
|NaN     | Apple | NaN   | 20.  |
|Pear    | NaN   | NaN   | 40.  |
|NaN     | NaN   | orange| 50.  |
|Mango   | NaN   | NaN   | 43.  |
|NaN     | banana| NaN   | 35.  |

This should be the output:

|Fruits  | Total|
|------- | -----|
|Apple   | 20.  |
|Pear    | 40.  |
|Orange  | 50.  |
|Mango   | 43.  |
|banana  | 35.  |

Michael Dorner · Accepted Answer · 2022-05-27 11:27:00Z

2

I would use bfill():

import pandas as pd

df = pd.DataFrame({
    'fruit_1': [None, 'Pear', None, None],
    'fruit_2': ['Apple', None, None, None],
    'fruit_3': [None, None, 'Orange', None]})

# Back-fill along each row, then take the first column
df.bfill(axis=1).iloc[:, 0].rename('fruits')  # returns

0     Apple
1      Pear
2    Orange
3      None
Name: fruits, dtype: object

(or ffill() and use the last column)

It also works for rows containing None only.
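The question also asks that the Total column stay unchanged. A minimal sketch of how the bfill() idea might be applied to the fruit columns only is shown here; the `fruit_cols` list and the use of `pd.concat` are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the table in the question
df = pd.DataFrame({
    "Fruit1": [np.nan, "Pear", np.nan, "Mango", np.nan],
    "Fruit2": ["Apple", np.nan, np.nan, np.nan, "banana"],
    "Fruit3": [np.nan, np.nan, "orange", np.nan, np.nan],
    "Total": [20.0, 40.0, 50.0, 43.0, 35.0],
})

# Assumed list of the columns to merge (hypothetical helper name)
fruit_cols = ["Fruit1", "Fruit2", "Fruit3"]

# Back-fill across the fruit columns only, then keep the first column
fruits = df[fruit_cols].bfill(axis=1).iloc[:, 0].rename("Fruits")

# Place the merged column next to the untouched Total column
result = pd.concat([fruits, df["Total"]], axis=1)
print(result)
```

Restricting bfill() to the fruit columns keeps a numeric Total value from being pulled into a row whose fruit columns are all empty.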

edited May 27, 2022 at 11:27

answered May 27, 2022 at 11:21

Michael Dorner


Sunderam Dubey · Accepted Answer · 2022-05-29 05:25:54Z

1

Assuming you have only one non-NaN per row, you can stack:

df.stack().droplevel(1).to_frame(name='Fruits')

Output:

   Fruits
0   Apple
1    Pear
2  Orange
3   Mango
4  banana

Handling rows with only NaNs:

df.stack().droplevel(1).to_frame(name='Fruits').reindex(df.index)

Output assuming banana is a NaN:

   Fruits
0   Apple
1    Pear
2  Orange
3   Mango
4     NaN
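For reference, a self-contained sketch of the stack approach with the reindex step. It assumes at most one non-NaN value per row. The explicit `dropna()` is an addition not in the original answer: newer pandas releases can keep NaN entries when stacking, so dropping them explicitly should give the same result either way.

```python
import numpy as np
import pandas as pd

# Sample data where the last row contains only NaN
df = pd.DataFrame({
    "Fruit1": [np.nan, "Pear", np.nan, "Mango", np.nan],
    "Fruit2": ["Apple", np.nan, np.nan, np.nan, np.nan],
    "Fruit3": [np.nan, np.nan, "Orange", np.nan, np.nan],
})

# Stack to long format and drop any remaining NaN entries
stacked = df.stack().dropna()

# Drop the column-name level, then restore rows that were all NaN
fruits = stacked.droplevel(1).to_frame(name="Fruits").reindex(df.index)
print(fruits)
```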

edited May 29, 2022 at 5:25

Sunderam Dubey


answered May 27, 2022 at 11:04

mozway


4 Comments

Assasins creed Over a year ago

Thank you for your answer. What if we want to retain rows in which all values are NaN?

mozway Over a year ago

You can reindex with the original index

Assasins creed Over a year ago

What if the DataFrame has columns other than the fruit columns, and we don't want to change the data in those columns?

mozway Over a year ago

Best IMO would be to subselect these columns, stack as I did, then join back to DataFrame
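One way the "subselect, stack, join back" suggestion might look in practice is sketched below. The `fruit_cols` name is hypothetical, the explicit `dropna()` is added for robustness across pandas versions, and the sketch assumes at most one non-NaN fruit per row (more than one would duplicate rows in the join).

```python
import numpy as np
import pandas as pd

# Sample data modeled on the table in the question
df = pd.DataFrame({
    "Fruit1": [np.nan, "Pear", np.nan, "Mango", np.nan],
    "Fruit2": ["Apple", np.nan, np.nan, np.nan, "banana"],
    "Fruit3": [np.nan, np.nan, "orange", np.nan, np.nan],
    "Total": [20.0, 40.0, 50.0, 43.0, 35.0],
})

# Assumed list of the columns to merge (hypothetical helper name)
fruit_cols = ["Fruit1", "Fruit2", "Fruit3"]

# Stack only the fruit columns into one named Series indexed by row
fruits = df[fruit_cols].stack().dropna().droplevel(1).rename("Fruits")

# Drop the original fruit columns and join the merged Series back on the index
result = df.drop(columns=fruit_cols).join(fruits)[["Fruits", "Total"]]
print(result)
```

Because `join` is a left join on the index by default, rows whose fruit columns are all NaN would be kept with NaN in Fruits.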

new2cod3 · Accepted Answer · 2022-05-27 10:45:16Z

0

I think this should give the desired output -

df['Fruit1'].fillna(df['Fruit2'])

answered May 27, 2022 at 10:45

new2cod3


2 Comments

new2cod3 Over a year ago

However, this is only applicable for two columns. For more you can refer to bfill()


Tim Biegeleisen · Accepted Answer · 2022-05-27 10:45:25Z

0

We can use combine_first here:

df["Fruits"] = df["Fruit1"].combine_first(df["Fruit2"])

We can also use np.where:

df["Fruits"] = np.where(df["Fruit1"].isnull(), df["Fruit2"], df["Fruit1"])
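For three or more columns, combine_first can be chained. The following is a rough sketch using `functools.reduce`; the `fruit_cols` list is an assumed name and this extension is not part of the original answer.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data modeled on the table in the question
df = pd.DataFrame({
    "Fruit1": [np.nan, "Pear", np.nan, "Mango", np.nan],
    "Fruit2": ["Apple", np.nan, np.nan, np.nan, "banana"],
    "Fruit3": [np.nan, np.nan, "orange", np.nan, np.nan],
    "Total": [20.0, 40.0, 50.0, 43.0, 35.0],
})

# Assumed list of the columns to merge, in priority order
fruit_cols = ["Fruit1", "Fruit2", "Fruit3"]

# Start from the first column and fill its gaps from each later column in turn
df["Fruits"] = reduce(
    lambda acc, col: acc.combine_first(df[col]),
    fruit_cols[1:],
    df[fruit_cols[0]],
)
print(df[["Fruits", "Total"]])
```

If a row has values in several columns, the earliest column in `fruit_cols` wins.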

answered May 27, 2022 at 10:45

Tim Biegeleisen


3 Comments

Assasins creed Over a year ago

What if there are three or more columns in a similar format, and we need to combine all of them into one column?

Tim Biegeleisen Over a year ago

That's not the question which was asked, nor what anyone answered. If you had three columns, I would need to see what the data looks like.

Assasins creed Over a year ago

Thanks for answering, but I have updated the question. I would appreciate it if you could help out.

DarrylG · Accepted Answer · 2022-05-27 11:21:36Z

We can use a modification of an approach from Transform Multiple Columns Into One With Pandas to combine columns:

df['new'] = df.fillna('').sum(1)

Explanation

fillna('') replaces all NaN values with an empty string.
sum(1) sums the DataFrame row by row (axis=1). Since the values in each row are strings, summing concatenates them.

Example Usage

from io import StringIO

import pandas as pd

# Create DataFrame from OP data
s = '''Fruit1,Fruit2,Fruit3
NaN,Apple,NaN
Pear,NaN,NaN
NaN,NaN,Orange
Mango,NaN,NaN
NaN,banana,NaN'''

df = pd.read_csv(StringIO(s))
print(df)

Initial DataFrame

    Fruit1  Fruit2  Fruit3
0   NaN     Apple   NaN
1   Pear    NaN     NaN
2   NaN     NaN     Orange
3   Mango   NaN     NaN
4   NaN     banana  NaN

df['new'] = df.fillna('').sum(axis=1)
print(df)

Updated DataFrame

    Fruit1  Fruit2  Fruit3  new
0   NaN     Apple   NaN     Apple
1   Pear    NaN     NaN     Pear
2   NaN     NaN     Orange  Orange
3   Mango   NaN     NaN     Mango
4   NaN     banana  NaN     banana
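One caveat with string concatenation is that a row with more than one non-NaN value produces a run-together string. The sketch below illustrates this with made-up data. It swaps in `agg("".join, axis=1)` in place of `sum(1)` for the concatenation, and shows a separator-based variant; both are illustrative assumptions rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up data: the third row has two non-NaN values
df = pd.DataFrame({
    "Fruit1": [np.nan, "Pear", "Kiwi", np.nan],
    "Fruit2": ["Apple", np.nan, "Plum", np.nan],
    "Fruit3": [np.nan, np.nan, np.nan, "Orange"],
})

# Replace NaN with empty strings and concatenate each row
joined = df.fillna("").agg("".join, axis=1)
print(joined.tolist())

# Variant: drop NaN per row and join the remaining values with a separator
separated = df.apply(lambda row: ", ".join(row.dropna()), axis=1)
print(separated.tolist())
```

When each row is guaranteed to hold a single value, both variants give the same result as the approach above.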
