0

This is the sample data provided and I need to combine the columns to single column which consists of all the fruit names without any Nan Values and also total column should'nt be changed.

|Fruit1  | Fruit2| Fruit3| Total|
|:------ | :----:| :----:|-----:|
|NaN     | Apple | NaN.  | 20.  |
|Pear    | NaN   | NaN.  | 40.  |
|NaN.    | NaN   | orange| 50.  |
|Mango   | NaN   | NaN.  | 43.  |
|NaN     | banana| NaN.  | 35.  |

This should be the output:

|Fruits  | Total|
|------- | -----|
|Apple   | 20.  |
|Pear    | 40.  |
|Orange  | 50.  |
|Mango   | 43.  |
|banana  | 35.  |

5 Answers 5

2

I would use bfill():

df = pd.DataFrame({
    'fruit_1': [None, 'Pear', None, None], 
    'fruit_2': ['Apple', None, None, None], 
    'fruit_3': [None, None, 'Orange', None]})

df.bfill(axis=1).iloc[:,0].rename('fruits') # returns
0     Apple
1      Pear
2    Orange
3      None
Name: fruits, dtype: object

(or ffill() and use the last column)

It also works for rows containing None only.

Sign up to request clarification or add additional context in comments.

Comments

1

Assuming you have only one non-NaN per row, you can stack:

df.stack().droplevel(1).to_frame(name='Fruits')

Output:

   Fruits
0   Apple
1    Pear
2  Orange
3   Mango
4  banana

Handling rows with only NaNs:

df.stack().droplevel(1).to_frame(name='Fruits').reindex(df.index)

Output assuming banana is a NaN:

   Fruits
0   Apple
1    Pear
2  Orange
3   Mango
4     NaN

4 Comments

Thank you for your answer, what if we want to retain rows that all have NaN values.
You can reindex with the original index
What if we have other columns other than fruits in the data frame and we don't want to change the data of that column.
Best IMO would be to subselect these columns, stack as I did, then join back to DataFrame
0

I think this should give the desired output -

df['Fruit1'].fillna(df['Fruit2'])

2 Comments

However, this is only applicable for two columns. For more you can refer to bfill()
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
0

We can use combine_first here:

df["Fruits"] = df["Fruit1"].combine_first(df["Fruit2"])

We can also use np.where:

df["Fruits"] = np.where(df["Fruit1"].isnull(), df["Fruit2"], df["Fruit1"])

3 Comments

what if there are 3 or more columns of similar format and we need to combine all 3 to 1 column
That's not the question which was asked, nor what anyone answered. If you had three columns, I would need to see what the data looks like.
Thanks for answering but I have updated the question. I would appreciate If you could help out.
0

We can use a modification of an approach from Transform Multiple Columns Into One With Pandas to combine columns:

df['new'] = df.fillna('').sum(1) 

Explanation

  • replace all nan values with an empty string
  • sum(1), is summing the df row by row. Since values in row are strings, it will join them together

Example Usage

from io import StringIO

# Create DataFrame from OP data
s = '''Fruit1,Fruit2,Fruit3
NaN,Apple,NaN
Pear,NaN,NaN
NaN,NaN,Orange
Mango,NaN,NaN
NaN,banana,NaN'''

df = pd.read_csv(StringIO(s))
print(df)

Initial DataFrame

    Fruit1  Fruit2  Fruit3
0   NaN     Apple   NaN
1   Pear    NaN     NaN
2   NaN     NaN     Orange
3   Mango   NaN     NaN
4   NaN     banana  NaN

df['New']=df.fillna('').sum(1)
print(df)

Updated DataFrame

    Fruit1  Fruit2  Fruit3  new
0   NaN     Apple   NaN     Apple
1   Pear    NaN     NaN     Pear
2   NaN     NaN     Orange  Orange
3   Mango   NaN     NaN     Mango
4   NaN     banana  NaN     banana

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.