Python - How to iterate through a dataframe and replace a value in one cell with a value from another in the same row

Question

I'm trying to create a new column in a dataframe of food ingredients with unique values per row based on information from other cells in the same row.

The table essentially looks like this:

ingredient_name | ingredient_method | consolidated_name
Cheese          | [camembert, pkg]  | 
Cheese          | [cream, pastueri] |
Egg             | [raw, scrambled]  |

I'm trying to iterate through the rows and fill the consolidated_name column with values from either ingredient_name or ingredient_method.
For example, if ingredient_name is "Cheese" I want that row's consolidated name to be the first element of the list in ingredient_method.

This is the code I have so far:

for i, row in df.iterrows():
    consolidated = df['ingredient_name']
    if (df['ingredient_name'] == 'Cheese').all():
        consolidated = df['ingredient_method'][0]
    df.set_value(i,'consolidated_name',consolidated)

The code runs without errors but none of the values change in the dataframe.
Any ideas?

You are not using i or row in your code. Further, it seems that the set_value method is not an in-place operation, so your df will not change at all.
– Zhiya, Commented Mar 7, 2018 at 13:48
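
To illustrate the comment, here is a minimal sketch of the loop from the question that reads from `row` instead of the whole column and writes each cell with `df.at`. It assumes a recent pandas, where `DataFrame.set_value` is no longer available (it was deprecated in 0.21 and removed in 1.0). The sample data mirrors the table in the question, and the fallback to ingredient_name follows the asker's own first assignment.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})
df["consolidated_name"] = None  # placeholder column, filled row by row below

for i, row in df.iterrows():
    # Default to the ingredient name, as in the question's first assignment
    consolidated = row["ingredient_name"]
    if row["ingredient_name"] == "Cheese":
        # First element of this row's list, not of the whole column
        consolidated = row["ingredient_method"][0]
    df.at[i, "consolidated_name"] = consolidated  # label-based scalar write

print(df)
```

Row-by-row loops like this are usually slower than the vectorized answers on large frames, but they stay close to the original code.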

David Leon · Accepted Answer · 2018-03-07 14:49:43Z

2

One could use .loc (combined with .str[0]).

With:

import pandas as pd

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg'],
                  ingredient_method=[['camembert', 'pkg'],
                                     ['cream', 'pastueri'],
                                     ['raw', 'scrambled']]))

Do:

# Initialize consolidated_name with None (optional; unmatched rows are filled with NaN if omitted)
df['consolidated_name'] = [None]*len(df)

# Use .loc to select the rows you want and .str[0] to get the first element of each list
_filter = df.ingredient_name == 'Cheese'  # Boolean mask of the rows to update
df.loc[_filter, 'consolidated_name'] = df.loc[_filter, 'ingredient_method'].str[0]

Result:

print(df)
   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg              None

Note

#1
If you want to consolidate all the duplicated ingredients you can filter with the following:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)

The use of .loc is unchanged, as the next example shows:

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg','Foo','Foo'],
                  ingredient_method=[['camembert', 'pkg'], 
                                     ['cream', 'pastueri'], 
                                     ['raw', 'scrambled'], 
                                     ['bar', 'taz'], 
                                     ['taz', 'bar']]))

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               NaN
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz

#2
If you want, you can initialize the column with ingredient_name:

df['consolidated_name'] = df.ingredient_name

Then apply the same filter:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               Egg #Here it has changed
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz
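
As a possible shorthand for note #2, the initialization and the .loc assignment can be folded into a single expression with `Series.mask`, which keeps the original value where the condition is False and takes the replacement where it is True. This is only a sketch of an alternative, not part of the original answer; `duplicated(keep=False)` is used here in place of the two-step `_duplicated` / `isin` filter.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg", "Foo", "Foo"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"],
                          ["bar", "taz"],
                          ["taz", "bar"]],
})

# keep=False flags every occurrence of a repeated name, not only the later ones
is_duplicated = df["ingredient_name"].duplicated(keep=False)
first_method = df["ingredient_method"].str[0]

# Keep ingredient_name where the name is unique, otherwise use the first method
df["consolidated_name"] = df["ingredient_name"].mask(is_duplicated, first_method)
print(df)
```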

edited Mar 7, 2018 at 14:49

answered Mar 7, 2018 at 14:01


4 Comments

jezrael Over a year ago

Maybe df['consolidated_name'] = [None]*len(df) should be omitted.

David Leon Over a year ago

Yes, it fills with NaN instead. (Updated my answer)

Nate Gosselin Over a year ago

Thanks David — this worked but I'm accepting the answer below because the for loop framework lets me set logic for multiple ingredients beyond cheese. The framework you shared seems to only let me do one filter at a time unless I'm missing something?

David Leon Over a year ago

Maybe you could provide such an example to see if it fits? (In my opinion it should.)
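
For what it is worth, the .loc approach can carry more than one rule by building one Boolean mask per rule and assigning in turn. The sketch below is only an illustration: the rule for "Egg" (take the second element) and the extra "Milk" row are hypothetical and do not come from the thread.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg", "Milk"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"],
                          ["whole", "pasteurized"]],  # hypothetical extra row
})

# Default: keep the ingredient name for rows that no rule matches
df["consolidated_name"] = df["ingredient_name"]

# One Boolean mask per rule
is_cheese = df["ingredient_name"] == "Cheese"
is_egg = df["ingredient_name"] == "Egg"

# Cheese rows take the first list element (the rule from the question)
df.loc[is_cheese, "consolidated_name"] = df.loc[is_cheese, "ingredient_method"].str[0]
# Egg rows take the second list element (a hypothetical rule, for illustration only)
df.loc[is_egg, "consolidated_name"] = df.loc[is_egg, "ingredient_method"].str[1]

print(df)
```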

a_guest · Accepted Answer · 2018-03-07 13:47:08Z

1

You can use DataFrame.apply for that purpose. Simply wrap your decision logic (which is now in the for loop) into a corresponding function.

def func(row):
    if row['ingredient_name'] == 'Cheese':
        return row['ingredient_method'][0]
    return None

df['consolidated_name'] = df.apply(func, axis=1)
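
A self-contained variant of the same idea is sketched below, in case more than one ingredient needs the rule and the remaining rows should fall back to ingredient_name, as the question suggests. The set `USE_FIRST_METHOD`, the function name `consolidate`, and the row with an empty list are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical set of ingredients whose name should come from the method list
USE_FIRST_METHOD = {"Cheese", "Foo"}

def consolidate(row):
    """Return the first method for selected ingredients, else the ingredient name."""
    methods = row["ingredient_method"]
    if row["ingredient_name"] in USE_FIRST_METHOD and methods:
        return methods[0]
    # Fallback also covers an empty method list
    return row["ingredient_name"]

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg", "Cheese"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"],
                          []],  # hypothetical row with no methods
})

df["consolidated_name"] = df.apply(consolidate, axis=1)
print(df)
```

Keeping the decision logic in one function makes it easy to add further branches, at the cost of a Python-level call per row.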

answered Mar 7, 2018 at 13:47


Julio CamPlaz · Accepted Answer · 2018-03-07 13:55:35Z

0

If you want to do it with your initial loop:

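consolidated_name = []
for i, row in df.iterrows():
    # Look columns up by label so the result does not depend on column order
    if row['ingredient_name'] == 'Cheese':
        consolidated_name.append(row['ingredient_method'][0])
    else:
        consolidated_name.append(None)

df['consolidated_name'] = consolidated_name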

## out:
  ingredient_name  ingredient_method consolidated_name
0          Cheese   [camembert, pkg]         camembert
1          Cheese  [cream, pastueri]             cream
2             Egg   [raw, scrambled]              None
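
The same list can also be built without iterrows by zipping the two columns, which avoids creating a Series for every row. This is a sketch of an alternative, not the answer's own code, and it uses the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ingredient_name": ["Cheese", "Cheese", "Egg"],
    "ingredient_method": [["camembert", "pkg"],
                          ["cream", "pastueri"],
                          ["raw", "scrambled"]],
})

# Walk both columns in parallel; rows that are not Cheese get None
df["consolidated_name"] = [
    method[0] if name == "Cheese" else None
    for name, method in zip(df["ingredient_name"], df["ingredient_method"])
]
print(df)
```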

answered Mar 7, 2018 at 13:55

