
I'm trying to create a new column in a dataframe of food ingredients with unique values per row based on information from other cells in the same row.

The table essentially looks like this:

ingredient_name | ingredient_method | consolidated_name
Cheese          | [camembert, pkg]  | 
Cheese          | [cream, pastueri] |
Egg             | [raw, scrambled]  |

I'm trying to iterate through the rows and fill the consolidated_name column with values from either ingredient_name or ingredient_method.
For example, if ingredient_name is "Cheese" I want that row's consolidated name to be the first element of the list in ingredient_method.

This is the code I have so far:

for i, row in df.iterrows():
    consolidated = df['ingredient_name']
    if (df['ingredient_name'] == 'Cheese').all():
        consolidated = df['ingredient_method'][0]
    df.set_value(i,'consolidated_name',consolidated)

The code runs without errors but none of the values change in the dataframe.
Any ideas?

  • Can you add the expected output? What happens in the last row? Commented Mar 7, 2018 at 13:47
  • You are not using i or row inside the loop. Also, it seems the set_value method is not an in-place operation, so your df will not change at all. Commented Mar 7, 2018 at 13:48

3 Answers


One could use .loc (combined with .str[0]).

With:

import pandas as pd

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg'],
                  ingredient_method=[['camembert', 'pkg'],
                                     ['cream', 'pastueri'],
                                     ['raw', 'scrambled']]))

Do:

# Initialize consolidated_name with None (optional; rows left unset are filled with NaN otherwise)
df['consolidated_name'] = [None]*len(df)

# Use .loc to select the rows you want and .str[0] to get the first element of each list
_filter = df.ingredient_name == 'Cheese'  # boolean mask for the rows to update
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]

Result:

print(df)
   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg              None

Note

#1
If you want to consolidate all the duplicated ingredients you can filter with the following:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)

The use of .loc is unchanged; see the next example:

df = pd.DataFrame(dict(ingredient_name=['Cheese','Cheese','Egg','Foo','Foo'],
                  ingredient_method=[['camembert', 'pkg'], 
                                     ['cream', 'pastueri'], 
                                     ['raw', 'scrambled'], 
                                     ['bar', 'taz'], 
                                     ['taz', 'bar']]))

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               NaN
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz

#2
If you want, you can initialize consolidated_name with ingredient_name:

df['consolidated_name'] = df.ingredient_name

Then apply the filter as before:

_duplicated = df.ingredient_name[df.ingredient_name.duplicated()]
_filter = df.ingredient_name.isin(_duplicated)
df.loc[_filter,'consolidated_name'] = df.loc[_filter,'ingredient_method'].str[0]
print(df)

   ingredient_method ingredient_name consolidated_name
0   [camembert, pkg]          Cheese         camembert
1  [cream, pastueri]          Cheese             cream
2   [raw, scrambled]             Egg               Egg #Here it has changed
3         [bar, taz]             Foo               bar
4         [taz, bar]             Foo               taz
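
The initialize-then-overwrite pattern in note #2 can also be written as a single expression. This is only a sketch, assuming the three-row example from the top of this answer and the single 'Cheese' rule from the question; the names first_method and is_cheese are mine, not from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'ingredient_name': ['Cheese', 'Cheese', 'Egg'],
    'ingredient_method': [['camembert', 'pkg'],
                          ['cream', 'pastueri'],
                          ['raw', 'scrambled']],
})

# First element of each list, computed for every row
first_method = df['ingredient_method'].str[0]

# Boolean mask: True for the rows that should take the list element
is_cheese = df['ingredient_name'] == 'Cheese'

# Series.where keeps first_method where the mask is True and
# falls back to ingredient_name everywhere else
df['consolidated_name'] = first_method.where(is_cheese, df['ingredient_name'])
print(df)
```

Whether this reads better than the two-step .loc version is a matter of taste; the .loc version is easier to extend when each ingredient needs its own rule.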

4 Comments

Maybe df['consolidated_name'] = [None]*len(df) should be omitted.
Yes, it fills with NaN instead. (Updated my answer)
Thanks David, this worked, but I'm accepting the answer below because the for loop framework lets me set logic for multiple ingredients beyond cheese. The framework you shared seems to only let me do one filter at a time unless I'm missing something?
Maybe you could provide such an example to see if it fits? (In my opinion it should.)
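
On the "one filter at a time" question: the .loc pattern can be repeated once per rule, so several ingredients can each get their own logic. A minimal sketch follows, assuming the five-row example from note #1. The 'Egg' rule (take the second list element) is purely hypothetical and only there to show a second filter.

```python
import pandas as pd

df = pd.DataFrame({
    'ingredient_name': ['Cheese', 'Cheese', 'Egg', 'Foo', 'Foo'],
    'ingredient_method': [['camembert', 'pkg'],
                          ['cream', 'pastueri'],
                          ['raw', 'scrambled'],
                          ['bar', 'taz'],
                          ['taz', 'bar']],
})

# Default: every row starts with its ingredient_name
df['consolidated_name'] = df['ingredient_name']

# Rule 1 (from the question): 'Cheese' takes the first list element
cheese = df['ingredient_name'] == 'Cheese'
df.loc[cheese, 'consolidated_name'] = df.loc[cheese, 'ingredient_method'].str[0]

# Rule 2 (hypothetical): 'Egg' takes the second list element
egg = df['ingredient_name'] == 'Egg'
df.loc[egg, 'consolidated_name'] = df.loc[egg, 'ingredient_method'].str[1]

print(df)
```

If several ingredients share the same rule, a single mask built with df['ingredient_name'].isin([...]) covers them in one assignment.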

You can use DataFrame.apply for that purpose. Simply wrap your decision logic (which is now in the for loop) into a corresponding function.

def func(row):
    if row['ingredient_name'] == 'Cheese':
        return row['ingredient_method'][0]
    return None

df['consolidated_name'] = df.apply(func, axis=1)
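
The same function can grow to cover more ingredients and fall back to ingredient_name, which is what the question describes ("values from either ingredient_name or ingredient_method"). Here is a sketch under assumptions: the lookup table POSITION_BY_NAME and the function name consolidate are hypothetical, and the 'Egg' entry is an invented rule for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ingredient_name': ['Cheese', 'Cheese', 'Egg', 'Foo'],
    'ingredient_method': [['camembert', 'pkg'],
                          ['cream', 'pastueri'],
                          ['raw', 'scrambled'],
                          ['bar', 'taz']],
})

# Hypothetical rule table: which list position to take for each ingredient
POSITION_BY_NAME = {'Cheese': 0, 'Egg': 1}

def consolidate(row):
    """Return the chosen list element, or the ingredient name if no rule applies."""
    position = POSITION_BY_NAME.get(row['ingredient_name'])
    if position is None:
        return row['ingredient_name']
    return row['ingredient_method'][position]

# axis=1 passes each row to consolidate as a Series
df['consolidated_name'] = df.apply(consolidate, axis=1)
print(df)
```

Keep in mind that apply with axis=1 calls the function once per row in Python, so on large frames the mask-based approaches are likely to be faster.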


If you want to do it using your initial loop:

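consolidated_name = []
for i, row in df.iterrows():
    # Access cells by column label so the result does not depend on column order
    if row['ingredient_name'] == 'Cheese':
        consolidated_name.append(row['ingredient_method'][0])
    else:
        consolidated_name.append(None)

df['consolidated_name'] = consolidated_name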

## out:
  ingredient_name  ingredient_method consolidated_name
0          Cheese   [camembert, pkg]         camembert
1          Cheese  [cream, pastueri]             cream
2             Egg   [raw, scrambled]              None
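
If you would rather write into the dataframe cell by cell, as the question tried to do with set_value, DataFrame.at is the scalar setter to use; set_value was deprecated and is gone from recent pandas releases. A sketch, assuming the three-row example and a fallback to ingredient_name for non-cheese rows (the fallback is my reading of the question, not part of the loop above):

```python
import pandas as pd

df = pd.DataFrame({
    'ingredient_name': ['Cheese', 'Cheese', 'Egg'],
    'ingredient_method': [['camembert', 'pkg'],
                          ['cream', 'pastueri'],
                          ['raw', 'scrambled']],
})

# Create the target column first so there is a cell to write into
df['consolidated_name'] = None

for i, row in df.iterrows():
    if row['ingredient_name'] == 'Cheese':
        value = row['ingredient_method'][0]
    else:
        value = row['ingredient_name']
    # .at sets a single cell, addressed by index label and column label
    df.at[i, 'consolidated_name'] = value

print(df)
```

Note that the condition is tested on row, the current row, rather than on the whole df column as in the question's loop.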
