using lambda for code efficiency in iterating over dataframe

Question

I am trying to eliminate excessive if statements for modifying values in a Pandas dataframe. I will eventually have one for each state, which is a lot of code, and each if statement will be evaluated every time, for every state. When my data source is in list format, I successfully used lambda to make the code more efficient. This is demonstrated in the first code block. I am trying to replicate it with the data in the dataframe but am not sure how.

Efficient code with lists:

Projects = [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]]

for project in Projects:
    # Each lambda returns a scalar so the cost stays a number, not a one-element list
    project[2] = {
        'CT': lambda: project[2] * 1.4,
        'MA': lambda: project[2] * 1.1,
        'CA': lambda: project[2] * 1.5
    }[project[1]]()

print(Projects)

Inefficient code with dataframe:

import pandas as pd
df = pd.DataFrame(data = [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]], columns=['Project ID', 'State', 'Cost'])

for project_index, project in df.iterrows():
    # .ix was removed in pandas 1.0; .loc is the label-based replacement
    if project['State'] == 'CT':
        df.loc[project_index, 'Cost'] *= 1.4
    if project['State'] == 'MA':
        df.loc[project_index, 'Cost'] *= 1.1
    if project['State'] == 'CA':
        df.loc[project_index, 'Cost'] *= 1.5

print(df)

Instead of those lambdas, why not just create a dictionary for the factor, {'CT': 1.4, ...}, and use it like project[2] *= factors[project[1]]?
– tobias_k, Commented Jul 21, 2015 at 15:02
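
For reference, the dictionary lookup suggested in this comment could look roughly like the sketch below. The name `factors` comes from the comment; the lowercase `projects` list and the `.get(..., 1.0)` fallback for states without an entry are assumptions added for illustration.

```python
# Multipliers taken from the question; `factors` is the name used in the comment.
factors = {'CT': 1.4, 'MA': 1.1, 'CA': 1.5}

projects = [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]]

for project in projects:
    # Look up the multiplier by state code.
    # Falling back to 1.0 (cost unchanged) for an unknown state is an assumption.
    project[2] *= factors.get(project[1], 1.0)

print(projects)
```

A plain dict lookup avoids building three lambdas on every pass through the loop, and adding a state only means adding one dictionary entry.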
Why not just do a many-to-one merge to create a column of constants (1.4, 1.1, 1.5) for each state (CT, MA, CA) and do the calculation column-wise? Iterating row-by-row is a bit slower.
– Jianxun Li, Commented Jul 21, 2015 at 15:08
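
A minimal sketch of the merge idea from this comment, assuming a small lookup frame with hypothetical column name `Factor`. Treating states that are missing from the lookup as a factor of 1.0 is also an assumption, not something the comment specifies.

```python
import pandas as pd

df = pd.DataFrame(
    [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]],
    columns=['Project ID', 'State', 'Cost'],
)

# Hypothetical lookup table: one row per state.
factors = pd.DataFrame({'State': ['CT', 'MA', 'CA'], 'Factor': [1.4, 1.1, 1.5]})

# A left merge keeps every project row; validate='many_to_one' raises
# if a state appears more than once in the lookup table.
merged = df.merge(factors, on='State', how='left', validate='many_to_one')

# States absent from the lookup get NaN; filling with 1.0 is an assumption.
merged['Cost'] = merged['Cost'] * merged['Factor'].fillna(1.0)

result = merged.drop(columns='Factor')
print(result)
```

The multiplication then runs once over the whole column instead of once per row.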

EdChum · Accepted Answer · 2015-07-21 15:02:07Z

I'd construct a dict mapping each state to its multiplication factor, then iterate over the dict to get each (state, factor) pair and use loc with a boolean mask to multiply only the matching rows in your df:

In [185]:
d = {'CT':1.4, 'MA':1.1, 'CA':1.5}
for state, factor in d.items():
    df.loc[df['State'] == state, 'Cost'] *= factor
df

Out[185]:
  Project ID State  Cost
0   Project1    CT  1120
1   Project2    MA  1100
2   Project3    CA    30


1 Comment

user2242044 Over a year ago

Why would you loop through the dictionary and not the dataframe? What if the dictionary has all 50 states, but the dataframe only has 4 projects? That seems inefficient and may even cause errors.
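
On the concern raised here: a state that is in the dictionary but not in the dataframe just produces an all-False mask, so no rows are touched. If looping over all 50 entries still feels wasteful, one possible alternative (not part of the answer above) is `Series.map`, which looks up each row's state in the dict with no explicit loop. In the sketch below, the `'NY'` entry and the `'TX'` row are hypothetical additions to show both kinds of mismatch, and leaving unmatched rows unchanged via `fillna(1.0)` is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    [['Project1', 'CT', 800], ['Project2', 'MA', 1000],
     ['Project3', 'CA', 20], ['Project4', 'TX', 50]],  # 'TX' row is hypothetical
    columns=['Project ID', 'State', 'Cost'],
)

# 'NY' is a hypothetical extra entry with no matching project.
d = {'CT': 1.4, 'MA': 1.1, 'CA': 1.5, 'NY': 1.3}

# A state with no rows gives an all-False mask, so a masked update is a no-op.
ny_mask = df['State'] == 'NY'
print(ny_mask.any())

# map() returns NaN for states missing from d ('TX' here);
# filling with 1.0 leaves those costs unchanged (an assumption).
result = df.assign(Cost=df['Cost'] * df['State'].map(d).fillna(1.0))
print(result)
```

With this approach the work scales with the number of rows in the dataframe rather than the number of entries in the dictionary.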
