I am trying to eliminate excessive if statements for modifying values in a Pandas dataframe. I will eventually have one for each state, which is a lot of code, and every if statement will be evaluated for every row. When my data source was in list format, I successfully used a dict of lambdas to make the code more efficient. This is demonstrated in the first code block. I am trying to replicate it with the data in the dataframe but am not sure how.
Efficient Code with Lists:
Projects = [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]]
for project in Projects:
    # Look up the lambda for this project's state and call it immediately
    project[2] = {
        'CT': lambda: [project[2] * 1.4],
        'MA': lambda: [project[2] * 1.1],
        'CA': lambda: [project[2] * 1.5]
    }[project[1]]()
print(Projects)
Inefficient code with dataframe:
import pandas as pd
df = pd.DataFrame(data=[['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]], columns=['Project ID', 'State', 'Cost'])
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
for project_index, project in df.iterrows():
    if project['State'] == 'CT':
        df.loc[project_index, 'Cost'] *= 1.4
    if project['State'] == 'MA':
        df.loc[project_index, 'Cost'] *= 1.1
    if project['State'] == 'CA':
        df.loc[project_index, 'Cost'] *= 1.5
print(df)
Why not use a dict of factors, {'CT': 1.4, ...}, and call it like project[2] *= factors[project[1]]?
You could map the factors 1.4, 1.1, 1.5 to each state CT, MA, CA and do the calculation column-wise. Iterating row-by-row is a bit slower.
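A minimal sketch of the two suggestions above, assuming the multipliers live in a plain dict called factors (the name used in the comment). The list version indexes the dict directly. The dataframe version uses Series.map to turn the State column into a column of multipliers and multiplies once, column-wise. The fillna(1.0) is my own assumption that states missing from the dict should keep their cost unchanged; drop it if you would rather get NaN for unknown states.

```python
import pandas as pd

# Multiplier per state; extend this dict instead of adding more if statements.
factors = {'CT': 1.4, 'MA': 1.1, 'CA': 1.5}

# List version: a plain dict lookup replaces the dict of lambdas.
projects = [['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]]
for project in projects:
    project[2] *= factors[project[1]]

# DataFrame version: map each state to its factor, then multiply the whole column.
df = pd.DataFrame(
    data=[['Project1', 'CT', 800], ['Project2', 'MA', 1000], ['Project3', 'CA', 20]],
    columns=['Project ID', 'State', 'Cost'],
)
# Assumption: states not present in factors get a multiplier of 1.0.
multipliers = df['State'].map(factors).fillna(1.0)
df['Cost'] = df['Cost'] * multipliers

print(projects)
print(df)
```

There is no explicit loop over rows in the dataframe version, so adding another state only means adding one more entry to factors.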