I have a dataframe as follows:

df:

    ID  color  finish  duration
    A1  black  smooth  12
    A2  white  matte   8
    A3  blue   smooth  20
    A4  green  matte   10
    B1  black  smooth  12
    B2  white  matte   8
    B3  blue   smooth
    B4  green          10
    C1  black  smooth
    C2  white  matte   8
    C3  blue   smooth
    C4  green          10
I want to generate subsets of this dataframe based on certain conditions. For example, with color = black, finish = smooth, duration = 12, I get the following dataframe:

    ID  color  finish  duration  score
    A1  black  smooth  12        1
    B1  black  smooth  12        1

With color = blue, finish = smooth, duration = 20, I get the following dataframe:

    ID  color  finish  duration  score
    A3  blue   smooth  20        1
    B3  blue   smooth            0.666667
    C3  blue   smooth            0.666667
Score is calculated as (number of columns populated) / (total number of columns). I want to do this in a loop over a pandas dataframe. The following code works for me for 2 columns:
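    import pandas as pd

    gbl = globals()  # assumption: gbl is the globals() dict, used for dynamically named variables
    list2 = list(df['color'].unique())
    list3 = list(df['finish'].dropna().unique())  # drop NaN so the string concatenation below works
    df_final = pd.DataFrame()
    for i in range(len(list2)):
        for j in range(len(list3)):
            print('Current Attribute Value:', list2[i], list3[j])
            # filter on color first, then on finish within that subset
            gbl["df_" + list2[i]] = df[df.color == list2[i]]
            gbl["df_" + list2[i] + list3[j]] = gbl["df_" + list2[i]].loc[gbl["df_" + list2[i]].finish == list3[j]].copy()
            gbl["df_" + list2[i] + list3[j]]['dfattribval'] = list2[i] + list3[j]
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            df_final = pd.concat([df_final, gbl["df_" + list2[i] + list3[j]]], ignore_index=True)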
However, I am not able to loop this over column names. What I would like to do is:

    lista = ['color', 'finish']
    df_final = pd.DataFrame()
    for a in range(len(lista)):
        for i in range(len(list2)):
            for j in range(len(list3)):
                print('Current Attribute Value:', lista[a], list2[i], lista[a+1], list3[j])
                gbl["df_" + list2[i]] = df[df.lista[a] == list2[i]]  # the error is raised here
                gbl["df_" + list2[i] + list3[j]] = gbl["df_" + list2[i]].loc[gbl["df_" + list2[i]].lista[a+1] == list3[j]]
                gbl["df_" + list2[i] + list3[j]]['dfattribval'] = list2[i] + list3[j]
                df_final = pd.concat([df_final, gbl["df_" + list2[i] + list3[j]]], ignore_index=True)
I get the obvious error:

    AttributeError: 'DataFrame' object has no attribute 'lista'

Does anyone know how to loop over column names and values? Thanks very much in advance!
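For what it's worth, the AttributeError comes from `df.lista[a]`: attribute access looks for a column literally named `lista`, whereas bracket indexing, `df[lista[a]]`, looks the column up by the string stored in the list. Below is a minimal sketch of looping over column names and their values that way. The sample data is reconstructed from the table in the question, and `values_per_column`, `combo`, `mask` and `pieces` are names made up for illustration. It also assumes that missing values should be skipped rather than treated as a value of their own.

```python
from itertools import product

import pandas as pd

# Sample data reconstructed from the question; None marks an empty cell.
df = pd.DataFrame({
    "ID": ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2", "C3", "C4"],
    "color": ["black", "white", "blue", "green"] * 3,
    "finish": ["smooth", "matte", "smooth", "matte",
               "smooth", "matte", "smooth", None,
               "smooth", "matte", "smooth", None],
    "duration": [12, 8, 20, 10, 12, 8, None, 10, None, 8, None, 10],
})

lista = ["color", "finish"]  # column names to loop over

# Distinct non-missing values for each column, in the same order as lista.
values_per_column = [df[col].dropna().unique() for col in lista]

pieces = []  # hypothetical name: collects the non-empty subsets
for combo in product(*values_per_column):
    mask = pd.Series(True, index=df.index)
    for col, val in zip(lista, combo):
        # df[col] looks the column up by the string held in col;
        # df.col would look for a column literally named "col".
        mask &= df[col] == val
    subset = df[mask].copy()
    if subset.empty:
        continue  # skip combinations that never occur, e.g. black + matte
    subset["dfattribval"] = "".join(combo)
    pieces.append(subset)

df_final = pd.concat(pieces, ignore_index=True)
```

Because `lista` drives both the value lists and the filters, adding a third column name to it should extend the loop without further nesting, and no dynamically named variables in `globals()` are needed.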
number of columns populated/total number of columns. When you show color = blue, finish = smooth, duration = 20, you show 3 rows, two of which don't have duration 20. I'm lost as to how you need this problem solved.
groupby() of color and finish can achieve your score without needing to separate dfs.