Not a an ideal title but I wouldn't know how to describe it better.
I have a dataframe (df1) and want to split it on the column "chicken" so that:
- each chicken that laid an egg becomes a distinct row
- the chickens that didn't lay an egg are aggregated in a unique row.
The output I need is df2, example:
In farm "A", there are 5 chicken, of which 2 chicken laid an egg, so there are 2 rows with egg = "True" and weight = 1 each, and 1 row with egg = "False" and weight = 3 (the 3 chicken that didn't lay an egg).
The code I came up with is messy, can you guys think of a cleaner way of doing it? Thanks!!
#code to create df1:
df1 = pd.DataFrame({'farm':["A","B","C"],"chicken":[5,10,5],"eggs":[2,3,0]})
df1=df1[["farm","chicken","eggs"]]
#code to transform df1 to df2:
df2 = pd.DataFrame()
for i in df1.index:
number_of_trues = df1.iloc[i]["eggs"]
number_of_falses = df1.iloc[i]["chicken"] - number_of_trues
col_farm = [df1.iloc[i]["farm"]]*(number_of_trues+1)
col_egg = ["True"]*number_of_trues + ["False"]*1
col_weight = [1]*number_of_trues + [number_of_falses]
mini_df = pd.DataFrame({"farm":col_farm,"egg":col_egg,"weight":col_weight})
df2=df2.append(mini_df)
df2 = df2[["farm","egg","weight"]]
df2
