I have a DataFrame that looks something like this:
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])
print(df)
  item_name      description  class1  class2  class3
0    item 1        Some text     0.0    1.00    0.25
1    item 2  Some other text     0.5    0.00    0.00
2    item 3             Etc.     0.0    0.25    0.00
I would like to duplicate each row once for every value greater than 0 in columns class1 to class3, outputting item_name, description, and the name of the class column. The expected result is:
  item_name      description   class
0    item 1        Some text  class2
1    item 1        Some text  class3
2    item 2  Some other text  class1
3    item 3             Etc.  class2
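One possible vectorized sketch of this reshape, assuming the column names above, uses DataFrame.melt to unpivot the class columns into rows and then keeps only rows whose value is greater than 0. The helper columns index and value exist only for this sketch; a stable sort on the original row position (index) restores the item order, because melt emits all class1 rows first, then class2, then class3.

```python
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])

# Unpivot: one row per (original row, class column) pair.
# reset_index() exposes the original row position as a helper column named 'index'.
long_df = df.reset_index().melt(
    id_vars=['index', 'item_name', 'description'],
    value_vars=['class1', 'class2', 'class3'],
    var_name='class',
    value_name='value',
)

# Keep positive values, then stable-sort by original position so that the
# class1 -> class2 -> class3 order within each item is preserved.
df_new = (
    long_df[long_df['value'] > 0]
    .sort_values('index', kind='mergesort')
    .drop(columns=['index', 'value'])
    .reset_index(drop=True)
)
print(df_new)
```

Dropping the sort still yields the right rows, just grouped by class instead of by item.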
I managed to get output that goes in the right direction by using iterrows, but I can only access the class value, not its column name:
data_transf = []
for index, row in df.iterrows():
    for col in row.loc['class1':'class3']:
        if col > 0:
            data_transf.append([row['item_name'], row['description'], col])
df_new = pd.DataFrame(data_transf, columns=['item_name', 'description', 'class'])
print(df_new)
  item_name      description  class
0    item 1        Some text   1.00
1    item 1        Some text   0.25
2    item 2  Some other text   0.50
3    item 3             Etc.   0.25
The problem is that col is a float, and I can't find a way to access its index position to retrieve the class name. How can this be achieved? Perhaps there is a more elegant way to do this using built-ins or comprehensions?
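A minimal sketch of one way to keep the loop but recover the name: iterating over a Series directly yields only its values, whereas Series.items() yields (label, value) pairs, so row.loc['class1':'class3'].items() gives the class column name alongside each value. The same logic is also shown as a nested list comprehension. The names class_name, rows and df_comp are illustrative only.

```python
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])

# Loop version: .items() yields (column label, value) for the sliced row
data_transf = []
for _, row in df.iterrows():
    for class_name, value in row.loc['class1':'class3'].items():
        if value > 0:
            data_transf.append([row['item_name'], row['description'], class_name])
df_new = pd.DataFrame(data_transf, columns=['item_name', 'description', 'class'])

# Same logic as a nested list comprehension
rows = [
    [row['item_name'], row['description'], class_name]
    for _, row in df.iterrows()
    for class_name, value in row.loc['class1':'class3'].items()
    if value > 0
]
df_comp = pd.DataFrame(rows, columns=['item_name', 'description', 'class'])
print(df_comp)
```

Because iterrows builds a new Series for every row, this approach is best suited to small frames.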