
I have a DataFrame that looks something like this:

import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]

df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])
print(df)

  item_name      description  class1  class2  class3
0    item 1        Some text     0.0    1.00    0.25
1    item 2  Some other text     0.5    0.00    0.00
2    item 3             Etc.     0.0    0.25    0.00

I would like to emit one row for each value greater than 0 found in columns class1 to class3, outputting item_name, description, and the name of the class column. The expected result is:

  item_name      description    class
0    item 1        Some text   class2
1    item 1        Some text   class3
2    item 2  Some other text   class1
3    item 3             Etc.   class2

I managed to get output that goes in the right direction by using iterrows; however, I am only able to access the class value, not its name:

data_transf = []
for index, row in df.iterrows():
    for col in row.loc['class1':'class3']:
        if col > 0:
            data_transf.append([row['item_name'], row['description'], col])

df_new = pd.DataFrame(data_transf, columns=['item_name', 'description', 'class'])
print(df_new)

  item_name      description  class
0    item 1        Some text   1.00
1    item 1        Some text   0.25
2    item 2  Some other text   0.50
3    item 3             Etc.   0.25

The problem is that col is a float, and I can't find a way to access its index position to retrieve the class name. How can this be achieved? Perhaps there is a more elegant way to do this using built-ins or comprehensions?
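For reference, iterating over a Series directly yields only its values, which is why col ends up as a float. A minimal sketch, assuming the same df as above, that keeps the iterrows approach but uses Series.items() to get each label together with its value (the names rows, class_name, and value are illustrative only):

```python
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])

rows = []
for _, row in df.iterrows():
    # Series.items() yields (label, value) pairs, so the column name is available
    for class_name, value in row.loc['class1':'class3'].items():
        if value > 0:
            rows.append([row['item_name'], row['description'], class_name])

df_new = pd.DataFrame(rows, columns=['item_name', 'description', 'class'])
print(df_new)
```

This is likely to be slower than a vectorized approach on large frames, since iterrows builds a Series per row.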

2 Answers

You can do this by transforming the data frame to long format with stack and then keeping only the values that are greater than 0:

# stack and filter
ldf = df.set_index(['item_name', 'description']).stack()[lambda x: x > 0]

# reset index
ldf = ldf.reset_index().drop(0, axis=1).rename(columns={'level_2': 'class'})

print(ldf)

#  item_name      description   class
#0    item 1        Some text  class2
#1    item 1        Some text  class3
#2    item 2  Some other text  class1
#3    item 3             Etc.  class2
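The level_2 name in the rename step appears because stack adds a third, unnamed index level holding the original column labels, and reset_index gives it a positional default name. A possible variant, assuming the same df, that names the levels explicitly with rename_axis so the result does not depend on that default (the column name 'value' is an arbitrary choice for illustration):

```python
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])

# stack moves the class columns into a third, unnamed index level
stacked = df.set_index(['item_name', 'description']).stack()

result = (
    stacked[stacked > 0]                                   # keep positive values only
    .rename_axis(['item_name', 'description', 'class'])    # name the new level
    .reset_index(name='value')                             # Series -> DataFrame
    .drop(columns='value')                                 # the values are no longer needed
)
print(result)
```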


1 Comment

A one-liner version of the answer: df.set_index(['item_name', 'description']).stack().to_frame('val').query("val>0").reset_index().drop(columns='val')

Alternative using df.melt

(df.melt(id_vars=['item_name', 'description'],var_name='class').
    query("value>0").drop(columns='value'))

  item_name      description   class
1    item 2  Some other text  class1
3    item 1        Some text  class2
5    item 3             Etc.  class2
6    item 1        Some text  class3
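Note that melt orders the rows column by column, so the output above is grouped by class rather than by item. If the row order from the question matters, one option (a sketch, assuming pandas 1.1 or later for the ignore_index parameter) is to keep the original row index through melt and sort on it with a stable sort:

```python
import pandas as pd

data = [
    ['item 1', 'Some text', 0.0, 1, 0.25],
    ['item 2', 'Some other text', 0.5, 0.0, 0.0],
    ['item 3', 'Etc.', 0.0, 0.25, 0.0],
]
df = pd.DataFrame(data, columns=['item_name', 'description', 'class1', 'class2', 'class3'])

result = (
    df.melt(id_vars=['item_name', 'description'], var_name='class',
            ignore_index=False)          # keep the original row labels
      .query("value > 0")
      .drop(columns='value')
      .sort_index(kind='mergesort')      # stable sort: class order is kept within each item
      .reset_index(drop=True)
)
print(result)
```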
