I want to build a string from the row and column data of a pandas DataFrame. Given the following DataFrame, I only want to include the columns that hold positive values:

index    A    B    C
  1      0    1    2
  2      0    0    3
  3      0    0    0
  4      1    0    0

I would like to create a new column containing a string that lists which columns in each row were positive, along with their values. Then I would drop the original columns the data came from, as well as any row with no positive values:

index    Positives
  1       B-1, C-2
  2       C-3
  4       A-1

2 Answers

Here is one way using pd.DataFrame.apply + pd.Series.apply:

import pandas as pd

df = pd.DataFrame([[1, 0, 1, 2], [2, 0, 0, 3], [3, 0, 0, 0], [4, 1, 0, 0]],
                  columns=['index', 'A', 'B', 'C'])

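def formatter(x):
    # Keep only the positive entries of the row. The 'index' column is
    # always positive here, so it comes first and is skipped with [1:].
    x = x[x > 0]
    return x.index[1:].astype(str) + '-' + x[1:].astype(str)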

df['Positives'] = df.apply(formatter, axis=1).apply(', '.join)

print(df)

   index  A  B  C  Positives
0      1  0  1  2   B-1, C-2
1      2  0  0  3        C-3
2      3  0  0  0          
3      4  1  0  0        A-1

If you need to filter out zero-length strings, you can use the fact that empty strings evaluate to False with bool:

res = df[df['Positives'].astype(bool)]

print(res)

   index  A  B  C  Positives
0      1  0  1  2   B-1, C-2
1      2  0  0  3        C-3
3      4  1  0  0        A-1
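
As a rough sketch of the same idea without relying on what df.apply returns when the function yields a Series, the strings can also be built with a plain generator per row. The names value_cols and positives are my own, and the filter uses an explicit comparison with the empty string, which selects the same rows here:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0, 1, 2], [2, 0, 0, 3], [3, 0, 0, 0], [4, 1, 0, 0]],
    columns=["index", "A", "B", "C"],
)

# Assumption: these are the only columns that should be inspected.
value_cols = ["A", "B", "C"]


def positives(row):
    # Join "column-value" pairs for every positive entry in the row.
    return ", ".join(f"{col}-{row[col]}" for col in value_cols if row[col] > 0)


df["Positives"] = df[value_cols].apply(positives, axis=1)

# Rows without any positive value end up with an empty string; drop them.
res = df[df["Positives"] != ""]

print(res[["index", "Positives"]])
```

Selecting only "index" and "Positives" at the end gives the two-column layout asked for in the question.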

3 Comments

@user32185, No it doesn't. I've tested it with OP's data. Can you clarify?
With this df I got an error: df = pd.DataFrame([[0,1,2],[0,0,3],[0,0,0],[1,0,0]], columns=["A", "B", "C"]); df = df.reset_index(); df["index"] += 1
See my update; it works. I'm using pandas v0.23.0 and Python 3.6.5. It also works with your suggested DataFrame. It looks like you have a different configuration or haven't cleared your session.

I'd replace the zeros with np.nan to remove the values you don't care about, then stack. Then build the strings you want and use groupby.apply(list):

import numpy as np

df = df.set_index('index') # if 'index' is not your index.

stacked = df.replace(0, np.nan).stack().reset_index()
stacked['Positives'] = stacked['level_1'] + '-' + stacked[0].astype(int).astype('str')
stacked = stacked.groupby('index').Positives.apply(list).reset_index()

stacked is now:

   index   Positives
0      1  [B-1, C-2]
1      2       [C-3]
2      4       [A-1]

Or if you just want one string and not a list, change the last line:

stacked.groupby('index').Positives.apply(', '.join).reset_index()
#   index Positives
#0      1  B-1, C-2
#1      2       C-3
#2      4       A-1

2 Comments

Hi I was just trying to work through your answer, could you explain what the purpose of 'level_1' is in your second line?
When you stack the DataFrame, it creates a MultiIndex with the column names as one of the index levels. .reset_index() then flattens the MultiIndex, which leaves you with a column named level_1 that holds the column name each non-zero value came from. The other column, labeled 0, contains the non-zero values themselves.
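
To see where level_1 and 0 come from, here is a small sketch that prints the intermediate objects. The variable names masked and flat are mine, and the explicit .dropna() is an assumption added so the result does not depend on whether stack drops missing values by itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 0, 0, 1], "B": [1, 0, 0, 0], "C": [2, 3, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)

# Zeros become NaN so they can be dropped after stacking.
masked = df.replace(0, np.nan)

# stack() moves the column labels into a second, unnamed index level.
stacked = masked.stack().dropna()
print(stacked.index.nlevels)

# reset_index() turns both levels into columns. The unnamed level is
# called 'level_1' and the unnamed values column is called 0.
flat = stacked.reset_index()
print(list(flat.columns))
print(flat)
```

Naming the column axis first, for example with df.rename_axis(columns='col'), should give that level a readable name instead of level_1.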
