Creating a string from pandas column and row data

Question

I want to generate a string composed of pandas row and column data. Given the following DataFrame, I am only interested in the columns that hold positive values:

index    A    B    C
  1      0    1    2
  2      0    0    3
  3      0    0    0
  4      1    0    0

I would like to create a new column containing a string that lists, for each row, which columns were positive and what their values were. I would then drop the columns the data came from, along with any row that has no positive values:

index    Positives
  1       B-1, C-2
  2       C-3
  4       A-1

jpp · Accepted Answer · 2018-06-18 21:48:33Z

1

Here is one way using pd.DataFrame.apply + pd.Series.apply:

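import pandas as pd

df = pd.DataFrame([[1, 0, 1, 2], [2, 0, 0, 3], [3, 0, 0, 0], [4, 1, 0, 0]],
                  columns=['index', 'A', 'B', 'C'])

def formatter(x):
    # Keep the positive entries of the row; the first one is the 'index'
    # column (assumed positive), so [1:] skips it.
    x = x[x > 0]
    # Return a list so that apply yields one object per row rather than
    # expanding the result into columns.
    return list(x.index[1:].astype(str) + '-' + x[1:].astype(str))

df['Positives'] = df.apply(formatter, axis=1).apply(', '.join)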

print(df)

   index  A  B  C  Positives
0      1  0  1  2   B-1, C-2
1      2  0  0  3        C-3
2      3  0  0  0          
3      4  1  0  0        A-1

If you need to filter out zero-length strings, you can use the fact that empty strings evaluate to False with bool:

res = df[df['Positives'].astype(bool)]

print(res)

   index  A  B  C  Positives
0      1  0  1  2   B-1, C-2
1      2  0  0  3        C-3
3      4  1  0  0        A-1
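
To arrive at exactly the two-column result shown in the question, the steps above can be combined as in the following sketch. It is one possible variant, not the code from this answer: the names value_cols, positives and out are made up for illustration, and it assumes every column other than 'index' holds the values to inspect.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0, 1, 2], [2, 0, 0, 3], [3, 0, 0, 0], [4, 1, 0, 0]],
    columns=['index', 'A', 'B', 'C'],
)

# Assumption: every column except 'index' holds the values to inspect.
value_cols = ['A', 'B', 'C']

def positives(row):
    # Format each strictly positive entry as "<column>-<value>" and join them.
    return ', '.join(f'{col}-{val}' for col, val in row.items() if val > 0)

out = df[['index']].copy()
out['Positives'] = df[value_cols].apply(positives, axis=1)

# Empty strings are falsy, so this drops the rows with no positive values.
out = out[out['Positives'].astype(bool)].reset_index(drop=True)
print(out)
```

Building the string inside the row function keeps the result a plain Series of strings, so no second apply is needed.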

edited Jun 18, 2018 at 21:48

answered Jun 18, 2018 at 20:15

jpp


3 Comments

jpp Over a year ago

@user32185, No it doesn't. I've tested it with OP's data. Can you clarify?

rpanai Over a year ago

With this df I got an error:

df = pd.DataFrame([[0,1,2],[0,0,3],[0,0,0],[1,0,0]], columns=["A", "B", "C"])
df = df.reset_index()
df["index"] += 1

jpp Over a year ago

See my update, it works. I'm using Pandas v0.23.0, Python 3.6.5. It also works with your suggested DataFrame. It looks like you have a different configuration or haven't cleared your session.

ALollz · Accepted Answer · 2018-06-18 19:52:11Z

0

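I'd replace the zeros with np.nan to mark the values you don't care about, then stack and drop them. After that, form the strings you want and collect them per row with groupby.apply(list):

import numpy as np

df = df.set_index('index')  # only needed if 'index' is not already the index

# Zeros become NaN; dropna() discards them where stack() has not already done so.
stacked = df.replace(0, np.nan).stack().dropna().reset_index()
# 'level_1' holds the original column name, column 0 the positive value.
stacked['Positives'] = stacked['level_1'] + '-' + stacked[0].astype(int).astype(str)
stacked = stacked.groupby('index').Positives.apply(list).reset_index()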

stacked is now:

   index   Positives
0      1  [B-1, C-2]
1      2       [C-3]
2      4       [A-1]

Or, if you want a single string rather than a list, use this in place of the last line above:

stacked.groupby('index').Positives.apply(lambda x: ', '.join(list(x))).reset_index()
#   index Positives
#0      1  B-1, C-2
#1      2       C-3
#2      4       A-1

edited Jun 18, 2018 at 19:52

answered Jun 18, 2018 at 19:45

ALollz


2 Comments

magladde Over a year ago

Hi I was just trying to work through your answer, could you explain what the purpose of 'level_1' is in your second line?

ALollz Over a year ago

When you stack the DataFrame, it creates a MultiIndex with the column names as one of the index levels. .reset_index() then removes the MultiIndex, leaving a column named level_1 that indicates which column each non-zero value came from. The other column, labeled 0, contains the non-zero values themselves.
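
The naming described in this comment can be checked with a small sketch. It rebuilds the question's data with 'index' already set as the index (an assumption made here so that the block is self-contained); the variable name flat is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [0, 0, 0, 1], 'B': [1, 0, 0, 0], 'C': [2, 3, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name='index'),
)

# Zeros become NaN; dropna() discards them if stack() has not already.
stacked = df.replace(0, np.nan).stack().dropna()

# The row level keeps its name 'index'; the level built from the column
# labels has no name.
print(stacked.index.names)

# reset_index() names the unnamed level 'level_1' (it is level number 1),
# and the unnamed values end up in a column labeled 0.
flat = stacked.reset_index()
print(list(flat.columns))
print(flat)
```

If the columns axis had a name, for example after df.columns.name = 'col', that name would be used in place of level_1.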
