Normalizing rows of pandas dataframe

Question

I need to normalize the rows of a dataframe in which some rows are all zeros. For example:

df= pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})

ID  A   B
1   1   4
2   0   0
3   10  30
4   0   0

My approach is to first exclude the all-zero rows and then normalize the non-zero subset:

df1 = df[df.sum(axis=1) != 0]
df2 = df[df.sum(axis=1) == 0]
sum_row = df1.sum(axis=1)
df1.div(sum_row, axis=0)

and then concatenate the two dataframes as follows:

pd.concat([df1, df2]).reset_index()

However, I get the following error when applying df1.div(sum_row, axis=0):

ValueError: operands could not be broadcast together with shapes (6,) (2,)

I wonder how to fix the error and if there exists a more efficient approach. Thanks!

Edit: The resulting dataframe is expected to look like this:

ID  A     B
1   0.2   0.8 
2   0     0
3   0.25  0.75
4   0     0

Could you add the expected results, please?

– Anna Iliukovich-Strakovskaia, Commented Aug 24, 2018 at 15:03
@AnnaIliukovich-Strakovskaia Done!

– user3000538, Commented Aug 24, 2018 at 15:09

Vivek Kumar · Accepted Answer · 2018-08-24 15:17:30Z

7

You can use Normalizer from scikit-learn:

import pandas as pd
from sklearn.preprocessing import Normalizer

df = pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})
df = df.set_index('ID')

# norm='l1' scales each row by the sum of its absolute values
df.iloc[:,:] = Normalizer(norm='l1').fit_transform(df)

print(df)

       A     B
ID            
1   0.20  0.80
2   0.00  0.00
3   0.25  0.75
4   0.00  0.00


1 Comment

demongolem Over a year ago

This is great because you can easily change the norm used. In my case, I see so many examples of the l1 norm on SO, whereas I need to normalize each row according to l2 for my current needs.
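
To illustrate the l2 case mentioned in this comment, here is a minimal pandas-only sketch of row-wise l2 scaling. It is not the scikit-learn code path; it assumes that dividing each row by its Euclidean norm and leaving all-zero rows untouched is the behavior wanted. The names norms and l2_scaled are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})
df = df.set_index("ID")

# Euclidean (l2) norm of each row
norms = np.sqrt((df ** 2).sum(axis=1))

# Replace zero norms with 1 so all-zero rows stay zero instead of becoming NaN
l2_scaled = df.div(norms.replace(0, 1), axis=0)

print(l2_scaled)
```

After scaling, every non-zero row has unit Euclidean length, while rows '2' and '4' remain all zeros.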

cosmic_inquiry · Accepted Answer · 2018-08-24 15:19:13Z

4

Use div:

df= pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})
df.set_index("ID", inplace=True)
df.div(df.sum(axis=1), axis=0).fillna(0)
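
A self-contained sketch of the same idea, split into steps to show why the fillna(0) call is needed: an all-zero row has a row sum of zero, so the division produces 0/0 = NaN for that row. The intermediate names row_sums, raw and normalized are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]})
df = df.set_index("ID")  # keep the string ID column out of the arithmetic

row_sums = df.sum(axis=1)

# Rows '2' and '4' are divided by a zero sum, which yields NaN
raw = df.div(row_sums, axis=0)

# Turn those NaN entries back into zeros
normalized = raw.fillna(0)

print(normalized.reset_index())
```

Calling reset_index() at the end brings ID back as a regular column, matching the layout requested in the question.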


BENY · Accepted Answer · 2018-08-24 15:18:25Z

1

Using melt with crosstab:

newdf = df.melt('ID')
pd.crosstab(index=newdf.ID, columns=newdf.variable, values=newdf.value, normalize='index', aggfunc='mean')

variable     A     B
ID                  
1         0.20  0.80
2         0.00  0.00
3         0.25  0.75
4         0.00  0.00
