Python + pandas replace column values based on other column with

Question

My question is simple. Let's start with a hypothetical table:

| a | b | c |  
| 0 | 0 | 0 |  
| 0 | 0 | 0 |  
| 0 | 1 | 1 |  
| 0 | 1 | 2 |  
| 1 | 2 | 3 |  
| 1 | 2 | 3 |  
| 1 | 3 | 4 |
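
For anyone who wants to reproduce this, the table above can be built as a DataFrame. This is a minimal sketch; the variable name `df` is an assumption, chosen only because the answers further down refer to a frame called `df`.

```python
import pandas as pd

# The hypothetical input table from the question.
# `df` is an assumed name that matches what the answers use.
df = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1],
    "b": [0, 0, 1, 1, 2, 2, 3],
    "c": [0, 0, 1, 2, 3, 3, 4],
})
print(df)
```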

This table should be modified so that the numbers in column b do not repeat any number already used in column a, and the numbers in column c do not repeat any number used in column b.
In other words, the numbering should keep incrementing from one column to the next.
The final table should look like this:

| a | b | c  |  
| 0 | 2 | 6  |  
| 0 | 2 | 6  |  
| 0 | 3 | 7  |  
| 0 | 3 | 8  |  
| 1 | 4 | 9  |  
| 1 | 4 | 9  |  
| 1 | 5 | 10 |

Finally, there is one more table. Based on the previous table, since c is a child of b and b is a child of a, it should be converted into a two-column table, with a third column giving the height (size) of each row:

| parent | child | size |  
| 0      | 2     | 2    |  
| 0      | 3     | 2    |  
| 1      | 4     | 2    |  
| 1      | 5     | 1    |  
| 2      | 6     | 2    |  
| 3      | 7     | 1    |  
| 3      | 8     | 1    |  
| 4      | 9     | 2    |  
| 5      | 10    | 1    |

In this third table, all rows are unique and the whole tree is mapped using only two columns. The third column is the height of each row, that is, how many times each pair appears in the second table.

BENY · Accepted Answer · 2018-03-30 22:23:56Z

1

If I understand correctly, you can use factorize:

import pandas as pd 
import numpy as np

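# Stack into a long Series indexed by (row, column), ordered column by column
s = df.stack().sort_index(level=[1])
# Number each distinct (value, column) key in order of appearance, then restore the wide shape
s = pd.Series(pd.factorize(s.astype(str) + s.index.get_level_values(level=1))[0], index=s.index).unstack()
s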
Out[665]:
   a  b   c
0  0  2   6
1  0  2   6
2  0  3   7
3  0  3   8
4  1  4   9
5  1  4   9
6  1  5  10
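
The one-liner above packs several steps together, so it may help to see them separately. The following is a sketch of the same idea split into named stages; the intermediate names `keys`, `codes`, `uniques` and `result` are introduced here for illustration and are not part of the original answer, and `df` is assumed to be the first table from the question.

```python
import pandas as pd

# Assumed input: the first table from the question
df = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1],
    "b": [0, 0, 1, 1, 2, 2, 3],
    "c": [0, 0, 1, 2, 3, 3, 4],
})

# Long format, ordered column by column: all of a, then all of b, then all of c
s = df.stack().sort_index(level=1)

# Append the column name to each value, so that 0 in "a" and 0 in "b"
# become the distinct keys "0a" and "0b"
keys = s.astype(str) + s.index.get_level_values(1)

# factorize assigns 0, 1, 2, ... to the distinct keys in order of first appearance
codes, uniques = pd.factorize(keys)

# Put the codes back on the (row, column) index and pivot to the wide shape
result = pd.Series(codes, index=s.index).unstack()
print(result)
```

Because the keys are visited column by column, the numbering continues from a into b and from b into c, which is the "incremented continually" behaviour the question asks for.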


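# Stack the (a, b) pairs on top of the (b, c) pairs, then count each (parent, child) pair
pairs = np.append(s[['a', 'b']].values, s[['b', 'c']].values, axis=0)
yourdf = pd.DataFrame(pairs, columns=['p', 'c']).groupby(['p', 'c']).size().reset_index()
yourdf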
Out[681]: 
   p   c  0
0  0   2  2
1  0   3  2
2  1   4  2
3  1   5  1
4  2   6  2
5  3   7  1
6  3   8  1
7  4   9  2
8  5  10  1
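
The count column in the output above is left with the default name `0`. If the `parent`, `child` and `size` headers from the question are wanted, one possible variant is sketched below. It swaps `np.append` for `pd.concat` with renamed columns, which is a different route to the same pairs rather than the answer's own code; the names `edges` and `tree` are made up for this example.

```python
import pandas as pd

# Assumed input: the relabelled table (second table in the question)
s = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1],
    "b": [2, 2, 3, 3, 4, 4, 5],
    "c": [6, 6, 7, 8, 9, 9, 10],
})

# Every (a, b) pair and every (b, c) pair is one parent -> child edge
edges = pd.concat(
    [
        s[["a", "b"]].rename(columns={"a": "parent", "b": "child"}),
        s[["b", "c"]].rename(columns={"b": "parent", "c": "child"}),
    ],
    ignore_index=True,
)

# Count how often each edge occurs and give the count column an explicit name
tree = edges.groupby(["parent", "child"]).size().reset_index(name="size")
print(tree)
```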


3 Comments

piRSquared Over a year ago

Beautiful! pd.factorize(list(zip(df.columns.repeat(len(df)), df.values.ravel(order='F'))))[0].reshape(df.shape, order='F')

BENY Over a year ago

@piRSquared I never knew we could pass tuples to it, thank you sir

Aureliano Guedes Over a year ago

Really fast. Thank you. Could you explain this part of your code to me? > s=pd.Series(pd.factorize(s.astype(str)+s.index.get_level_values(level=1))[0],index=s.index).unstack()

piRSquared · Accepted Answer · 2018-03-30 23:36:26Z

0

My go at both

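import numpy as np
import pandas as pd

n = len(df)
# Pair every value with its column name, walking the frame column by column
col = df.columns.repeat(n)
val = df.values.ravel(order='F')
f, u = pd.factorize(list(zip(col, val)))
table1 = pd.DataFrame(f.reshape(df.shape, order='F'), df.index, df.columns)

# f[:-n] holds columns a and b, f[n:] holds b and c, so zipping them gives (parent, child) pairs
pc, pc_ = pd.factorize(list(zip(f[:-n], f[n:])))
table2 = pd.DataFrame(
    np.column_stack([pc_.tolist(), np.bincount(pc)]),
    columns=list('pcn')
)

print(table1, table2, sep='\n\n')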


   a  b   c
0  0  2   6
1  0  2   6
2  0  3   7
3  0  3   8
4  1  4   9
5  1  4   9
6  1  5  10

   p   c  n
0  0   2  2
1  0   3  2
2  1   4  2
3  1   5  1
4  2   6  2
5  3   7  1
6  3   8  1
7  4   9  2
8  5  10  1
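
Both `ravel` and `reshape` in the code above are called with `order='F'`, which is what makes the numbering run down each column before moving to the next one. A small sketch of the difference, using a made-up 3x3 array rather than the question's data:

```python
import numpy as np

# Illustrative array only, not the table from the question
values = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [1, 2, 3],
])

row_major = values.ravel()            # default order='C': reads row by row
col_major = values.ravel(order="F")   # order='F': reads column by column

# Reshaping with the same order puts every element back where it came from
restored = col_major.reshape(values.shape, order="F")

print(row_major)
print(col_major)
```

With the default row-major order, values from a, b and c would be interleaved, so the labels would not increase column by column.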
