1

My question is simple, let's start with an hypothetical table

| a | b | c |  
| 0 | 0 | 0 |  
| 0 | 0 | 0 |  
| 0 | 1 | 1 |  
| 0 | 1 | 2 |  
| 1 | 2 | 3 |  
| 1 | 2 | 3 |  
| 1 | 3 | 4 |  

This table should be modified, the numbers in column b don't be the same existing in column a, and column c don't be the same number in column b.
So the number should be incremented continually.
The final should looks like this:

| a | b | c  |  
| 0 | 2 | 6  |  
| 0 | 2 | 6  |  
| 0 | 3 | 7  |  
| 0 | 3 | 8  |  
| 1 | 4 | 9  |  
| 1 | 4 | 9  |  
| 1 | 5 | 10 |  

Finally, has more one table. Based in the last table, since c is child of b and b child of a, this table should be converted in two columns table with 3rd column with the height of each row:

| parent | child | size |  
| 0      | 2     | 2    |  
| 0      | 3     | 2    |  
| 1      | 4     | 2    |  
| 1      | 5     | 1    |  
| 2      | 6     | 2    |  
| 3      | 7     | 1    |  
| 3      | 8     | 1    |  
| 4      | 9     | 2    |  
| 5      | 10    | 1    |  

In this 3rd table, all rows are unique and map all tree using only two columns. The 3rd column is the height of each row, that's means, how many times each pair appears in second table.

2 Answers 2

1

IIUC using factorize

import pandas as pd 
import numpy as np

s=df.stack().sort_index(level=[1])
s=pd.Series(pd.factorize(s.astype(str)+s.index.get_level_values(level=1))[0],index=s.index).unstack()
s
Out[665]:
   a  b   c
0  0  2   6
1  0  2   6
2  0  3   7
3  0  3   8
4  1  4   9
5  1  4   9
6  1  5  10


yourdf=pd.DataFrame(np.append(s[['a','b']].values,s[['b','c']].values,axis=0),columns=['p','c']).groupby(['p','c']).size().reset_index()
yourdf
Out[681]: 
   p   c  0
0  0   2  2
1  0   3  2
2  1   4  2
3  1   5  1
4  2   6  2
5  3   7  1
6  3   8  1
7  4   9  2
8  5  10  1
Sign up to request clarification or add additional context in comments.

3 Comments

Beautiful! pd.factorize(list(zip(df.columns.repeat(len(df)), df.values.ravel(order='F'))))[0].reshape(df.shape, order='F')
@piRSquared never know we can pass it to tuple , thank you sir
Really fast. Thank you. Could you explain me this part of your code? > s=pd.Series(pd.factorize(s.astype(str)+s.index.get_level_values(level=1))[0],index=s.index).unstack()
0

My go at both

n = len(df)
col = df.columns.repeat(n)
val = df.values.ravel(order='F')
f, u = pd.factorize(list(zip(col, val)))
table1 = pd.DataFrame(f.reshape(df.shape, order='F'), df.index, df.columns)

pc, pc_ = pd.factorize(list(zip(f[:-n], f[n:])))
table2 = pd.DataFrame(
    np.column_stack([pc_.tolist(), np.bincount(pc)]),
    columns=list('pcn')
)

print(table1, table2, sep='\n\n')


   a  b   c
0  0  2   6
1  0  2   6
2  0  3   7
3  0  3   8
4  1  4   9
5  1  4   9
6  1  5  10

   p   c  n
0  0   2  2
1  0   3  2
2  1   4  2
3  1   5  1
4  2   6  2
5  3   7  1
6  3   8  1
7  4   9  2
8  5  10  1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.