
My Python script outputs a pandas DataFrame like this:

id1           id_list
1            [10,11,12]
2            [14,15,16]    
3            [17,18,19]
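
To reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes each id_list cell holds a plain Python list of integers; the name df is only for illustration.

```python
import pandas as pd

# Rebuild the example frame; each id_list cell is a Python list (assumption).
df = pd.DataFrame({
    'id1': [1, 2, 3],
    'id_list': [[10, 11, 12], [14, 15, 16], [17, 18, 19]],
})
print(df)
```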

I would like to repeat each row once per item in its id_list, and give every item a rank corresponding to its position in the list.

The output I am looking for is:

id1          id2           rank
1            10             1       
1            11             2   
1            12             3   
2            14             1   
2            15             2   
2            16             3   
3            17             1   
3            18             2   
3            19             3   

Thank you for your help.

2 Answers


You can rebuild the DataFrame with numpy.repeat while flattening the list column at the same time:

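import numpy as np
import pandas as pd
from itertools import chain

# Repeat each id1 once per list item, flatten the lists, and number the
# items within each list starting from 1.
pd.DataFrame({'id1': np.repeat(df.id1.values, df.id_list.str.len()),
              'id_list': list(chain.from_iterable(df.id_list)),
              'rank': [i for r in df.id_list for i, _ in enumerate(r, start=1)]})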

#    id1  id_list  rank
# 0    1       10     1
# 1    1       11     2
# 2    1       12     3
# 3    2       14     1
# 4    2       15     2
# 5    2       16     3
# 6    3       17     1
# 7    3       18     2
# 8    3       19     3

Or maybe slightly more efficient:

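import numpy as np
import pandas as pd

# enumerate yields (rank, item) pairs, so 'rank' must be the first column name.
# The resulting column order is rank, id_list, id1.
(pd.DataFrame([iv for r in df.id_list for iv in enumerate(r, start=1)],
              columns=['rank', 'id_list'])
 .assign(id1 = np.repeat(df.id1.values, df.id_list.str.len())))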
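
Both snippets rely on np.repeat accepting a per-row repeat count, so the repeated id1 values line up with the flattened items. Here is a small sketch of just that step; the ragged lists are an assumption added for illustration (the question's lists all have length 3), to show that unequal lengths are handled too.

```python
import numpy as np
import pandas as pd

# Illustrative data with lists of unequal length (not the question's data).
df = pd.DataFrame({'id1': [1, 2], 'id_list': [[10, 11, 12], [14]]})

# .str.len() returns the length of each list cell.
lengths = df['id_list'].str.len()

# Each id1 value is repeated as many times as its list has items.
repeated = np.repeat(df['id1'].values, lengths)

print(lengths.tolist())   # [3, 1]
print(repeated.tolist())  # [1, 1, 1, 2]
```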

2 Comments

The second one works perfectly, thank you. The first one is good, but it also duplicates the indexes instead of incrementing them.
I modified the first option so that it produces a unique index; it should work as well now.

Here is my solution:

In [176]: lst_col = 'id_list'

In [177]: pd.DataFrame({
     ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
     ...:     for col in df.columns.difference([lst_col])
     ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)}) \
     ...:   .assign(rank=[i+1 for l in df[lst_col].str.len() for i in range(l)])
Out[177]:
   id1  id_list  rank
0    1       10     1
1    1       11     2
2    1       12     3
3    2       14     1
4    2       15     2
5    2       16     3
6    3       17     1
7    3       18     2
8    3       19     3

PS: this should also work for generic DataFrames with multiple columns.
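
As a rough illustration of the multi-column case, the approach above can be wrapped in a function. This is a sketch under assumptions: the helper name flatten_with_rank and the extra label column are hypothetical, and every list is assumed to be non-empty.

```python
import numpy as np
import pandas as pd


def flatten_with_rank(df, lst_col):
    """Hypothetical helper: one output row per list item, plus a 1-based rank."""
    lengths = df[lst_col].str.len()
    # Repeat every non-list column once per item in that row's list.
    out = pd.DataFrame({
        col: np.repeat(df[col].values, lengths)
        for col in df.columns.difference([lst_col])
    })
    # Flatten the lists and number the items within each original row.
    out[lst_col] = np.concatenate(df[lst_col].values)
    out['rank'] = [i + 1 for n in lengths for i in range(n)]
    return out


df = pd.DataFrame({
    'id1': [1, 2],
    'label': ['a', 'b'],  # extra column, not part of the question's data
    'id_list': [[10, 11, 12], [14, 15]],
})
result = flatten_with_rank(df, 'id_list')
print(result)
```

Note that columns.difference returns the remaining column names in sorted order, so the non-list columns may come out in a different order than in the input frame.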
