Python, DataFrame: duplicating rows according to the number of items in a list and ranking the resulting rows

Question

My Python script outputs a pandas DataFrame like this:

id1           id_list
1            [10,11,12]
2            [14,15,16]    
3            [17,18,19]

I would like to duplicate each row once for every item in its id_list, and assign each item a rank corresponding to its position in the list.

The output I am looking for is as follows:

id1          id2           rank
1            10             1       
1            11             2   
1            12             3   
2            14             1   
2            15             2   
2            16             3   
3            17             1   
3            18             2   
3            19             3

Thank you for your help.

akuiper · Accepted Answer · 2017-03-03 15:09:24Z


You need to rebuild the data frame with numpy.repeat while flattening the list columns at the same time:

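import numpy as np
import pandas as pd
from itertools import chain

# id1: repeat each value once per list item
# id_list: flatten all lists into one sequence
# rank: 1-based position of each item within its own list
pd.DataFrame({'id1': np.repeat(df.id1.values, df.id_list.str.len()),
              'id_list': list(chain.from_iterable(df.id_list)),
              'rank': [i for r in df.id_list for i, _ in enumerate(r, start=1)]})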

#   id1  id_list  rank
#0    1       10     1
#1    1       11     2
#2    1       12     3
#3    2       14     1
#4    2       15     2
#5    2       16     3
#6    3       17     1
#7    3       18     2
#8    3       19     3
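To see why the repeat counts line up with the flattened items, here is a minimal sketch of the building blocks on their own. The sample frame uses lists of unequal length, which is an assumption made for illustration and not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with lists of different lengths (not the question's data).
df = pd.DataFrame({"id1": [1, 2, 3],
                   "id_list": [[10, 11, 12], [14], [17, 18]]})

# Length of each list: one repeat count per original row.
counts = df["id_list"].str.len().to_numpy()          # [3, 1, 2]

# np.repeat with an array of counts repeats element i exactly counts[i] times.
repeated_id1 = np.repeat(df["id1"].to_numpy(), counts)  # [1, 1, 1, 2, 3, 3]

# 1-based position of each item inside its own list.
ranks = np.concatenate([np.arange(1, n + 1) for n in counts])  # [1, 2, 3, 1, 1, 2]
```

Because all three pieces are derived from the same per-row lengths, they always have the same total length and stay aligned row by row.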

Or maybe slightly more efficient:

import numpy as np
import pandas as pd

# enumerate yields (position, item) pairs, so 'rank' is the first column
(pd.DataFrame([iv for r in df.id_list for iv in enumerate(r, start=1)],
              columns=['rank', 'id_list'])
 .assign(id1 = np.repeat(df.id1.values, df.id_list.str.len())))
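For readers on pandas 0.25 or later, a different technique from the two options above is `DataFrame.explode` combined with `groupby(...).cumcount()`. The sketch below is not part of the original answer; it assumes the original index is unique (so that grouping by index level identifies each source row) and renames the list column to `id2` to match the output requested in the question.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"id1": [1, 2, 3],
                   "id_list": [[10, 11, 12], [14, 15, 16], [17, 18, 19]]})

out = (
    df.explode("id_list")                    # one row per list item; the index repeats
      .rename(columns={"id_list": "id2"})
      # cumcount numbers the rows within each original row, starting at 0
      .assign(rank=lambda d: d.groupby(level=0).cumcount().to_numpy() + 1)
      .reset_index(drop=True)                # replace the repeated index with 0..n-1
)

# explode leaves an object-dtype column; cast back to integers.
out["id2"] = out["id2"].astype(int)
```

The result has the columns `id1`, `id2` and `rank` with a fresh integer index, so no duplicated index labels remain.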

edited Mar 3, 2017 at 15:09

answered Mar 3, 2017 at 14:55


2 Comments

Amy21 Over a year ago

The second one works perfectly, thank you. The first one is good but it duplicates also the indexes, it doesn't increment them.

akuiper Over a year ago

I modified the first option so that it gives unique indexes, it should work as well now.

MaxU - stand with Ukraine · Accepted Answer · 2017-03-03 15:50:59Z


Here is my solution:

In [176]: lst_col = 'id_list'

In [177]: pd.DataFrame({
     ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
     ...:     for col in df.columns.difference([lst_col])
     ...: }).assign(**{lst_col:np.concatenate(df[lst_col].values)}) \
     ...:   .assign(rank=[i+1 for l in df[lst_col].str.len() for i in range(l)])
Out[177]:
   id1  id_list  rank
0    1       10     1
1    1       11     2
2    1       12     3
3    2       14     1
4    2       15     2
5    2       16     3
6    3       17     1
7    3       18     2
8    3       19     3

PS: this should also work for generic DataFrames with multiple columns.
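As a rough sketch of that generic case, the same idea can be wrapped in a function. The name `explode_with_rank`, its parameters, and the extra `name` column are hypothetical and introduced only for illustration. Unlike `df.columns.difference`, which returns the remaining columns in sorted order, this version keeps them in their original order.

```python
import numpy as np
import pandas as pd


def explode_with_rank(df, lst_col, rank_col="rank"):
    """Hypothetical helper: one row per list item plus a 1-based position column."""
    lens = df[lst_col].str.len().to_numpy()
    # Every column except the list column, in original order.
    other_cols = [c for c in df.columns if c != lst_col]
    # Repeat each non-list value once per item in that row's list.
    out = pd.DataFrame({c: np.repeat(df[c].to_numpy(), lens) for c in other_cols})
    # Flatten the lists into a single column.
    out[lst_col] = np.concatenate(df[lst_col].to_numpy())
    # Position of each item within its own list, starting at 1.
    out[rank_col] = [i + 1 for n in lens for i in range(n)]
    return out


# Hypothetical frame with an extra column and lists of unequal length.
df = pd.DataFrame({"id1": [1, 2],
                   "name": ["a", "b"],
                   "id_list": [[10, 11, 12], [14, 15]]})
out = explode_with_rank(df, "id_list")
```

Here `out` has five rows, with `id1` and `name` repeated to match the length of each list.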

edited Mar 3, 2017 at 15:50

answered Mar 3, 2017 at 15:45

