replicate rows in pandas by specific column with the values from that column

Question

What would be the most efficient way to solve this problem?

import pandas as pd

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g']
})

i_need = pd.DataFrame(data={
  'id': ['A','A','A','B','B','B','C', 'C'],
  'v' : ['s','m','l','1','2','3','k','g']
})

I thought about creating a new DataFrame and appending records to it while iterating over i_have. But as the number of rows grows, that can take a while.

jezrael · Accepted Answer · 2017-09-04 14:09:18Z

3

Use numpy.repeat with numpy.concatenate for flattening:

import numpy as np

# split each string into a list
splitted = i_have['v'].str.split(',')
# get the length of each list
lens = splitted.str.len()

df = pd.DataFrame({'id': np.repeat(i_have['id'], lens),
                   'v': np.concatenate(splitted)})
print(df)
  id  v
0  A  s
0  A  m
0  A  l
1  B  1
1  B  2
1  B  3
2  C  k
2  C  g
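
For reference, here is a self-contained sketch of the same idea. It is a variation rather than the exact code above: it passes plain NumPy arrays (via to_numpy(), my assumption) so the result gets a fresh 0..7 index instead of the repeated 0, 0, 0, 1, ... index shown in the output.

```python
import numpy as np
import pandas as pd

i_have = pd.DataFrame({'id': ['A', 'B', 'C'],
                       'v': ['s,m,l', '1,2,3', 'k,g']})

splitted = i_have['v'].str.split(',')   # Series of lists
lens = splitted.str.len()               # number of items per row

# Plain arrays carry no index, so the new frame gets a default RangeIndex
df = pd.DataFrame({'id': np.repeat(i_have['id'].to_numpy(), lens.to_numpy()),
                   'v': np.concatenate(splitted.to_numpy())})
print(df)
```

This assumes 'v' has no missing values; a NaN there would make the lengths non-integer and the repeat step would fail.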

Thanks to piRSquared for a solution that repeats multiple columns:

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'id1': ['A1', 'B1', 'C1'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g']
})
print (i_have)
  id id1      v
0  A  A1  s,m,l
1  B  B1  1,2,3
2  C  C1    k,g

splitted = i_have['v'].str.split(',')
lens = splitted.str.len()

df = i_have.loc[i_have.index.repeat(lens)].assign(v=np.concatenate(splitted))
print (df)
  id id1  v
0  A  A1  s
0  A  A1  m
0  A  A1  l
1  B  B1  1
1  B  B1  2
1  B  B1  3
2  C  C1  k
2  C  C1  g
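
If this is needed in several places, the index.repeat approach could be wrapped in a small helper. The function name split_rows and its parameters are hypothetical, and the sketch assumes the frame has a unique index and no missing values in the column being split.

```python
import numpy as np
import pandas as pd

def split_rows(df, column, sep=','):
    """Hypothetical helper: one output row per item in df[column]."""
    splitted = df[column].str.split(sep)
    lens = splitted.str.len()
    # Repeat every row once per item (assumes df.index is unique)
    repeated = df.loc[df.index.repeat(lens)]
    # Overwrite the split column with the flattened items
    flat = np.concatenate(splitted.to_numpy())
    return repeated.assign(**{column: flat}).reset_index(drop=True)

i_have = pd.DataFrame({'id': ['A', 'B', 'C'],
                       'id1': ['A1', 'B1', 'C1'],
                       'v': ['s,m,l', '1,2,3', 'k,g']})
result = split_rows(i_have, 'v')
print(result)
```

The final reset_index(drop=True) is optional; drop it to keep the repeated original index as in the output above.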

edited Sep 4, 2017 at 14:09

answered Sep 4, 2017 at 13:05

jezrael

5 Comments

Bharath M Shetty Over a year ago

I had the exact answer.

kkk Over a year ago

Follow-up question: What if I have around 10 columns that need to be repeated?

piRSquared Over a year ago

Repeat the index and use loc. df.loc[df.index.repeat(lens)].assign(v=np.concatenate(splitted))

Bharath M Shetty Over a year ago

@JekaterinaKokatjuhha for multiple columns have a look at my answer.

jezrael Over a year ago

@JekaterinaKokatjuhha - I added piRSquared's solution to the answer.

Bharath M Shetty · Accepted Answer · 2017-09-04 14:14:41Z

If you have multiple columns, first split the data on ',' with expand=True (thank you, piRSquared), then stack and ffill, i.e.:

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g'],
  'w' : [ 's,8,l',  '1,2,3',   'k,g'],
  'x' : [ 's,0,l',  '1,21,3',   'ks,g'],
  'y' : [ 's,m,l',  '11,2,3',   'ks,g'],  
  'z' : [ 's,4,l',  '1,2,32',   'k,gs'],
})

i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True).ffill()

If the lists are not all the same length, forward-fill only the id column:

i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True)
i_want['id'] = i_want['id'].ffill()

Output of i_want:

  id  v  w   x   y   z
0  A  s  s   s   s   s
1  A  m  8   0   m   4
2  A  l  l   l   l   l
3  B  1  1   1  11   1
4  B  2  2  21   2   2
5  B  3  3   3   3  32
6  C  k  k  ks  ks   k
7  C  g  g   g   g  gs
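
For comparison, the same table can also be built with the repeat/concatenate technique from the other answer instead of stack and ffill. This is only a sketch of that alternative, and it assumes every column splits into the same number of items on each row, which holds for the sample data.

```python
import numpy as np
import pandas as pd

i_have = pd.DataFrame({
    'id': ['A', 'B', 'C'],
    'v': ['s,m,l', '1,2,3', 'k,g'],
    'w': ['s,8,l', '1,2,3', 'k,g'],
    'x': ['s,0,l', '1,21,3', 'ks,g'],
    'y': ['s,m,l', '11,2,3', 'ks,g'],
    'z': ['s,4,l', '1,2,32', 'k,gs'],
})

value_cols = ['v', 'w', 'x', 'y', 'z']
# Item counts per row, taken from the first value column
lens = i_have[value_cols[0]].str.split(',').str.len()

# Repeat id once per item, then flatten each value column
i_want = pd.DataFrame({'id': np.repeat(i_have['id'].to_numpy(), lens.to_numpy())})
for col in value_cols:
    i_want[col] = np.concatenate(i_have[col].str.split(',').to_numpy())
print(i_want)
```

If the columns had different item counts on a row, the assignment would raise a length mismatch rather than pad with NaN.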

Zero · Accepted Answer · 2017-09-04 15:24:22Z

1

Here's another way:

In [1667]: (i_have.set_index('id').v.str.split(',').apply(pd.Series)
                  .stack().reset_index(name='v').drop('level_1', axis=1))
Out[1667]:
  id  v
0  A  s
1  A  m
2  A  l
3  B  1
4  B  2
5  B  3
6  C  k
7  C  g

As pointed out in the comments, expand=True can replace apply(pd.Series):

In [1672]: (i_have.set_index('id').v.str.split(',', expand=True)
                  .stack().reset_index(name='v').drop('level_1', axis=1))
Out[1672]:
  id  v
0  A  s
1  A  m
2  A  l
3  B  1
4  B  2
5  B  3
6  C  k
7  C  g
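
A quick sketch to check that the two variants agree on the sample data. The explicit dropna() is my addition: it covers pandas versions where stack() no longer drops the padding for the shorter 'k,g' row on its own.

```python
import pandas as pd

i_have = pd.DataFrame({'id': ['A', 'B', 'C'],
                       'v': ['s,m,l', '1,2,3', 'k,g']})

def tidy(wide):
    # Stack the wide frame, drop padding for short rows, keep id and v only
    return (wide.stack().dropna()
                .reset_index(name='v')
                .drop('level_1', axis=1))

indexed = i_have.set_index('id')['v']
via_apply = tidy(indexed.str.split(',').apply(pd.Series))
via_expand = tidy(indexed.str.split(',', expand=True))

same = via_apply.values.tolist() == via_expand.values.tolist()
print(same)
```

On larger frames expand=True is generally the better choice, since apply(pd.Series) builds one Series object per row.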

edited Sep 4, 2017 at 15:24

answered Sep 4, 2017 at 13:18

Zero

2 Comments

piRSquared Over a year ago

You can use the expand=True argument in str.split and forgo apply(pd.Series)

Bharath M Shetty Over a year ago

@piRSquared Sir you are a legend. Expand = True is awesome.
