1

What would be the most efficient way to solve this problem?

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g']
})

i_need = pd.DataFrame(data={
  'id': ['A','A','A','B','B','B','C', 'C'],
  'v' : ['s','m','l','1','2','3','k','g']
})

I though about creating a new df and while iterating over i_have append the records to the new df. But as number of rows grow, it can take a while.

0

3 Answers 3

3

Use numpy.repeat with numpy.concatenate for flattening:

#create lists by split
splitted = i_have['v'].str.split(',')
#get legths of each lists 
lens = splitted.str.len()

df = pd.DataFrame({'id':np.repeat(i_have['id'], lens),
                    'v':np.concatenate(splitted)})
print (df)
  id  v
0  A  s
0  A  m
0  A  l
1  B  1
1  B  2
1  B  3
2  C  k
2  C  g

Thank you piRSquared for solution for repeat multiple columns:

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'id1': ['A1', 'B1', 'C1'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g']
})
print (i_have)
  id id1      v
0  A  A1  s,m,l
1  B  B1  1,2,3
2  C  C1    k,g

splitted = i_have['v'].str.split(',')
lens = splitted.str.len()

df = i_have.loc[i_have.index.repeat(lens)].assign(v=np.concatenate(splitted))
print (df)
  id id1  v
0  A  A1  s
0  A  A1  m
0  A  A1  l
1  B  B1  1
1  B  B1  2
1  B  B1  3
2  C  C1  k
2  C  C1  g
Sign up to request clarification or add additional context in comments.

5 Comments

I had the exact answer.
Follow up question: What if I have around 10 columns that are needed to be repeated?
Repeat the index and use loc. df.loc[df.index.repeat(lens)].assign(v=np.concatenate(splitted))
@JekaterinaKokatjuhha for multiple columns have a look at my answer.
@JekaterinaKokatjuhha - I add solution of piRSquared to answer.
1

If you have multiple columns then first split the data by , with expand = True(Thank you piRSquared) then stack and ffill i.e

i_have = pd.DataFrame(data={
  'id': ['A', 'B', 'C'],
  'v' : [ 's,m,l',  '1,2,3',   'k,g'],
  'w' : [ 's,8,l',  '1,2,3',   'k,g'],
  'x' : [ 's,0,l',  '1,21,3',   'ks,g'],
  'y' : [ 's,m,l',  '11,2,3',   'ks,g'],  
  'z' : [ 's,4,l',  '1,2,32',   'k,gs'],
})

i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True).ffill()

If the values are not equal sized then

i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True)
i_want['id'] = i_want['id'].ffill()

Output i_want

  id  v  w   x   y   z
0  A  s  s   s   s   s
1  A  m  8   0   m   4
2  A  l  l   l   l   l
3  B  1  1   1  11   1
4  B  2  2  21   2   2
5  B  3  3   3   3  32
6  C  k  k  ks  ks   k
7  C  g  g   g   g  gs

Comments

1

Here's another way

In [1667]: (i_have.set_index('id').v.str.split(',').apply(pd.Series)
                  .stack().reset_index(name='v').drop('level_1', 1))
Out[1667]:
  id  v
0  A  s
1  A  m
2  A  l
3  B  1
4  B  2
5  B  3
6  C  k
7  C  g

As pointed in comment.

In [1672]: (i_have.set_index('id').v.str.split(',', expand=True)
                  .stack().reset_index(name='v').drop('level_1', 1))
Out[1672]:
  id  V
0  A  s
1  A  m
2  A  l
3  B  1
4  B  2
5  B  3
6  C  k
7  C  g

2 Comments

You can use the expand=True argument in str.split and forgo apply(pd.Series)
@piRSquared Sir you are a legend. Expand = True is awesome.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.