1

I have a dataframe like this (but much larger):

              begin        end   comp  p_n             next_d                        next_p
c_n   ml                                                                                                                                                   
1   1234 2013-09-02 2014-12-16  comp1  111 [20000, 25000, 50000]               [0.01, 0.01, 0.01]
    1235 2013-09-02 2014-12-16  comp2  222 [25000, 50000, 75000, 100000]       [0.1, 0.1, 0.1, 0.1]
2   1236 2013-09-02 2014-12-16  comp3  333 [5000, 10000, 15000, 170000, 25000] [0.1, 0.1, 0.1, 0.1, 0.1]
    1237 2013-09-02 2014-12-16  comp4  444 [5000, 10000, 25000, 50000]         [0.01, 0.01, 0.01, 0.01]

I need to expand both next_d and next_p which have the same list size for each particular row. I've tried various hints and answers, for example, this and this, but I need to expand two lists instead of one, and can't imagine how to apply this to my problem. Please help.

1
  • 1
    @yatu Oh, thanks, I've overlooked this. Commented Feb 9, 2019 at 18:53

2 Answers 2

1

Use solution for one column for both Series, concat together and last join:

s1 = pd.DataFrame(df.pop('next_d').values.tolist(), 
                   index=df.index).stack().rename('next_d').reset_index(level=2, drop=True)
s2 = pd.DataFrame(df.pop('next_p').values.tolist(), 
                   index=df.index).stack().rename('next_p').reset_index(level=2, drop=True)

df = df.join(pd.concat([s1, s2], axis=1))
print (df)
               begin         end   comp  p_n    next_d  next_p
c_n ml                                                        
1   1234  2013-09-02  2014-12-16  comp1  111   20000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   25000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   50000.0    0.01
    1235  2013-09-02  2014-12-16  comp2  222   25000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   50000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   75000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222  100000.0    0.10
2   1236  2013-09-02  2014-12-16  comp3  333    5000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   10000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   15000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333  170000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   25000.0    0.10
    1237  2013-09-02  2014-12-16  comp4  444    5000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   10000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   25000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   50000.0    0.01
Sign up to request clarification or add additional context in comments.

Comments

0

First define 2 functions, to be used later:

def createList(lst, lgth):
    return lst + [None] * (lgth - len(lst))

def createNames(name, lgth):
    return [ f'{name}_{i}' for i in range(1, lgth + 1) ]

Then compute max length of next_d:

maxLen = max(df.next_d.apply(len)); maxLen

Note that if the longest list in next_d is e.g. 5 (as in your case), then next_d will be replaced with just 5 new columns, the same applies to next_p.

Then compute the "extension array" (just the new columns):

df2 = df.apply(lambda row: createList(row['next_d'], maxLen) +
    createList(row['next_p'], maxLen), axis=1, result_type='expand')
df2.columns = createNames('next_d', maxLen) + createNames('next_p', maxLen)

And two last things to do are:

  • drop both original columns,
  • join the new columns.

    df = df.drop(columns=['next_d', 'next_p']).join(df2)

Now you can drop df2:

del df2

Of course, this is a horizontal expansion. After I read another answer, I'm not sure which variant you want (horizontal or vertical).

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.