Create as many columns as necessary & use them to place output of .apply() in a Pandas dataframe

Question

# import modules, set seed
import random
import numpy as np
import pandas as pd
random.seed(42)

The problem

I have a dataframe df. Its rows contain values that are the input to a function producing a variable number of outputs. The maximum number of outputs is not known a priori. The outputs should be placed in the same row as the input value, creating new columns where necessary. Unfilled cells should be filled with NaNs.

Reproducible setup

Let's create a dataframe:

df = pd.DataFrame(pd.Series([random.randint(1,10) for _ in range(5)]),columns=['randomnums'])

This looks like:

What have I done

I created a dataframe (auxiliarydf) holding the values that should fill the to-be-created columns of the original df, using from_dict(), apply(), a lambda function, and dict and list comprehensions:

auxiliarydf = pd.DataFrame.from_dict(
                {index: pd.Series(array) for index, array in zip(
                         df.index,
                         df['randomnums'].apply(
                                          lambda r: 
                                          # here I apply some function on the row.
                                          # The output will be a list of variable length
                                          # for the sake of an example:
                                          np.array([x for x in range(r)])))},
                orient='index')

auxiliarydf will be:

Then I concatenated df with auxiliarydf using concat():

pd.concat([df, auxiliarydf], axis=1)

Result:

Which is as expected.
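For comparison, the same from_dict()/concat() route can be written a little more compactly. This is a sketch, assuming plain lists as the function output (list(range(r)) stands in for the real per-row function): from_dict(orient='index') already pads shorter rows with NaN, so the per-row pd.Series wrapper and the nested comprehension are not strictly needed.

```python
import random

import pandas as pd

random.seed(42)
df = pd.DataFrame({"randomnums": [random.randint(1, 10) for _ in range(5)]})

# Stand-in for the real per-row function: a variable-length list per value.
outputs = df["randomnums"].apply(lambda r: list(range(r)))

# from_dict(orient="index") turns each list into a row and pads the shorter
# rows with NaN, so wrapping every list in pd.Series is not required.
auxiliarydf = pd.DataFrame.from_dict(dict(zip(df.index, outputs)), orient="index")

# Both frames share df's index, so concat(axis=1) lines the rows up.
result = pd.concat([df, auxiliarydf], axis=1)
print(result)
```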

The question

Is there an easier, perhaps built-in, Pandas function for the process above? It works, but the problem seems common enough that I would expect a neater solution.

A Colab notebook with the code above is available here.

zabop · Accepted Answer · 2020-08-21 16:43:12Z


You can also build the new columns directly with the pd.DataFrame constructor: call Series.to_list() on the resulting Series of arrays, pass the existing dataframe's index, and then attach the result with df.join():

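# Series of variable-length arrays, one per row
auxiliary = df['randomnums'].apply(lambda r: np.array([x for x in range(r)]))
# Shorter arrays are padded with NaN; reusing df's index lets join() align the rows
df.join(pd.DataFrame(auxiliary.to_list(), index=df.index))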

   randomnums  0    1    2    3    4
0           2  0  1.0  NaN  NaN  NaN
1           1  0  NaN  NaN  NaN  NaN
2           5  0  1.0  2.0  3.0  4.0
3           4  0  1.0  2.0  3.0  NaN
4           4  0  1.0  2.0  3.0  NaN
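If this comes up often, the to_list() + join() pattern can be wrapped in a small helper. The function expand_to_columns and its prefix parameter below are hypothetical, not part of pandas; list(range(r)) again stands in for the real function. The optional prefix matters because df.join() raises an error when new column labels overlap existing ones and no suffix is given.

```python
import random

import pandas as pd


def expand_to_columns(df, column, func, prefix=None):
    """Hypothetical helper: apply ``func`` to each value of ``df[column]`` and
    spread the variable-length results over new columns, padding with NaN."""
    outputs = df[column].apply(func)
    # The DataFrame constructor pads shorter lists with NaN, so the number of
    # new columns equals the length of the longest output.
    expanded = pd.DataFrame(outputs.to_list(), index=df.index)
    if prefix is not None:
        # Turn labels 0, 1, 2, ... into e.g. "out_0", "out_1", ...
        expanded = expanded.add_prefix(prefix)
    return df.join(expanded)


random.seed(42)
df = pd.DataFrame({"randomnums": [random.randint(1, 10) for _ in range(5)]})
result = expand_to_columns(df, "randomnums", lambda r: list(range(r)), prefix="out_")
print(result)
```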

Of course, you can chain them into a one-liner, but readability first :)

df.join(pd.DataFrame(df['randomnums'].apply(lambda r:
    np.array([x for x in range(r)])).to_list(),index=df.index))
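Another built-in route, not from the answer above, is to reshape through a "long" format: Series.explode() (pandas 0.25+) gives one row per output value, groupby().cumcount() numbers the values within each original row, and unstack() pivots those positions into columns, filling gaps with NaN. This is a sketch under the same stand-in function; it is more verbose than the constructor approach and is shown mainly as an alternative.

```python
import random

import pandas as pd

random.seed(42)
df = pd.DataFrame({"randomnums": [random.randint(1, 10) for _ in range(5)]})

# One list of outputs per row (stand-in for the real function).
outputs = df["randomnums"].apply(lambda r: list(range(r)))

# explode() gives one row per output value, repeating the original index label.
long = outputs.explode()

# Position of each value within its original row: 0, 1, 2, ...
position = long.groupby(level=0).cumcount()

# Pivot the positions into columns; missing positions become NaN.
wide = (
    long.to_frame("value")
    .assign(position=position.to_numpy())
    .set_index("position", append=True)["value"]
    .unstack()
    .rename_axis(columns=None)
)

# explode() yields object dtype, so convert the values back to numbers.
wide = wide.astype(float)

result = df.join(wide)
print(result)
```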

edited Aug 21, 2020 at 16:43

zabop


answered Aug 21, 2020 at 16:41

anky

