# import modules, set seed
import random
import numpy as np
import pandas as pd
random.seed(42)

The problem

I have a dataframe df. Its rows contain values that are input to a function, which produces a variable number of outputs. The maximum number of outputs is not known a priori. The outputs should be placed in the same row as the input, creating new columns where necessary. Unfilled cells should be filled with NaN.


Reproducible setup

Let's create a dataframe:

df = pd.DataFrame(pd.Series([random.randint(1,10) for _ in range(5)]),columns=['randomnums'])

This looks like:

   randomnums
0           2
1           1
2           5
3           4
4           4


What I have done

I created an auxiliary dataframe (auxiliarydf) holding the values that should fill the to-be-created columns of the original df, using from_dict(), apply(), a lambda function, and dict and list comprehensions:

auxiliarydf = pd.DataFrame.from_dict(
                {index: pd.Series(array) for index, array in zip(
                         df.index,
                         df['randomnums'].apply(
                                          lambda r: 
                                          # here I apply some function on the row.
                                          # The output will be a list of variable length
                                          # for the sake of an example:
                                          np.array([x for x in range(r)])))},
                orient='index')

auxiliarydf will be:

   0    1    2    3    4
0  0  1.0  NaN  NaN  NaN
1  0  NaN  NaN  NaN  NaN
2  0  1.0  2.0  3.0  4.0
3  0  1.0  2.0  3.0  NaN
4  0  1.0  2.0  3.0  NaN
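Why this step works: with orient='index', DataFrame.from_dict turns each dict key into a row label, and each Series' own index (0, 1, 2, ...) becomes the column labels; cells a shorter Series does not cover are filled with NaN. A minimal sketch with made-up ragged data (the name ragged is only illustrative):

```python
import pandas as pd

# Hypothetical ragged outputs, keyed by the row label they belong to
ragged = {
    0: pd.Series([0, 1]),
    1: pd.Series([0]),
    2: pd.Series([0, 1, 2]),
}

# orient='index': dict keys become row labels, Series labels become columns,
# and positions a shorter Series does not reach are filled with NaN
aux = pd.DataFrame.from_dict(ragged, orient='index')
print(aux)
```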

Finally, concatenate df and auxiliarydf with concat():

pd.concat([df, auxiliarydf], axis=1)

Result:

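   randomnums  0    1    2    3    4
0           2  0  1.0  NaN  NaN  NaN
1           1  0  NaN  NaN  NaN  NaN
2           5  0  1.0  2.0  3.0  4.0
3           4  0  1.0  2.0  3.0  NaN
4           4  0  1.0  2.0  3.0  NaN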

This is as expected.
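pd.concat(..., axis=1) lines the frames up by index label, not by position, which is why each set of outputs lands next to the value that produced it. A small sketch with made-up labels (10 and 20) and the auxiliary rows deliberately in reverse order:

```python
import pandas as pd

left = pd.DataFrame({'randomnums': [2, 1]}, index=[10, 20])

# Auxiliary rows in the opposite order; row 20 has fewer outputs,
# so its second cell is padded with NaN
right = pd.DataFrame.from_dict({20: [0], 10: [0, 1]}, orient='index')

# axis=1 aligns on the index labels, so outputs stay with their input row
combined = pd.concat([left, right], axis=1)
print(combined)
```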


The question

Is there an easier way, perhaps a built-in pandas function, to do the above? It works, but the problem seems common enough that I would expect a neater solution.


A Colab notebook with the code above is available here.

1 Answer

You can also build the new columns directly with the pd.DataFrame constructor: call Series.to_list() on the resulting series of arrays, pass the existing dataframe's index, and then attach the result with df.join():

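# Series whose elements are arrays of varying length
auxiliary = df['randomnums'].apply(lambda r: np.array([x for x in range(r)]))
# Ragged rows are padded with NaN; reusing df.index keeps the rows aligned for join()
df.join(pd.DataFrame(auxiliary.to_list(), index=df.index))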

   randomnums  0    1    2    3    4
0           2  0  1.0  NaN  NaN  NaN
1           1  0  NaN  NaN  NaN  NaN
2           5  0  1.0  2.0  3.0  4.0
3           4  0  1.0  2.0  3.0  NaN
4           4  0  1.0  2.0  3.0  NaN
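If this pattern comes up often, the approach can be wrapped in a small function. This is a sketch only: expand_outputs and its prefix parameter are hypothetical names, not part of the pandas API. Passing index=df.index is what makes join() line the rows up, so the sketch also works with a non-default index:

```python
import numpy as np
import pandas as pd


def expand_outputs(df, column, func, prefix='out_'):
    """Hypothetical helper: apply func to each value of df[column] and
    append its variable-length output as new, NaN-padded columns."""
    outputs = df[column].apply(func)
    # A list of unequal-length sequences becomes a rectangular frame;
    # shorter rows are padded with NaN on the right
    wide = pd.DataFrame(outputs.to_list(), index=df.index)
    # String labels such as 'out_0' avoid clashing with existing columns
    return df.join(wide.add_prefix(prefix))


df = pd.DataFrame({'randomnums': [2, 1, 3]}, index=['a', 'b', 'c'])
print(expand_outputs(df, 'randomnums', lambda r: np.arange(r)))
```

The prefix is a design choice: join() raises an error on overlapping column names, and string labels are easier to select than bare integers.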

Of course, you can chain the two statements into a one-liner, but readability comes first :)

df.join(pd.DataFrame(df['randomnums'].apply(lambda r:
    np.array([x for x in range(r)])).to_list(),index=df.index))
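Another built-in route, not part of the answer above: when the function passed to DataFrame.apply(..., axis=1) returns a pd.Series, pandas expands the results into one column per Series label and fills the gaps with NaN. A sketch using the question's placeholder function, with the randomnums values hard-coded for illustration; the new columns may come back with different dtypes (e.g. float) than in the join-based result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'randomnums': [2, 1, 5, 4, 4]})

# Returning a Series from a row-wise apply makes pandas build one column per
# Series label; rows with fewer outputs get NaN in the missing columns
expanded = df.apply(lambda row: pd.Series(np.arange(row['randomnums'])), axis=1)
result = df.join(expanded)
print(result)
```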