Create new dataframe in pandas with dynamic names also add new column

Question

I have a dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2],
                   't': [pd.Timestamp.now(), pd.Timestamp.now(), np.nan]})

I then added a new column:

df['YearMonth'] = df['t'].map(lambda x: 100*x.year + x.month)

Now I want to write a function or macro that does a date comparison, creates a new dataframe, and adds a new column to it.

I tried the following, but it does not work:

def test(df,ym):
    df_new=df
    if(ym <= df['YearMonth']):
        df_new+"_"+ym=df_new
        return df_new+"_"+ym
    df_new+"_"+ym['new_col']=ym

When I call the test function, I want it to create a new dataframe named df_new_201612 with one additional column, new_col, that holds the value of ym in every row.

test(df,201612)

The expected new dataframe is:

df_new_201612

A   B    ID  t                           YearMonth  new_col
-a  a    1   2016-12-05 12:37:56.374620  201612     201612
1   NaN  2   2016-12-05 12:37:56.374644  201208     201612
a   c    2   NaT                         NaN        201612

Your code isn't valid Python: the line df_new+"_"+ym['new_col']=ym throws a SyntaxError. Also, I don't think return df_new+"_"+ym does what you think it does.
– deepbrook, Commented Dec 5, 2016 at 12:08
I know I am doing something wrong. Please let me know if you have an idea of how to implement the above in pandas.
– user07, Commented Dec 5, 2016 at 12:44
Does anyone know how to deal with NaN? The solution below works if there are no NaN values in YearMonth. How can I get it done when there are NaN values too?
– user07, Commented Dec 5, 2016 at 16:26
df.dropna() does that for you; check the pandas docs for more.
– deepbrook, Commented Dec 6, 2016 at 6:01
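As a rough illustration of the NaN point raised in the comments, the sketch below shows two ways a missing YearMonth could be handled before the comparison. The small dataframe and the names df_clean, earlier, and mask are stand-ins chosen for this example, not part of the original post.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's dataframe (values are illustrative)
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'YearMonth': [201612, 201208, np.nan]})

ym = 201612

# Option 1: drop the rows that have no YearMonth, then compare
df_clean = df.dropna(subset=['YearMonth'])
earlier = df_clean[df_clean['YearMonth'] < ym]

# Option 2: keep every row; a comparison against NaN evaluates to False
mask = df['YearMonth'] < ym
```

Which option fits depends on whether rows without a date should survive into the new dataframe; dropna removes them, while the mask simply leaves them unselected.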

FLab · Accepted Answer · 2016-12-05 13:20:30Z

24

Creating variables with dynamic names is typically a bad practice.

I think the best solution for your problem is to store your dataframes in a dictionary and dynamically generate the key used to access each dataframe.

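dict_of_df = {}
for ym in [201511, 201612, 201710]:
    # Build the dictionary key dynamically instead of a variable name
    key_name = 'df_new_' + str(ym)

    # Store an independent copy of the source dataframe under that key
    dict_of_df[key_name] = df.copy()

    # Set new_col to ym on the rows whose YearMonth is earlier than ym
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym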

dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']

dict_of_df
Out[37]: 
{'df_new_201511':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201612':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201612
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201612
 2   a    c   2 2016-12-05 07:53:35.943     201612   201612,
 'df_new_201710':     A    B  ID                       t  YearMonth  new_col
 0  -a    a   1 2016-12-05 07:53:35.943     201612   201710
 1   1  NaN   2 2016-12-05 07:53:35.943     201612   201710
 2   a    c   2 2016-12-05 07:53:35.943     201612   201710}

# Extract a single dataframe
df_2015 = dict_of_df['df_new_201511']
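If the goal is a function that can be called once per year-month, the dictionary approach could be wrapped roughly as follows. This is only a sketch: add_frame is a hypothetical helper name, the sample dataframe is a stand-in, and the helper fills new_col on every row (as in the question's expected output) rather than only on rows that pass a date comparison.

```python
import numpy as np
import pandas as pd

def add_frame(frames, df, ym):
    """Store a copy of df under 'df_new_<ym>' with new_col set to ym on every row."""
    key_name = f'df_new_{ym}'
    # assign returns a new dataframe, so the source df is left untouched
    frames[key_name] = df.assign(new_col=ym)
    return frames[key_name]

# Stand-in for the question's dataframe (values are illustrative)
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2],
                   'YearMonth': [201612, 201208, np.nan]})

dict_of_df = {}
for ym in [201511, 201612, 201710]:
    add_frame(dict_of_df, df, ym)

# Each dataframe is then available under its generated key
df_new_201612 = dict_of_df['df_new_201612']
```

Keeping the frames in one dictionary means later processing can loop over dict_of_df.items() without knowing the names in advance.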

edited Dec 5, 2016 at 13:20

answered Dec 5, 2016 at 12:51

FLab


9 Comments

user07 Over a year ago

I did not understand. My requirement is to call the test function with many year-month values and generate a separate dataframe for each one. It would be helpful if you could explain with an example exactly what you mean.

deepbrook Over a year ago

Is creating dynamically named variables even possible in Python? I've tried it with Anaconda3, but I get SyntaxErrors left and right.

FLab Over a year ago

Added an example to clarify

user07 Over a year ago

Thanks for the example, I understand what you were trying to say. One more question: how can I access df_new_201511 as a separate dataframe? I will be using the dataframes in this dictionary for further processing.

user07 Over a year ago

Thanks a lot, FLab. It seems I can resolve my problem now.


Sarath Subramanian · Accepted Answer · 2022-07-07 07:19:07Z

There is an easier way to accomplish this using the built-in exec function. The following steps create a dataframe at runtime.

1. Create the source dataframe with some sample values.

import numpy as np
import pandas as pd
    
df = pd.DataFrame({'A':['-a',1,'a'], 
                   'B':['a',np.nan,'c'],
                   'ID':[1,2,2]})

2. Assign the new dataframe name to a variable. You can also pass this value in as a parameter or generate it in a loop.

new_df_name = 'df_201612'

3. Use exec to create the new dataframe as a copy of the source dataframe, then assign a value to the new column in the next line.

exec(f'{new_df_name} = df.copy()')
exec(f'{new_df_name}["new_col"] = 123')

4. The dataframe df_201612 is now available in memory; you can verify this by printing it with eval.

print(eval(new_df_name))
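The steps above can be sketched as one script. One change here is an assumption of this example and not part of the answer: exec is given an explicit dictionary (called namespace) to run in, so the generated name lands in a known place even when the code runs inside a function, where a bare exec cannot create local variables.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})

new_df_name = 'df_201612'

# Explicit namespace (an assumption of this sketch) so the generated
# name does not depend on the scope the code is called from
namespace = {'df': df}
exec(f'{new_df_name} = df.copy()', namespace)
exec(f'{new_df_name}["new_col"] = 123', namespace)

# Look the new dataframe up by its generated name
result = namespace[new_df_name]
print(result)
```

Only run exec on names you control; a name taken from untrusted input would be executed as code.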
