Creating a Pandas DataFrame conditional on another DataFrame

Question

I have two DataFrames. The first, df1, has historical time series data for a variety of tickers with a DateTime index that looks like this:

                       ABC              DEF            XYZ
 2011-06-06            10.00            10.00          10.0000   
 2011-06-17            10.00            10.00          10.0000   
 2011-06-21            10.00            10.00          10.0000   
 2011-06-22            10.00            10.00          10.0000   
 2011-06-23            10.00            10.00          10.0000   
 2011-06-24            10.00            10.00          10.0000   
 2011-06-30            10.00            10.00          10.0000   
 2011-07-11            10.00            10.00          10.0000

The second, df2, has three columns: Start_Date, End_Date, and Ticker. Both Start_Date and End_Date are in datetime format:

    End_Date Start_Date  Ticker
0 2011-06-27 2011-06-22  ABC
1 2011-06-30 2011-06-17  DEF
2 2011-06-25 2011-06-18  XYZ

I want to create a third DataFrame, df3, using the following code:

df3 = df1.copy()
df3.loc[:] = np.nan

For each ticker, I want to set the df3 rows between df2['Start_Date'] and df2['End_Date'] (inclusive) to 1.00 in that ticker's column, and leave the other rows as np.nan.

I've tried to create a function and also to iterate over df2.

def pos():
    position = 1
    for i in df2['Ticker']:
        df3.at[df2['Start_Date'], i] = position
    return pos

or

def pos():
    position = 1
    for index, row in df2.iterrows:
        df3.at[index, row['Start_Date']] = position
    return pos

The resulting df3 would look like this:

                        ABC              DEF              XYZ
2011-06-06              NaN              NaN              NaN   
2011-06-17              NaN              1.0              NaN   
2011-06-21              NaN              1.0              1.0   
2011-06-22              1.0              1.0              1.0   
2011-06-23              1.0              1.0              1.0   
2011-06-24              1.0              1.0              1.0   
2011-06-30              NaN              1.0              NaN   
2011-07-11              NaN              NaN              NaN   
2011-07-13              NaN              NaN              NaN   
2011-07-14              NaN              NaN              NaN

I am not having much luck with either. What is the best way to do this?

Thanks in advance

If you want to apply a function to every row (or column) of a DataFrame, you can use df2.apply(your_function). See docs: pandas.pydata.org/pandas-docs/stable/generated/… In your case, you should be able to have your function check if the date falls between start and end, and if so, take the action you want. Apply returns a new DataFrame, so just call that new one df3.
– Metropolis, Commented Feb 19, 2018 at 17:28
For clarity, please post what you think the final result should look like.
– piRSquared, Commented Feb 19, 2018 at 17:34
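
The apply-based idea from the first comment could be sketched roughly as follows. This is only an illustration, not code from the commenter: the helper name in_window is hypothetical, the sample frames are rebuilt from the tables in the question, and it assumes every Ticker in df2 is also a column of df1.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's tables
dates = pd.to_datetime([
    '2011-06-06', '2011-06-17', '2011-06-21', '2011-06-22',
    '2011-06-23', '2011-06-24', '2011-06-30', '2011-07-11',
])
df1 = pd.DataFrame(10.0, index=dates, columns=['ABC', 'DEF', 'XYZ'])
df2 = pd.DataFrame({
    'End_Date': pd.to_datetime(['2011-06-27', '2011-06-30', '2011-06-25']),
    'Start_Date': pd.to_datetime(['2011-06-22', '2011-06-17', '2011-06-18']),
    'Ticker': ['ABC', 'DEF', 'XYZ'],
})

def in_window(row):
    """Hypothetical helper: for one row of df2, return 1.0 on the df1
    dates inside [Start_Date, End_Date] and NaN on all other dates."""
    mask = (df1.index >= row['Start_Date']) & (df1.index <= row['End_Date'])
    return pd.Series(np.where(mask, 1.0, np.nan), index=df1.index)

# apply(axis=1) yields one row per ticker and one column per date,
# so transpose to get dates as rows and tickers as columns
df3 = df2.set_index('Ticker').apply(in_window, axis=1).T
df3 = df3.reindex(columns=df1.columns)  # match df1's column order
```

Because the function compares dates with a boolean mask rather than slicing, this version does not rely on the index of df1 being sorted.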

piRSquared · Accepted Answer · 2018-02-19 17:43:04Z


IIUC:

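# Index df2 by ticker so each ticker's dates can be looked up directly
d2 = df2.set_index('Ticker')
df3 = df1.copy()
# DataFrame.iteritems() was removed in pandas 2.0; iterate over columns instead
for tick in df3.columns:
    sd = d2.at[tick, 'Start_Date']
    ed = d2.at[tick, 'End_Date']
    # Label-based slice on the DatetimeIndex; both endpoints are inclusive
    df3.loc[sd:ed, tick] = 1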

df3

             ABC   DEF   XYZ
2011-06-06  10.0  10.0  10.0
2011-06-17  10.0   1.0  10.0
2011-06-21  10.0   1.0   1.0
2011-06-22   1.0   1.0   1.0
2011-06-23   1.0   1.0   1.0
2011-06-24   1.0   1.0   1.0
2011-06-30  10.0   1.0  10.0
2011-07-11  10.0  10.0  10.0
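
Note that the output above keeps df1's original values (10.0) outside each window, whereas the question asks for NaN there. A minimal sketch of one way to get the NaN version is to start from an all-NaN frame and fill each window. It assumes that the index of df1 is a sorted DatetimeIndex and that each Ticker in df2 is a column of df1; the sample frames are rebuilt from the question's tables so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's tables
dates = pd.to_datetime([
    '2011-06-06', '2011-06-17', '2011-06-21', '2011-06-22',
    '2011-06-23', '2011-06-24', '2011-06-30', '2011-07-11',
])
df1 = pd.DataFrame(10.0, index=dates, columns=['ABC', 'DEF', 'XYZ'])
df2 = pd.DataFrame({
    'End_Date': pd.to_datetime(['2011-06-27', '2011-06-30', '2011-06-25']),
    'Start_Date': pd.to_datetime(['2011-06-22', '2011-06-17', '2011-06-18']),
    'Ticker': ['ABC', 'DEF', 'XYZ'],
})

# Start from an all-NaN frame with the same labels as df1
df3 = pd.DataFrame(np.nan, index=df1.index, columns=df1.columns)

# One pass over df2: fill each ticker's window with 1.0.
# Slicing a sorted DatetimeIndex by label includes both endpoints and
# also works when the start or end date is not itself in the index.
for row in df2.itertuples(index=False):
    df3.loc[row.Start_Date:row.End_Date, row.Ticker] = 1.0
```

With the sample data, this leaves 1.0 on three dates for ABC, six for DEF, and four for XYZ, with NaN everywhere else.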
