I have two DataFrames. The first, df1, has historical time series data for a variety of tickers with a DateTime index that looks like this:

                       ABC              DEF            XYZ
 2011-06-06            10.00            10.00          10.0000   
 2011-06-17            10.00            10.00          10.0000   
 2011-06-21            10.00            10.00          10.0000   
 2011-06-22            10.00            10.00          10.0000   
 2011-06-23            10.00            10.00          10.0000   
 2011-06-24            10.00            10.00          10.0000   
 2011-06-30            10.00            10.00          10.0000   
 2011-07-11            10.00            10.00          10.0000   

The second, df2, has three columns: Start_Date, End_Date, and Ticker. Both Start_Date and End_Date are datetimes:

    End_Date Start_Date  Ticker
0 2011-06-27 2011-06-22  ABC
1 2011-06-30 2011-06-17  DEF
2 2011-06-25 2011-06-18  XYZ

I want to create a third DataFrame, df3, using the following code:

df3 = df1.copy()
df3.loc[:] = np.nan

For each ticker, I want to set the rows of df3 that fall between that ticker's df2['Start_Date'] and df2['End_Date'] to 1.00 and leave the other rows as np.nan.

I've tried to create a function and also to iterate over df2.

def pos():
    position = 1
    for i in df2['Ticker']:
        df3.at[df2['Start_Date'], i] = position
    return pos

or

def pos():
    position = 1
    for index, row in df2.iterrows:
        df3.at[index, row['Start_Date']] = position
    return pos

The resulting df3 would look like this:

                        ABC              DEF              XYZ
2011-06-06              NaN              NaN              NaN   
2011-06-17              NaN              1.0              NaN   
2011-06-21              NaN              1.0              1.0   
2011-06-22              1.0              1.0              1.0   
2011-06-23              1.0              1.0              1.0   
2011-06-24              1.0              1.0              1.0   
2011-06-30              NaN              1.0              NaN   
2011-07-11              NaN              NaN              NaN   
2011-07-13              NaN              NaN              NaN   
2011-07-14              NaN              NaN              NaN   

I am not having much luck with either. What is the best way to do this?

Thanks in advance

  • If you want to apply a function to every row (or column) of a DataFrame, you can use df2.apply(your_function). See the docs: pandas.pydata.org/pandas-docs/stable/generated/… In your case, your function should be able to check whether the date falls between start and end and, if so, take the action you want. apply returns a new DataFrame, so just call that new one df3. Commented Feb 19, 2018 at 17:28
  • For clarity, please post what you think the final result should look like. Commented Feb 19, 2018 at 17:34
  • edited to include the final result Commented Feb 19, 2018 at 17:49

1 Answer

IIUC:

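# Index df2 by ticker so each ticker's date range can be looked up by name.
d2 = df2.set_index('Ticker')
df3 = df1.copy()
# DataFrame.iteritems() was removed in pandas 2.0; iterate over the column labels instead.
for tick in df3.columns:
    sd = d2.at[tick, 'Start_Date']
    ed = d2.at[tick, 'End_Date']
    # Label slicing on the sorted DatetimeIndex includes both endpoints.
    df3.loc[sd:ed, tick] = 1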

df3

             ABC   DEF   XYZ
2011-06-06  10.0  10.0  10.0
2011-06-17  10.0   1.0  10.0
2011-06-21  10.0   1.0   1.0
2011-06-22   1.0   1.0   1.0
2011-06-23   1.0   1.0   1.0
2011-06-24   1.0   1.0   1.0
2011-06-30  10.0   1.0  10.0
2011-07-11  10.0  10.0  10.0
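
The output above keeps df1's values outside each date range, because df3 starts as a copy of df1. To get NaN everywhere else, as the question asks, one option is to start from an all-NaN frame with the same index and columns. The following is only a minimal sketch: the sample data is rebuilt by hand from the tables in the question, and it assumes the index is a sorted DatetimeIndex and that every ticker in df2 is a column of df1.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's tables (assumed, for illustration).
dates = pd.to_datetime([
    "2011-06-06", "2011-06-17", "2011-06-21", "2011-06-22",
    "2011-06-23", "2011-06-24", "2011-06-30", "2011-07-11",
])
df1 = pd.DataFrame(10.0, index=dates, columns=["ABC", "DEF", "XYZ"])

df2 = pd.DataFrame({
    "End_Date": pd.to_datetime(["2011-06-27", "2011-06-30", "2011-06-25"]),
    "Start_Date": pd.to_datetime(["2011-06-22", "2011-06-17", "2011-06-18"]),
    "Ticker": ["ABC", "DEF", "XYZ"],
})

# Same labels as df1, but every cell starts as NaN.
df3 = pd.DataFrame(np.nan, index=df1.index, columns=df1.columns)

for row in df2.itertuples(index=False):
    # Label slicing on a sorted DatetimeIndex includes both endpoints,
    # and the endpoints do not have to be present in the index.
    df3.loc[row.Start_Date:row.End_Date, row.Ticker] = 1.0

print(df3)
```

Looping over df2 rather than over df3's columns means tickers without a row in df2 simply stay NaN instead of raising a KeyError.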