I have a pandas time series DataFrame (df) with many columns and a two-level index, "date" and "ticker". I would like to use df.loc to select a specific range of dates, say "2000-01-03" to "2000-01-06", and a specific ticker, say "A", so that I get the values of all the other columns for the rows matching these two criteria.

Example of the DataFrame: [image not available]

I tried the following

df.loc[("date", "2000-01-03": "2000-01-06"), "A"]

Alternatively, I would like to select all the tickers. I tried the following:

df.loc[("date", "2000-01-03": "2000-01-06"), :]

Neither works. Any insight on how to use .loc on a DataFrame with a two-level index?

2 Answers

Next time you submit a question, it would be great to include a sample of the DataFrame; it eliminates the guesswork.

Based on the limited information you have provided, here are three potential ways to solve your use case.

Approach 1 - Pandas has a handy function called date_range (see the pandas docs). From the docs:

Returns the range of equally spaced time points (where the difference between any two adjacent points is specified by the given frequency) such that they all satisfy start <[=] x <[=] end


import pandas as pd

# Convert the 'date' column to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'])

# Set 'date' and 'ticker' columns as the index
df.set_index(['date', 'ticker'], inplace=True)

# Select the desired range of dates and the specific ticker
date_range = pd.date_range('2000-01-03', '2000-01-06')
ticker = 'A'

# Use df.loc to filter based on the date range and ticker
selected_data = df.loc[(date_range, ticker), :]
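One thing to keep in mind with Approach 1: with its default daily frequency, date_range produces every calendar day, including weekends that a price table usually does not contain. Depending on the pandas version, passing labels that are missing from the index to .loc may raise a KeyError. A minimal sketch of a more tolerant variant, using isin on the date level, is below. The toy frame, the "close" column, the name wanted and the chosen dates are all made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 10 business days x 2 tickers, one made-up "close" column
dates = pd.date_range("2000-01-03", periods=10, freq="B")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(20)}, index=index)

# Default freq is daily, so this range also contains Sat 2000-01-08 and Sun 2000-01-09
wanted = pd.date_range("2000-01-06", "2000-01-11")

# isin simply ignores dates that are not present in the index
date_level = df.index.get_level_values("date")
ticker_level = df.index.get_level_values("ticker")
mask = date_level.isin(wanted) & (ticker_level == "A")

selected = df.loc[mask]
print(selected)
```

Here wanted holds six calendar days, but only the four trading days among them show up in selected.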

Approach 2 - Slice your DataFrame using a boolean mask

You can use boolean conditions to filter the DataFrame based on the desired date range and ticker. We extract the 'date' and 'ticker' levels from the MultiIndex using df.index.get_level_values, and then apply the conditions.

# Set 'date' and 'ticker' columns as the index
df.set_index(['date', 'ticker'], inplace=True)

# Select the desired range of dates and the specific ticker using boolean condition
date_start = '2000-01-03'
date_end = '2000-01-06'
ticker = 'A'

selected_data = df.loc[(df.index.get_level_values('date') >= date_start) &
                       (df.index.get_level_values('date') <= date_end) &
                       (df.index.get_level_values('ticker') == ticker), :]
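If you want to try Approach 2 without your real data, here is a self-contained sketch on a small made-up frame. The "close" column and the synthetic dates are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: 5 business days x 2 tickers
dates = pd.date_range("2000-01-03", periods=5, freq="B")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(10)}, index=index)

# Pull each index level out once, then combine the conditions
date_level = df.index.get_level_values("date")
ticker_level = df.index.get_level_values("ticker")

mask = (
    (date_level >= "2000-01-03")
    & (date_level <= "2000-01-06")
    & (ticker_level == "A")
)

selected = df.loc[mask]
print(selected)
```

Dropping the ticker condition from the mask returns the same date range for all tickers.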

Approach 3 - Slicing without creating a MultiIndex (here 'date' and 'ticker' stay ordinary columns)

# Convert the 'date' column to datetime if it's not already in datetime format
df['date'] = pd.to_datetime(df['date'])

# Filter the DataFrame based on date range and ticker
date_start = '2000-01-03'
date_end = '2000-01-06'
ticker = 'A'

selected_data = df.loc[(df['date'] >= date_start) & (df['date'] <= date_end) & (df['ticker'] == ticker)]
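For the column-based variant, Series.between can express the inclusive date range a bit more compactly. This is only a sketch on made-up data; the "close" column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with 'date' and 'ticker' as ordinary columns
df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-03", "2000-01-04", "2000-01-07"],
    "ticker": ["A", "B", "A", "A"],
    "close": [10.0, 20.0, 11.0, 12.0],
})
df["date"] = pd.to_datetime(df["date"])

# between() is inclusive on both ends by default
mask = df["date"].between("2000-01-03", "2000-01-06") & df["ticker"].eq("A")

selected = df.loc[mask]
print(selected)
```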

You can use pd.IndexSlice this way:

select a specific range of dates, say "2000-01-03" to "2000-01-06", and a specific ticker, say "A".

df.loc[pd.IndexSlice["2023-01-03":"2023-01-06", "AAPL"], :]

                   Adj Close  Close   High    Low   Open     Volume
Date       Ticker                                                  
2023-01-03 AAPL       124.71 125.07 130.90 124.17 130.28  112117500
2023-01-04 AAPL       125.99 126.36 128.66 125.08 126.89   89113600
2023-01-05 AAPL       124.66 125.02 127.77 124.76 127.13   80962700
2023-01-06 AAPL       129.24 129.62 130.29 124.89 126.01   87754700

Alternatively, I would like to select all the tickers.

df.loc[pd.IndexSlice["2023-01-03":"2023-01-06", :], :]

                   Adj Close  Close   High    Low   Open     Volume
Date       Ticker                                                  
2023-01-03 AAPL       124.71 125.07 130.90 124.17 130.28  112117500
           GOOG        89.70  89.70  91.55  89.02  89.83   20738500
2023-01-04 AAPL       125.99 126.36 128.66 125.08 126.89   89113600
           GOOG        88.71  88.71  91.24  87.80  91.01   27046500
2023-01-05 AAPL       124.66 125.02 127.77 124.76 127.13   80962700
           GOOG        86.77  86.77  88.21  86.56  88.07   23136100
2023-01-06 AAPL       129.24 129.62 130.29 124.89 126.01   87754700
           GOOG        88.16  88.16  88.47  85.57  87.36   26612600

Input used:

# pip install yfinance
import yfinance as yf

df = (yf.download("AAPL GOOG", start="2023-01-01", end="2023-01-31")
          .stack().rename_axis(index=["Date", "Ticker"]))
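If you prefer not to download data, the same IndexSlice selection can be tried on a small synthetic frame. This is a sketch under assumptions: the "close" column and the generated dates are made up, and the frame is sorted first because slicing a MultiIndex generally expects a sorted index.

```python
import pandas as pd

# Hypothetical toy data: 5 business days x 2 tickers
dates = pd.date_range("2000-01-03", periods=5, freq="B")
index = pd.MultiIndex.from_product([dates, ["A", "B"]], names=["date", "ticker"])
df = pd.DataFrame({"close": range(10)}, index=index).sort_index()

idx = pd.IndexSlice

# Date range for a single ticker
one_ticker = df.loc[idx["2000-01-03":"2000-01-06", "A"], :]

# Same date range for all tickers
all_tickers = df.loc[idx["2000-01-03":"2000-01-06", :], :]

print(one_ticker)
print(all_tickers)
```

Both ends of the date slice are inclusive, so 2000-01-06 is part of the result.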
