2
  • I use pandas and get my data from a SQL database
  • I have two tickers. One is a U.S stock, the other an European stock. Dates are not necessarily the same for both stocks (holidays, etc).
  • All my data is stored in a multi-index DataFrame.
  • Looking to fill missing values based on levels

Running the below code:

import pandas as pd
import datetime
ticker_date = [('US',datetime.date.today()-datetime.timedelta(3)),
('US',datetime.date.today()-datetime.timedelta(2)),
('US',datetime.date.today()-datetime.timedelta(1)),
('EU',datetime.date.today()-datetime.timedelta(3)),
('EU',datetime.date.today()-datetime.timedelta(1))]
index_df = pd.MultiIndex.from_tuples(ticker_date)
example = pd.DataFrame([12.2,12.5,12.6,15.1,15],index_df,['value'])

Output:

Output from code above

enter image description here

I am looking for a method to reshape my output filling the missing data with the previous value:

Objective: add a dec 11th line and fill with previous value

enter image description here

4
  • 1
    I'm not sure what's going on with your previous question where you posted this code as an "answer" to what is essentially the same question? Commented Dec 13, 2017 at 14:11
  • The objective was to formulate the question in a more clear and readable way Commented Dec 13, 2017 at 14:13
  • 1
    Except it leaves the other question in some limbo state. You should have edited the code into that existing question and not posted code as an answer if it doesn't answer the issue. Commented Dec 13, 2017 at 14:14
  • Didn't know I could edit the code into an existing question. New in stackoverflow. I will keep that in mind for my next question. Thxs Commented Dec 13, 2017 at 14:17

2 Answers 2

1

I'd do it this way:

In [24]: idx = pd.MultiIndex.from_product((
                   example.index.get_level_values(0).unique(), 
                   example.index.get_level_values(1).unique()))

In [25]: example = example.reindex(idx).ffill()

In [26]: example
Out[26]:
               value
US 2017-12-10   12.2
   2017-12-11   12.5
   2017-12-12   12.6
EU 2017-12-10   15.1
   2017-12-11   15.1
   2017-12-12   15.0
Sign up to request clarification or add additional context in comments.

3 Comments

Works perfectly! Thanks a lot
@ruh, glad it helps! :-)
Done, didn't know this feature!
0

You need to be careful with this solution. It does not account for crossing the primary index. For instance,

import pandas as pd
import datetime
ticker_date = [ ('US',datetime.date.today()-datetime.timedelta(3)),
                ('US',datetime.date.today()-datetime.timedelta(2)),
                ('US',datetime.date.today()-datetime.timedelta(1)),
                ('EU',datetime.date.today()-datetime.timedelta(2)),
                ('EU',datetime.date.today()-datetime.timedelta(1))]
index_df = pd.MultiIndex.from_tuples(ticker_date)
example = pd.DataFrame([12.2,12.5,12.6,15.1,15],index_df,['value'])

idx = pd.MultiIndex.from_product((
        example.index.get_level_values(0).unique(), 
        example.index.get_level_values(1).unique()))

example = example.reindex(idx).ffill()
print(example)

produces:

               value
US 2019-11-23   12.2
   2019-11-24   12.5
   2019-11-25   12.6
EU 2019-11-23   12.6  <== 
   2019-11-24   15.1
   2019-11-25   15.0

1 Comment

Welcome to Stack Overflow! Please edit your answer and reword it so it addresses the question directly. As it's written, it appears to address the other answer. This is generally considered bad practice, as those kinds of clarifications should be done through comments. But since you don't currently have enough reputation to comment, rephrasing your answer will do :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.