1

I have a sorted CSV dataset with four columns that I want to use as a MultiIndex, including two DateTime columns:

Alex,Beta,2011-03-01 00:00:00,2011-03-03 00:00:00,A,8,11.4
Alex,Beta,2011-03-03 00:00:00,2011-03-05 00:00:00,B,10,17.2
Alex,Beta,2011-03-05 00:00:00,2011-03-07 00:00:00,A,3,11.4
Alex,Beta,2011-03-07 00:00:00,2011-03-09 00:00:00,B,7,17.2
Alex,Orion,2011-03-02 00:00:00,2011-03-04 00:00:00,A,4,11.4
Alex,Orion,2011-03-03 00:00:00,2011-03-05 00:00:00,B,6,17.2
Alex,Orion,2011-03-04 00:00:00,2011-03-06 00:00:00,A,3,11.4
Alex,Orion,2011-03-05 00:00:00,2011-03-07 00:00:00,B,11,17.2
Alex,ZZYZX,2011-03-02 00:00:00,2011-03-05 00:00:00,A,10,11.4
Alex,ZZYZX,2011-03-04 00:00:00,2011-03-07 00:00:00,A,15,11.4
Alex,ZZYZX,2011-03-06 00:00:00,2011-03-09 00:00:00,B,20,17.2
Alex,ZZYZX,2011-03-08 00:00:00,2011-03-11 00:00:00,B,5,17.2

I can load this with read_csv and display the DataFrame hierarchically. But indexing it is another matter. The nearest I can tell is that pandas doesn't like using DateTime indexes here. If I comment out the DateTime labels in index_col as well as the corresponding entries in the indexing statement (df.loc), it works fine.

Any ideas?

#!/usr/bin/env python

import numpy as np
import pandas as pd

pd.set_option('display.height',            400)
pd.set_option('display.width',             400)
pd.set_option('display.max_rows',         1000)
pd.set_option('display.max_columns',        30)
pd.set_option('display.line_width',        200)

try:
    df = pd.read_csv(
        './sales.csv',
        header =                          None,
        na_values =                   ['NULL'],
        names = [
            'salesperson',
            'customer',
            'invoice_date',
            'ship_date',
            'product',
            'quantity',
            'price',
        ],
        index_col = [
            'salesperson',
            'customer',
            'invoice_date',
            'ship_date',
        ],
        parse_dates = [
            'invoice_date',
            'ship_date',
        ],
    )
except Exception as e:
    print(e)

try:
    print(df)
    print(df.loc[(
        'Alex',                    # salesperson
        'ZZYZX',                   # customer
        '2011-03-02 00:00:00',     # invoice_date
        '2011-03-05 00:00:00',     # ship_date
    )])
except Exception as e:
    print(e)

1 Answer 1

1

It seems to work fine, im getting a proper df. Although i would try avoiding the empty entries in every list.

If you use parse_dates you should also access those columns with a proper datetime object:

df.loc[('Alex','ZZYZX',pd.datetime(2011,3,2),pd.datetime(2011,3,5))]

product        A
quantity      10
price       11.4
Name: (Alex, ZZYZX, 2011-03-02 00:00:00, 2011-03-05 00:00:00), dtype: object
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks. I hadn't seen a properly formatted datetime object and I assumed that I should use it in string format.
Sometimes passing a string 'as' a datetime works with Pandas, but here it doesnt parse it 'on the fly'

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.