Nested dataframe in pandas

Question

I have a long list of status codes by month, something like:

import pandas as pd

stats = pd.DataFrame(
    [
         ['2016-01', 200, 'xxx.com'],
         ['2016-01', 400, 'xxx.com'],
         ['2016-01', 200, 'xxx.com'],
         ['2016-02', 200, 'xxx.com']
    ],
    columns=['day', 'status_code', 'url']
)

Ultimately I want to plot a line chart with one line per status code. I have already found that this table holds the correct information:

pivot = stats.pivot_table(index=['day', 'status_code'], aggfunc=len)

Looks like:

                        url
day     status_code     
2016-01 200            2
        400            1
2016-02 200            1

So it's somewhat the information I need.

However:

1.) I can't even work out how to access that information. What is, for example, the syntax for getting the number of URLs with status code 200 for 2016-01?

2.) How would I plot that? I want to draw multiple lines where the x-axis is the month and the y-axis is the count per status code.

3.) Why is the rightmost column named 'url' anyway? I didn't include the url in my pivot table.

One problem per question, please; this is too broad.
1. Pass a tuple to access the MultiIndex and call sum: pivot.loc[('2016-02', 200)].sum()
2. You'd have to either convert the index to a datetime and access the month using .month, or strip the month out and plot.
3. You called pivot_table with an aggfunc, which was applied to the remaining columns, so it reuses their column names. Not sure why this is a mystery to you.
– EdChum, Commented Mar 21, 2016 at 11:25
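
For reference, the tuple lookup from the comment can be sketched as follows, using the sample data from the question. The variable names count_200_jan, via_sum and counts are only illustrative, and the groupby variant is an alternative I am adding for comparison, not something the commenter suggested.

```python
import pandas as pd

stats = pd.DataFrame(
    [
        ['2016-01', 200, 'xxx.com'],
        ['2016-01', 400, 'xxx.com'],
        ['2016-01', 200, 'xxx.com'],
        ['2016-02', 200, 'xxx.com'],
    ],
    columns=['day', 'status_code', 'url'],
)

# No `values` given, so aggfunc=len is applied to the remaining column, 'url'.
pivot = stats.pivot_table(index=['day', 'status_code'], aggfunc=len)

# A tuple addresses one (day, status_code) row of the MultiIndex;
# naming the column as well returns a single value.
count_200_jan = pivot.loc[('2016-01', 200), 'url']

# The form from the comment: select the row, then sum over its columns.
via_sum = pivot.loc[('2016-01', 200)].sum()

# Alternative (an addition, not from the thread): count rows per group directly.
counts = stats.groupby(['day', 'status_code']).size()
```

The groupby version returns a Series rather than a one-column DataFrame, which also avoids the unexpected 'url' column name from point 3.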

HYRY · Accepted Answer · 2016-03-21 11:59:57Z

5

You can use crosstab():

stats = pd.DataFrame(
    [
         ['2016-01', 200, 'xxx.com'],
         ['2016-01', 400, 'xxx.com'],
         ['2016-01', 200, 'xxx.com'],
         ['2016-02', 200, 'xxx.com']
    ],
    columns=['day', 'status_code', 'url']
)

df = pd.crosstab(stats.day, stats.status_code)

df.plot()
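
To make the shape of the result explicit, here is a small sketch of what crosstab returns for the sample data. Converting the index to datetimes is an optional extra step, following the suggestion in the comment on the question, and is not part of the original answer.

```python
import pandas as pd

stats = pd.DataFrame(
    [
        ['2016-01', 200, 'xxx.com'],
        ['2016-01', 400, 'xxx.com'],
        ['2016-01', 200, 'xxx.com'],
        ['2016-02', 200, 'xxx.com'],
    ],
    columns=['day', 'status_code', 'url'],
)

# One row per month, one column per status code.
# Combinations that never occur (2016-02 / 400) are counted as 0.
df = pd.crosstab(stats['day'], stats['status_code'])

# Optional: parse the month strings so the x-axis is a real time axis.
df.index = pd.to_datetime(df.index, format='%Y-%m')

# df.plot() would now draw one line per column (this needs matplotlib installed).
```

Because each status code is its own column, DataFrame.plot draws one line per status code without any further reshaping.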


1 Comment

shredding Over a year ago

That's super awesome. It looks like crosstab is doing essentially the same as pivot = stats.pivot_table(index='day', columns='status_code', values='url', aggfunc=len)
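
A quick way to check that equivalence on the sample data is sketched below. The fill_value=0 argument is my addition: without it, pivot_table leaves NaN for combinations that never occur, whereas crosstab reports 0. The names via_crosstab, via_pivot and same_counts are only illustrative.

```python
import pandas as pd

stats = pd.DataFrame(
    [
        ['2016-01', 200, 'xxx.com'],
        ['2016-01', 400, 'xxx.com'],
        ['2016-01', 200, 'xxx.com'],
        ['2016-02', 200, 'xxx.com'],
    ],
    columns=['day', 'status_code', 'url'],
)

via_crosstab = pd.crosstab(stats['day'], stats['status_code'])

# Same counts via pivot_table; fill_value=0 replaces the missing
# (2016-02, 400) combination, which would otherwise be NaN.
via_pivot = stats.pivot_table(
    index='day',
    columns='status_code',
    values='url',
    aggfunc=len,
    fill_value=0,
)

# Compare the raw values; dtypes may differ between the two results.
same_counts = bool((via_crosstab.to_numpy() == via_pivot.to_numpy()).all())
```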
