First, thanks so much to those who help; it's fun learning when people can help.

I haven't got slicing and selecting down yet. I have a DataFrame like this:

             Unit   Name          Count   Month Year
2013-01-01   U1     fn ln         2       01    2013
2013-01-01   U1     fn1 ln1       200     01    2013
2013-02-01   U2     fn2 ln2       55      01    2013
...
2016-01-01   U1     fn3 ln3       2       01    2016
2016-01-01   U1     fn1 ln1       200     01    2016
2016-01-01   U2     fn5 ln5       55      01    2016

I want to create various slices of this data.

First is overall per month, next is overall per month per unit, then per individual for this month, the last three months, and the last six months.

Code so far:

# this works great: groups by month and year (month 1 of 2013, 2014, 2015, ...)
# note: multiple keys must be passed to groupby as a list
group1 = df.groupby(['Month', 'Year'])

# works great to select by unit
group2 = df.groupby(['Unit', 'Month', 'Year'])

# now I want the top 10 individuals in each group
# this doesn't work
month_indiv = group2[['Name', 'Count']]

I think the issue is that groupby removes duplicates, but I don't understand how to create the view that gives me the individuals.

1 Answer

You can convert the index to a PeriodIndex with to_period and find the last 3 months present in the data with unique:

print(df)
           Unit     Name  Count  Month  Year
2013-01-01   U1    fn ln      2      1  2013
2013-02-01   U1    fn ln      2      2  2013
2013-02-01   U1  fn1 ln1    200      2  2013
2013-03-01   U2  fn2 ln2     55      3  2013
2013-04-01   U2  fn2 ln2     55      4  2013
2013-05-01   U2  fn2 ln2     55      5  2013
2016-01-01   U1  fn3 ln3      2      1  2016
2016-01-01   U1  fn1 ln1    200      1  2016
2016-01-01   U2  fn5 ln5     55      1  2016

# convert the index to a PeriodIndex with monthly frequency
print(df.index.to_period('M'))
PeriodIndex(['2013-01', '2013-02', '2013-02', '2013-03', '2013-04', '2013-05',
             '2016-01', '2016-01', '2016-01'],
            dtype='int64', freq='M')

# last 3 unique values (unique keeps order of appearance, so the index must be sorted)
print(df.index.to_period('M').unique()[-3:])
PeriodIndex(['2013-04', '2013-05', '2016-01'], dtype='int64', freq='M')

# boolean mask: True where the row's month is one of those last 3
print(df.index.to_period('M').isin(df.index.to_period('M').unique()[-3:]))
[False False False False  True  True  True  True  True]

# last 3 months: select the rows where the mask is True
print(df.loc[df.index.to_period('M').isin(df.index.to_period('M').unique()[-3:])])
           Unit     Name  Count  Month  Year
2013-04-01   U2  fn2 ln2     55      4  2013
2013-05-01   U2  fn2 ln2     55      5  2013
2016-01-01   U1  fn3 ln3      2      1  2016
2016-01-01   U1  fn1 ln1    200      1  2016
2016-01-01   U2  fn5 ln5     55      1  2016
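
The question also asks for the top individuals in each group, which the month selection above does not cover. One possible approach, shown here only as a minimal sketch, is to sort by Count and take the first rows of each group with head. The sample frame, the TOP_N constant, and the name top_per_group are made up for illustration; TOP_N is 2 so the small sample shows an effect, and it would be 10 on the real data.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame(
    {
        "Unit": ["U1", "U1", "U1", "U2", "U2"],
        "Name": ["fn ln", "fn1 ln1", "fn3 ln3", "fn2 ln2", "fn5 ln5"],
        "Count": [2, 200, 7, 55, 60],
    },
    index=pd.to_datetime(["2016-01-01"] * 5),
)
df["Month"] = df.index.month
df["Year"] = df.index.year

TOP_N = 2  # illustrative; use 10 on the real data

top_per_group = (
    df.sort_values("Count", ascending=False)   # largest counts first
      .groupby(["Unit", "Month", "Year"])      # multiple keys go in a list
      .head(TOP_N)                             # first TOP_N rows of each group
      .sort_values(["Year", "Month", "Unit", "Count"],
                   ascending=[True, True, True, False])
)
print(top_per_group[["Unit", "Name", "Count"]])
```

Unlike an aggregation, head keeps the original rows, so each individual stays visible alongside their unit and month.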

2 Comments

That gives me all of the Januaries across the years; I'm looking for all of January 2016, then everything from Nov and Dec 2015 plus Jan 2016, then everything from the last 6 months. Thank you so much for helping.
Yes, that is it. Great answer, and it explains much in the way of Python for me. Quick side question: I'm going to use df.index.to_period a lot. What's the pythonic way to store this as a variable that I use as the base, and then apply [-x:] for the various slices?
# works but gives me a PeriodIndex object
month_list = df2.index.to_period('M').unique()
# doesn't work, as values isn't a method of PeriodIndex
month_list = month_list.values()
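
On the side question: a PeriodIndex can be sliced with [-x:] directly, so there is no need to convert it first (values is an attribute rather than a method, so it would be written without parentheses). A minimal sketch of a reusable helper follows. The function names last_n_months and last_n_calendar_months and the sample frame are hypothetical. The first helper counts distinct months present in the data, as in the answer; the second counts calendar months back from the latest month, which may be closer to "Nov, Dec 2015 and Jan 2016" when the data has gaps.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex, mirroring the answer's sample dates.
df = pd.DataFrame(
    {"Count": [2, 2, 200, 55, 55, 55, 2, 200, 55]},
    index=pd.to_datetime([
        "2013-01-01", "2013-02-01", "2013-02-01", "2013-03-01", "2013-04-01",
        "2013-05-01", "2016-01-01", "2016-01-01", "2016-01-01",
    ]),
)

def last_n_months(frame, n):
    """Rows whose month is among the last n distinct months present in the data."""
    periods = frame.index.to_period("M")
    month_list = periods.unique().sort_values()  # still a PeriodIndex
    return frame.loc[periods.isin(month_list[-n:])]

def last_n_calendar_months(frame, n):
    """Rows within the n calendar months ending at the latest month in the data."""
    periods = frame.index.to_period("M")
    cutoff = periods.max() - (n - 1)  # e.g. 2016-01 minus 2 months is 2015-11
    return frame.loc[periods >= cutoff]

this_month = last_n_months(df, 1)
last_three = last_n_months(df, 3)
last_six = last_n_months(df, 6)
last_three_calendar = last_n_calendar_months(df, 3)
```

With this sample, last_three includes April and May 2013 because they are the nearest months that have data, while last_three_calendar contains only the January 2016 rows.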