I need to select the rows whose mac value appears more than a given number of times (e.g. 1) in the mac column, and then create a DataFrame with the minimum and maximum timestamp for each of those mac values.

import numpy as np
import pandas as pd

a=np.array([['A',1],['A',2],['A',3],['B',2],['C',1],['C',2]])
df=pd.DataFrame(a,columns=['mac','timestamp'])
df
Out[103]: 
  mac timestamp
0   A         1
1   A         2
2   A         3
3   B         2
4   C         1
5   C         2

count_macs= df.groupby(['mac'])['mac'].count()>1
count_macs
Out[105]: 
mac
A     True
B    False
C     True
Name: mac, dtype: bool

I would like to get:

mac     ts1     ts2
A       1       3
C       1       2

But I don't know how to apply .loc correctly:

df.loc[count_macs]
IndexingError: Unalignable boolean Series key provided

2 Answers

I think you need agg with min, max and size (or count if NaNs should not be counted). Then filter with boolean indexing, drop the size column, and finally rename the columns:

df = df.groupby('mac')['timestamp'].agg(['min','max', 'size'])
d = {'min':'t1','max':'t2'}
df = df[df['size'] > 1].drop(columns='size').rename(columns=d).reset_index()
#alternatively:
#df = df.query('size > 1').drop(columns='size').rename(columns=d).reset_index()

print (df)
  mac t1 t2
0   A  1  3
1   C  1  2
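
For what it's worth, the same aggregate-then-filter idea can probably be written with named aggregation, which names the output columns directly and skips the rename step. This is only a sketch: it assumes a pandas version that supports named aggregation (0.25 or later), it assumes the timestamps are meant to be numeric (np.array stores them as strings, so they are converted first), and the helper column n is a name made up for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; np.array stores every value as a string.
a = np.array([['A', 1], ['A', 2], ['A', 3], ['B', 2], ['C', 1], ['C', 2]])
df = pd.DataFrame(a, columns=['mac', 'timestamp'])

# Assumption: timestamps should compare as numbers, not as text.
df['timestamp'] = pd.to_numeric(df['timestamp'])

# Named aggregation: output_column=(input_column, aggregation).
# 'n' is a hypothetical helper column holding each group's row count.
out = df.groupby('mac').agg(
    ts1=('timestamp', 'min'),
    ts2=('timestamp', 'max'),
    n=('timestamp', 'size'),
)

# Keep groups with more than one row, then discard the helper column.
out = out[out['n'] > 1].drop(columns='n').reset_index()
print(out)
```

Converting to numeric matters once timestamps have more than one digit, because string comparison would rank '10' below '9'.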

Another solution is to filter first with duplicated (starting again from the original df, since the code above overwrites it):

df = df[df['mac'].duplicated(keep=False)]
d = {'min':'t1','max':'t2'}
df = df.groupby('mac')['timestamp'].agg(['min','max']).rename(columns=d).reset_index()
print (df)
  mac t1 t2
0   A  1  3
1   C  1  2
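
The .loc attempt in the question fails because count_macs is indexed by mac value (A, B, C) while df is indexed by row number (0 to 5), so the two cannot be aligned. One way around that, sketched below, is to build a mask that has one entry per row with transform('size'). The frame here is rebuilt with integer timestamps for brevity, which is an assumption; the question's frame holds strings.

```python
import pandas as pd

# Same data as the question, but with integer timestamps (an assumption).
df = pd.DataFrame({'mac': list('AAABCC'), 'timestamp': [1, 2, 3, 2, 1, 2]})

# transform('size') broadcasts each group's size back onto its rows,
# so the mask shares df's row index and .loc can align it.
mask = df.groupby('mac')['timestamp'].transform('size') > 1
filtered = df.loc[mask]

result = (
    filtered.groupby('mac')['timestamp']
            .agg(['min', 'max'])
            .rename(columns={'min': 'ts1', 'max': 'ts2'})
            .reset_index()
)
print(result)
```

Unlike duplicated(keep=False), which only separates "once" from "more than once", the threshold in this mask can be changed to any count.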

Having fun with lambda

f = lambda g: g.timestamp.agg(['min', 'max'])[g.size() > 1]
h = lambda x, c=iter(['ts1', 'ts2']): next(c)
f(df.groupby('mac')).rename(columns=h).reset_index()

  mac ts1 ts2
0   A   1   3
1   C   1   2

Just to be clear: we could forgo the h and just do

f = lambda g: g.timestamp.agg(['min', 'max'])[g.size() > 1]
f(df.groupby('mac')).rename(columns=dict(min='ts1', max='ts2')).reset_index()

  mac ts1 ts2
0   A   1   3
1   C   1   2

But I like using the h (-:
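
One caveat with h: its default argument is a single iterator created when the lambda is defined, so it is exhausted after two columns and cannot be reused for a second rename. A reusable variant of the same "compute the groupby once, use it twice" idea might look like the sketch below, which passes the groupby object through pipe. The function name min_max_if_repeated is made up for this example, and the frame uses integer timestamps as an assumption.

```python
import pandas as pd

# Same data as the question, but with integer timestamps (an assumption).
df = pd.DataFrame({'mac': list('AAABCC'), 'timestamp': [1, 2, 3, 2, 1, 2]})

def min_max_if_repeated(g):
    """Hypothetical helper: min/max per group, keeping groups with > 1 row."""
    # g is the groupby object; it is used for the aggregation and for the mask.
    return g['timestamp'].agg(['min', 'max'])[g.size() > 1]

out = (
    df.groupby('mac')
      .pipe(min_max_if_repeated)   # the groupby is built once and passed in
      .rename(columns={'min': 'ts1', 'max': 'ts2'})
      .reset_index()
)
print(out)
```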

3 Comments

Sir did you fall in love with lambda ? :):)
No (-: I wrote this in one line and I wanted to pass df.groupby('mac') to a lambda in order to use twice but calculate it once. While I was at it, I wanted to rename columns inline. I decided to play with the concept of passing the iterator to the lambda... and well, I ended up with the above answer.
The f is perfect. I pass a single groupby and it gets used twice. Very simple, very elegant. The h is for fun and could have just as easily been your dictionary d.
