
I have df

2016-06-21 06:25:09 [email protected] GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/json    2130    https://edge-chat.facebook.com/pull?channel=p_100006170407238&seq=27&clientid=1d67ca6e&profile=mobile&partition=-2&sticky_token=185&msgs_recv=27&qp=y&cb=1830997782&state=active&sticky_pool=frc3c09_chat-proxy&uid=100006170407238&viewer_uid=100006170407238&m_sess=&__dyn=1Z3p5wnE-4UpwDF3GAgy78qzoC6Erz8B0GxG9xu3Z0QwFzohxO3O2G2a1mwYxm48sxadwpVEy1qK78gwUx6&__req=79&__ajax__=AYlbtcBwGC2suZLI-J88V0PWa58vtQeG3YlQLydFRsAl6UwLSjsSpD7peu8mGl6NsHvd2zxfDcB6A0-XunBugUsYZ1lMYmUu97R43iV7XSfpyg&__user=100006170407238
2016-06-22 06:25:20 [email protected] POST HTTP/1.1   Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/x-javascript    20248   https://m.facebook.com/stories.php?aftercursor=MTQ2NjY2MzEwNToxNDY2NjYzMTA1Ojg6NzM0ODg0MDExMjAyNDY1MzA5NToxNDY2NjYyNzk1OjA%3D&tab=h_nor&__m_log_async__=1
2016-06-23 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-23 06:25:25 [email protected] GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 text/html   1105    https://m.facebook.com/xti.php?xt=2.qid.6299270070554694533%3Amf_story_key.343726573953754118%3Aei.AI%40ecf11fb3faf9c0b1f73ce2a74bc9f228
2016-06-24 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443

I need to get the unique dates for every ID (year, month and day only). Desired output:

[email protected] - 2016-06-21, 2016-06-22, 2016-06-23
[email protected] - 2016-06-24, 2016-06-25

How can I get these dates?

  • It is not clear enough what "df" is, etc. Give more details. Commented Jul 11, 2016 at 12:35
  • That is not a dataframe, what have you tried. Commented Jul 11, 2016 at 14:12

4 Answers

2

You can first extract the info you need from your dates:

df['filtered date'] = [w[:10] for w in df['date']]

Then you use `drop_duplicates`:

output = df[['id','filtered date']].drop_duplicates()

You can then reorder your data frame for clarity:

output.sort_values(by=['id', 'filtered date'], inplace=True)

You'll finally get this kind of output:

    id               filtered date
0   [email protected]  2016-06-24
1   [email protected]  2016-06-25
3   [email protected]  2016-06-21
4   [email protected]  2016-06-22
5   [email protected]  2016-06-23


1

Pandas provides the function groupby for DataFrames, which should be suitable for what you require.

# Generate dataframe with random values
from datetime import datetime, timedelta
from random import choice, randrange

import pandas as pd

mail  = ['[email protected]', '[email protected]', '[email protected]']
stime = datetime.strptime('2016-07-01 00:00:00', '%Y-%m-%d %H:%M:%S')
etime = datetime.strptime('2016-07-30 00:00:00', '%Y-%m-%d %H:%M:%S')
tdelta = etime - stime
tdiff = tdelta.days * 24 * 60 * 60 + tdelta.seconds

df = pd.DataFrame({
    'mail': [choice(mail) for _ in range(10)],
    'time':[stime + timedelta(seconds=randrange(tdiff)) for _ in range(10)]
})

# Group dataframe by column 'mail' and apply the lambda expression to
# transform the grouped set of values into unique time values.
r = df.groupby(by='mail').apply(lambda x: set(x['time'].values))

Then, you should be able to work with the result:

print(r)

mail
[email protected]    {2016-07-24T16:42:12.000000000, 2016-07-07T15:...
[email protected]      {2016-07-13T18:53:07.000000000, 2016-07-04T06:...
[email protected]       {2016-07-10T07:37:19.000000000, 2016-07-09T07:...
dtype: object


1

Here's a one-liner (supposing `date` and `ID` are the names of the relevant columns)

df.groupby('ID').apply(lambda x: (x['date'].str[:10]).unique())

and its output

ID
[email protected]                [2016-06-24, 2016-06-25]
[email protected]    [2016-06-21, 2016-06-22, 2016-06-23]
dtype: object


1

Let's read your sample data in:

import pandas as pd
from io import StringIO

df = pd.read_table(StringIO("""2016-06-21 06:25:09 [email protected] GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/json    2130    https://edge-chat.facebook.com/pull?channel=p_100006170407238&seq=27&clientid=1d67ca6e&profile=mobile&partition=-2&sticky_token=185&msgs_recv=27&qp=y&cb=1830997782&state=active&sticky_pool=frc3c09_chat-proxy&uid=100006170407238&viewer_uid=100006170407238&m_sess=&__dyn=1Z3p5wnE-4UpwDF3GAgy78qzoC6Erz8B0GxG9xu3Z0QwFzohxO3O2G2a1mwYxm48sxadwpVEy1qK78gwUx6&__req=79&__ajax__=AYlbtcBwGC2suZLI-J88V0PWa58vtQeG3YlQLydFRsAl6UwLSjsSpD7peu8mGl6NsHvd2zxfDcB6A0-XunBugUsYZ1lMYmUu97R43iV7XSfpyg&__user=100006170407238
2016-06-22 06:25:20 [email protected] POST HTTP/1.1   Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 application/x-javascript    20248   https://m.facebook.com/stories.php?aftercursor=MTQ2NjY2MzEwNToxNDY2NjYzMTA1Ojg6NzM0ODg0MDExMjAyNDY1MzA5NToxNDY2NjYyNzk1OjA%3D&tab=h_nor&__m_log_async__=1
2016-06-23 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-23 06:25:25 [email protected] GET HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 text/html   1105    https://m.facebook.com/xti.php?xt=2.qid.6299270070554694533%3Amf_story_key.343726573953754118%3Aei.AI%40ecf11fb3faf9c0b1f73ce2a74bc9f228
2016-06-24 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
2016-06-25 06:25:25 [email protected] CONNECT HTTP/1.1    Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53   200 -   0   scontent.xx.fbcdn.net:443
"""), delim_whitespace=True, header=None)

You are interested in the first column (index 0), which is the date, and the third (index 2), which is the email address. Purely for visibility, let's isolate them in a new data frame:

df2 = df[[0, 2]]

which is now:

            0                2
0  2016-06-21  [email protected]
1  2016-06-22  [email protected]
2  2016-06-23  [email protected]
3  2016-06-23  [email protected]
4  2016-06-24  [email protected]
5  2016-06-25  [email protected]
6  2016-06-25  [email protected]

We now need to group them and aggregate with a custom function that turns the aggregated dates into a list (like your desired output):

df2.groupby(2).agg(lambda x: x.unique().tolist()).reset_index()

reset_index() fixes the indexing, so we get the following data frame:

                 2                                     0
0  [email protected]              [2016-06-24, 2016-06-25]
1  [email protected]  [2016-06-21, 2016-06-22, 2016-06-23]

