I have two dataframes: one has only timestamps, and the other has only a company name and a date, as shown below.

    creationdate
0   2012-05-01 18:20:27.167000
1   2012-05-01 19:16:08.070000
2   2012-05-01 19:20:07.880000
3   2012-05-01 19:33:02.200000
4   2012-05-01 19:35:09.173000
5   2012-05-01 20:18:55.610000
6   2012-05-01 20:26:27.577000
7   2012-05-01 20:32:34.343000
8   2012-05-01 20:39:31.257000
9   2012-05-01 21:04:50.357000
10  2012-05-01 21:54:18.983000
11  2012-05-02 02:23:53.250000
12  2012-05-02 02:40:27.643000
13  2012-05-02 08:44:28.260000

And

   sitename        date
0    Google  2012-05-01
1    Google  2012-05-02
2    Google  2012-05-03
3    Google  2012-05-04
4    Google  2012-05-05
5    Google  2012-05-06
6    Google  2012-05-07
7    Google  2012-05-08
8    Google  2012-05-09
9    Google  2012-05-10

How can I efficiently loop through the second dataframe and, for each of its dates, extract the matching timestamps from the first dataframe?

  • Have you tried anything yet? This looks like a really easy job for datetime. Commented Jul 1, 2014 at 22:36
  • @Cyber : I set the date column of the second df as the index and tried looping through it, checking whether the index was equal to the date extracted from each element of the first dataframe. But this checks every element of the first dataframe each time. That's why I was asking for an efficient way. Commented Jul 1, 2014 at 22:40
  • @Cyber : Can you please tell me your easy way? I am new to dataframes. Commented Jul 1, 2014 at 22:50
  • "loop through the second dataframe" and "extract the timestamp from the second dataframe" and "corresponding to each date in the second dataframe" - do you need the first dataframe for something? Commented Jul 1, 2014 at 22:51
  • @furas : Actually I have to calculate the average time difference between the timestamps given in the first dataframe for a given date, and I want to do this for all the dates present in the second dataframe. For this I was trying to get the timestamps corresponding to one day and do the math. Commented Jul 1, 2014 at 22:57

1 Answer

Merging (inner join) these two dataframes should work:

In [96]: df1['date'] = pd.DatetimeIndex(df1.creationdate).date

In [97]: df2['date'] = pd.DatetimeIndex(df2.date).date

In [98]: df = df1.merge(df2, on='date', how='inner')

In [99]: df
Out[99]: 
                 creationdate        date sitename
0  2012-05-01 18:20:27.167000  2012-05-01   Google
1  2012-05-01 19:16:08.070000  2012-05-01   Google
2  2012-05-01 19:20:07.880000  2012-05-01   Google
3  2012-05-01 19:33:02.200000  2012-05-01   Google
4  2012-05-01 19:35:09.173000  2012-05-01   Google
5  2012-05-01 20:18:55.610000  2012-05-01   Google
6  2012-05-01 20:26:27.577000  2012-05-01   Google
7  2012-05-01 20:32:34.343000  2012-05-01   Google
8  2012-05-01 20:39:31.257000  2012-05-01   Google
9  2012-05-01 21:04:50.357000  2012-05-01   Google
10 2012-05-01 21:54:18.983000  2012-05-01   Google
11 2012-05-02 02:23:53.250000  2012-05-02   Google
12 2012-05-02 02:40:27.643000  2012-05-02   Google
13 2012-05-02 08:44:28.260000  2012-05-02   Google

You can then run your analysis on df, for example:

In [100]: df['time_diff'] = df.creationdate.diff()

In [101]: df.time_diff
Out[101]: 
0                NaT
1    00:55:40.903000
2    00:03:59.810000
3    00:12:54.320000
4    00:02:06.973000
5    00:43:46.437000
6    00:07:31.967000
7    00:06:06.766000
8    00:06:56.914000
9    00:25:19.100000
10   00:49:28.626000
11   04:29:34.267000
12   00:16:34.393000
13   06:04:00.617000
Name: time_diff, dtype: timedelta64[ns]

Of course, your creationdate column needs to have dtype datetime64[ns], not strings. If it holds strings, convert it first, for example with pd.to_datetime(df.creationdate) or pd.DatetimeIndex(df.creationdate).
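Note that a plain diff() on the merged frame also measures the gap across midnight (row 11 in the output is the difference from the last timestamp of the previous day). Since the goal stated in the comments is an average gap per date, a minimal sketch of one way to get it is to take the diff within each date group. The sample rows are a trimmed copy of the question's data, and the names avg_gap and time_diff are only illustrative:

```python
import pandas as pd

# A few rows copied from the question's first dataframe (trimmed for brevity).
df1 = pd.DataFrame({"creationdate": pd.to_datetime([
    "2012-05-01 18:20:27.167",
    "2012-05-01 19:16:08.070",
    "2012-05-01 19:20:07.880",
    "2012-05-02 02:23:53.250",
    "2012-05-02 02:40:27.643",
    "2012-05-02 08:44:28.260",
])})

# Second dataframe: one row per site and date, as in the question.
df2 = pd.DataFrame({
    "sitename": "Google",
    "date": pd.date_range("2012-05-01", periods=3).date,
})

# Add a plain date key to the timestamps and inner-join on it.
df1["date"] = df1["creationdate"].dt.date
df = df1.merge(df2, on="date", how="inner")

# Diff within each date, so the first timestamp of a day gets NaT
# instead of the gap from the previous day's last timestamp.
df = df.sort_values("creationdate")
df["time_diff"] = df.groupby("date")["creationdate"].diff()

# Average gap per date; Series.mean() skips the NaT entries.
avg_gap = df.groupby("date")["time_diff"].agg(lambda s: s.mean())
print(avg_gap)
```

Dates in df2 with no matching timestamps (2012-05-03 in this sample) drop out of the inner join; use how='right' in the merge if they should stay in the result with missing values.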
