Pandas timestamp conversion

Question

I work with pandas. I have the following data:

useradClick.head(n=5)
Out[291]: 
             timestamp  userId   adCategory  adCount
0  2016-05-26 15:13:22     611  electronics        1
1  2016-05-26 15:17:24    1874       movies        1
2  2016-05-26 15:22:52    2139    computers        1
3  2016-05-26 15:22:57     212      fashion        1
4  2016-05-26 15:22:58    1027     clothing        1

I want to convert 2016-05-26 15:13:22 to 2016-05-26 15 (keeping only the date and hour), and afterwards group by that value.
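A minimal sketch of that conversion, assuming the timestamp column holds strings like the ones shown above (the small DataFrame is hypothetical sample data copied from the head() output). Parsing first and then truncating with dt.floor('h') keeps a real datetime, while dt.strftime('%Y-%m-%d %H') produces the literal 2016-05-26 15 string; either can serve as the group-by key.

```python
import pandas as pd

# Hypothetical sample data copied from the head() output above
useradClick = pd.DataFrame({
    'timestamp': ['2016-05-26 15:13:22', '2016-05-26 15:17:24',
                  '2016-05-26 15:22:52', '2016-05-26 15:22:57',
                  '2016-05-26 15:22:58'],
    'userId': [611, 1874, 2139, 212, 1027],
    'adCategory': ['electronics', 'movies', 'computers', 'fashion', 'clothing'],
    'adCount': [1, 1, 1, 1, 1],
})

# Parse the strings into datetimes (pass the column, not its name)
ts = pd.to_datetime(useradClick['timestamp'], format='%Y-%m-%d %H:%M:%S')

# Option 1: keep a datetime, truncated to the start of the hour
hour = ts.dt.floor('h')

# Option 2: the literal '2016-05-26 15' string
hour_str = ts.dt.strftime('%Y-%m-%d %H')

# Group on the hour key and add up the clicks
per_hour = useradClick.groupby(hour_str)['adCount'].sum()
print(per_hour)
```

Keeping the floored datetime is usually more convenient than the string, because it still sorts and resamples as a time value.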

I tried

useradClickv1 = useradClick.select(pd.to_datetime('timestamp',format='%d%m%Y'))

But I get the error

Traceback (most recent call last):

  File "<ipython-input-292-9d5a6a59d577>", line 1, in <module>
    useradClickv1 = useradClick.select(pd.to_datetime('timestamp',format='%d%m%Y'))

  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/util/decorators.py", line 91, in wrapper
    return func(*args, **kwargs)

  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 287, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)

  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 416, in _to_datetime
    return _convert_listlike(np.array([arg]), box, format)[0]

  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 402, in _convert_listlike
    raise e

  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 365, in _convert_listlike
    arg, format, exact=exact, errors=errors)

  File "pandas/tslib.pyx", line 3183, in pandas.tslib.array_strptime (pandas/tslib.c:55388)

**ValueError: time data 'timestamp' does not match format '%d%m%Y' (match)**

How can I do this conversion using pandas?
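The traceback comes from passing the string literal 'timestamp' to pd.to_datetime: pandas tries to parse the word "timestamp" itself as a date, which cannot match '%d%m%Y'. A short sketch (hypothetical two-row sample data) contrasting the failing call with one that passes the column and a format matching the values:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2016-05-26 15:13:22', '2016-05-26 15:17:24']})

# Fails: the *string* 'timestamp' is parsed, not the column of that name
literal_failed = False
try:
    pd.to_datetime('timestamp', format='%d%m%Y')
except ValueError as exc:
    literal_failed = True
    print('ValueError:', exc)

# Works: pass the column itself, with a format that matches its values
parsed = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
print(parsed)
```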

EDITED 2016/07/07

I checked your answer and I get the following error:
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv')

adclicksDF = adclicksDF.rename(columns=lambda x: x.strip())

adclicksDF['adCount'] = 1

useradClick = adclicksDF[['timestamp','userId','adCategory','adCount']]

seradClick.timestamp = pd.to_datetime(useradClick.timestamp)
Traceback (most recent call last):

  File "<ipython-input-31-ff9d4c4432ef>", line 1, in <module>
    seradClick.timestamp = pd.to_datetime(useradClick.timestamp)

NameError: name 'seradClick' is not defined


useradClick.timestamp = pd.to_datetime(useradClick.timestamp)
/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py:2698: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
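This second message is a warning, not an error: useradClick was created by selecting columns from adclicksDF, so pandas cannot tell whether an assignment to it is meant to reach the original frame. A common remedy, sketched here with a hypothetical in-memory frame standing in for the CSV, is to take an explicit copy of the column subset before modifying it:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv')
adclicksDF = pd.DataFrame({
    'timestamp': ['2016-05-26 15:13:22', '2016-05-26 15:17:24'],
    'userId': [611, 1874],
    'adCategory': ['electronics', 'movies'],
})
adclicksDF['adCount'] = 1

# .copy() makes useradClick an independent DataFrame rather than a slice,
# so writing to it is unambiguous and leaves adclicksDF untouched
useradClick = adclicksDF[['timestamp', 'userId', 'adCategory', 'adCount']].copy()
useradClick['timestamp'] = pd.to_datetime(useradClick['timestamp'])

print(useradClick.dtypes)
```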

EDITED

I am using pandas 0.18.0 from Anaconda.

import pandas as pd

from pyspark.mllib.clustering import KMeans, KMeansModel

from numpy import array

from pyspark import SparkConf, SparkContext

from pyspark.sql import SQLContext

import sys

conf = (SparkConf()
         .setMaster("local")
         .setAppName("My app")
         .set("spark.executor.memory", "1g"))

sc          = SparkContext(conf = conf)


sqlContext  = SQLContext(sc)

adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv')

adclicksDF = adclicksDF.rename(columns=lambda x: x.strip())

adclicksDF['adCount'] = 1 

useradClick = adclicksDF[['timestamp','userId','adCategory','adCount']]

useradClick.ix[:,'timestamp'] = p.to_datetime(useradClick.timestamp)
Traceback (most recent call last):

  File "<ipython-input-21-dcc10ed41daa>", line 1, in <module>
    useradClick.ix[:,'timestamp'] = p.to_datetime(useradClick.timestamp)

NameError: name 'p' is not defined


useradClick.ix[:,'timestamp'] = pd.to_datetime(useradClick.timestamp)
/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/core/indexing.py:461: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
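Switching to .ix does not help because the object being written to is still a slice of adclicksDF (and .ix was deprecated in pandas 0.20 and removed in 1.0). A sketch of another way around it, again with a hypothetical in-memory frame: DataFrame.assign returns a new DataFrame, so the converted column is never written into a slice.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents
adclicksDF = pd.DataFrame({
    'timestamp': ['2016-05-26 15:13:22', '2016-05-26 16:22:57'],
    'userId': [611, 111],
    'adCategory': ['electronics', 'fashion'],
})
adclicksDF['adCount'] = 1

# assign() builds a new DataFrame with 'timestamp' replaced by parsed datetimes
useradClick = (adclicksDF[['timestamp', 'userId', 'adCategory', 'adCount']]
               .assign(timestamp=lambda d: pd.to_datetime(d['timestamp'])))

print(useradClick.dtypes)
```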

Firstly, pd.to_datetime(useradClick['timestamp']) should just work. Also, what is the purpose of changing the format to 2016-05-26 15?
– EdChum, Commented Jul 7, 2016 at 17:43

MaxU - stand with Ukraine · Accepted Answer · 2017-05-06 18:58:35Z

1

UPDATE:

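import pandas as pd

cols = ['timestamp', 'userId', 'adCategory']
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
                         usecols=cols,               # read only these columns
                         parse_dates=['timestamp'],  # parse timestamps while reading
                         skipinitialspace=True       # skip spaces after each comma
                         ).assign(adCount=1)         # same as adclicksDF['adCount'] = 1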
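To check what that read_csv call produces without the real file, here is a sketch that feeds it a small hypothetical CSV through io.StringIO. The header has spaces after the commas (as the original strip() step suggests) and an extra made-up column, otherColumn, to show usecols dropping it.

```python
import io
import pandas as pd

# Hypothetical CSV text; 'otherColumn' is invented to show usecols filtering
csv_text = (
    "timestamp, userId, adCategory, otherColumn\n"
    "2016-05-26 15:13:22, 611, electronics, x\n"
    "2016-05-26 15:17:24, 1874, movies, y\n"
)

cols = ['timestamp', 'userId', 'adCategory']
adclicksDF = pd.read_csv(io.StringIO(csv_text),
                         usecols=cols,
                         parse_dates=['timestamp'],
                         skipinitialspace=True).assign(adCount=1)

print(adclicksDF.dtypes)
```

skipinitialspace only skips spaces directly after a delimiter; trailing spaces in the header would still need the rename(columns=lambda x: x.strip()) step.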

Original answer:

If I guessed correctly, you don't need to convert your datetimes into strings as you described.

If you want to group by hour:

If your timestamp column is of object (string) dtype, convert it to datetime first:

df['timestamp'] = pd.to_datetime(df['timestamp'])

In [15]: df
Out[15]:
            timestamp  userId   adCategory  adCount
0 2016-05-26 15:13:22     611  electronics        1
1 2016-05-26 15:17:24    1874       movies        1
2 2016-05-26 15:22:52    2139    computers        1
3 2016-05-26 15:22:57     212      fashion        1
4 2016-05-26 15:22:58    1027     clothing        1
5 2016-05-26 16:22:57     111      fashion        1
6 2016-05-26 16:22:58     222     clothing        1

In [16]: df.groupby(pd.Grouper(key='timestamp', freq='1H'))['adCount'].agg(['count','sum'])
Out[16]:
                     count  sum
timestamp
2016-05-26 15:00:00      5    5
2016-05-26 16:00:00      2    2
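For comparison, a sketch with the same seven sample rows showing two equivalent ways to get the hourly counts: flooring each timestamp with dt.floor('h'), or setting it as the index and using resample('h'). The session above uses freq='1H'; pandas 2.2 and later prefer the lowercase 'h' alias and warn on 'H'.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2016-05-26 15:13:22', '2016-05-26 15:17:24', '2016-05-26 15:22:52',
        '2016-05-26 15:22:57', '2016-05-26 15:22:58', '2016-05-26 16:22:57',
        '2016-05-26 16:22:58']),
    'userId': [611, 1874, 2139, 212, 1027, 111, 222],
    'adCategory': ['electronics', 'movies', 'computers', 'fashion',
                   'clothing', 'fashion', 'clothing'],
    'adCount': 1,
})

# The answer's approach, with the lowercase hourly alias
by_grouper = df.groupby(pd.Grouper(key='timestamp', freq='h'))['adCount'].agg(['count', 'sum'])

# Alternative 1: floor each timestamp to its hour and group on that
by_floor = df.groupby(df['timestamp'].dt.floor('h'))['adCount'].agg(['count', 'sum'])

# Alternative 2: resample on a DatetimeIndex (hours with no rows would appear as 0)
by_resample = df.set_index('timestamp')['adCount'].resample('h').agg(['count', 'sum'])

print(by_floor)
```

One difference to keep in mind: groupby on the floored column only lists hours that occur in the data, while resample fills in every hour between the first and last timestamp.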

edited May 6, 2017 at 18:58

answered Jul 7, 2016 at 18:09


Carlota Over a year ago

I checked the field types: timestamp is <class 'str'>, userId is <class 'numpy.int64'>, adCategory is <class 'str'> and adCount is <class 'numpy.int64'>. Does groupby work when the timestamp is of type str?

Carlota Over a year ago

I checked your answer. I edited my question to show the error I get.

Carlota Over a year ago

I understand the error, but I don't know how to fix it. The failing line is useradClick.timestamp = pd.to_datetime(useradClick.timestamp)

MaxU - stand with Ukraine Over a year ago

No, you've misspelled the DataFrame name: seradClick instead of useradClick. BTW, what is your pandas version?

MaxU - stand with Ukraine Over a year ago

You can try df.ix[:, 'timestamp'] = pd.to_datetime(df.timestamp) to get rid of the warning. But pandas 0.18.1 should also work properly with the code from my answer; I've checked it.


Justin Olson · Accepted Answer · 2016-07-07 19:46:26Z

0

The format you pass has to match the data, and '%d%m%Y' (daymonthyear, no separators) does not. Your values look like 2016-05-26 15:13:22, which is '%Y-%m-%d %H:%M:%S'. Also pass the column itself rather than the string 'timestamp'. Try

useradClickv1 = pd.to_datetime(useradClick['timestamp'], format='%Y-%m-%d %H:%M:%S')
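Format directives are case-sensitive: %Y is a four-digit year and %y a two-digit one, %m is the month while %M is minutes, and %H and %S are hours and seconds. A small sketch checking the pattern against one value from the question, using the standard library's datetime.strptime (whose directives the format argument of pd.to_datetime follows) and pd.to_datetime on a Series:

```python
from datetime import datetime
import pandas as pd

value = '2016-05-26 15:13:22'
fmt = '%Y-%m-%d %H:%M:%S'

print(datetime.strptime(value, fmt))  # 2016-05-26 15:13:22

# With lowercase %y only two digits ('20') are read as the year, so the
# next character ('1') does not match the '-' and parsing fails
try:
    datetime.strptime(value, '%y-%m-%d')
    lowercase_ok = True
except ValueError:
    lowercase_ok = False

parsed = pd.to_datetime(pd.Series([value]), format=fmt)
print(parsed)
```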

edited Jul 7, 2016 at 19:46

answered Jul 7, 2016 at 17:46


Carlota Over a year ago

I executed useradClickv1 = useradClick.select(pd.to_datetime('timestamp',format='%y-%m-%d')) but I get the error ValueError: time data 'timestamp' does not match format '%y-%m-%d' (match)

Justin Olson Over a year ago

I didn't see that the time was part of the timestamp. I've edited my answer; try that.
