I work with pandas. I have the following data:
useradClick.head(n=5)
Out[291]:
             timestamp  userId   adCategory  adCount
0  2016-05-26 15:13:22     611  electronics        1
1  2016-05-26 15:17:24    1874       movies        1
2  2016-05-26 15:22:52    2139    computers        1
3  2016-05-26 15:22:57     212      fashion        1
4  2016-05-26 15:22:58    1027     clothing        1
I want to truncate each timestamp to the hour, so that 2016-05-26 15:13:22 becomes 2016-05-26 15. After that, I want to group by the truncated value.
I tried
useradClickv1 = useradClick.select(pd.to_datetime('timestamp',format='%d%m%Y'))
But I get the error
Traceback (most recent call last):
  File "<ipython-input-292-9d5a6a59d577>", line 1, in <module>
    useradClickv1 = useradClick.select(pd.to_datetime('timestamp',format='%d%m%Y'))
  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/util/decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 287, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)
  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 416, in _to_datetime
    return _convert_listlike(np.array([arg]), box, format)[0]
  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 402, in _convert_listlike
    raise e
  File "/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/tseries/tools.py", line 365, in _convert_listlike
    arg, format, exact=exact, errors=errors)
  File "pandas/tslib.pyx", line 3183, in pandas.tslib.array_strptime (pandas/tslib.c:55388)
ValueError: time data 'timestamp' does not match format '%d%m%Y' (match)
How can I do this conversion using pandas?
EDITED 2016/07/07
I tried your answer, and I get the following error:
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv')
adclicksDF = adclicksDF.rename(columns=lambda x: x.strip())
adclicksDF['adCount'] = 1
useradClick = adclicksDF[['timestamp','userId','adCategory','adCount']]
seradClick.timestamp = pd.to_datetime(useradClick.timestamp)
Traceback (most recent call last):
  File "<ipython-input-31-ff9d4c4432ef>", line 1, in <module>
    seradClick.timestamp = pd.to_datetime(useradClick.timestamp)
NameError: name 'seradClick' is not defined
useradClick.timestamp = pd.to_datetime(useradClick.timestamp)
/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py:2698: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[name] = value
EDITED
I am using pandas 0.18.0 from Anaconda.
import pandas as pd
from pyspark.mllib.clustering import KMeans, KMeansModel
from numpy import array
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
import sys
conf = (SparkConf()
        .setMaster("local")
        .setAppName("My app")
        .set("spark.executor.memory", "1g"))
sc = SparkContext(conf = conf)
sqlContext = SQLContext(sc)
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv')
adclicksDF = adclicksDF.rename(columns=lambda x: x.strip())
adclicksDF['adCount'] = 1
useradClick = adclicksDF[['timestamp','userId','adCategory','adCount']]
useradClick.ix[:,'timestamp'] = p.to_datetime(useradClick.timestamp)
Traceback (most recent call last):
  File "<ipython-input-21-dcc10ed41daa>", line 1, in <module>
    useradClick.ix[:,'timestamp'] = p.to_datetime(useradClick.timestamp)
NameError: name 'p' is not defined
useradClick.ix[:,'timestamp'] = pd.to_datetime(useradClick.timestamp)
/home/cloudera/anaconda3/lib/python3.5/site-packages/pandas/core/indexing.py:461: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
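The SettingWithCopyWarning above is raised because useradClick is built by selecting columns from adclicksDF, so pandas cannot tell whether the assignment is meant for the original frame or for a copy. One possible way around it, sketched here under the assumption that useradClick should be an independent frame, is to take an explicit .copy() before assigning. The inline DataFrame is only a stand-in for ad-clicks.csv, with values taken from the sample rows.

```python
import pandas as pd

# Stand-in for pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv');
# the two rows are copied from the sample output in the question.
adclicksDF = pd.DataFrame({
    "timestamp": ["2016-05-26 15:13:22", "2016-05-26 15:17:24"],
    "userId": [611, 1874],
    "adCategory": ["electronics", "movies"],
})
adclicksDF["adCount"] = 1

# .copy() makes useradClick its own DataFrame instead of a slice of adclicksDF.
useradClick = adclicksDF[["timestamp", "userId", "adCategory", "adCount"]].copy()

# The assignment now unambiguously targets the copy.
useradClick["timestamp"] = pd.to_datetime(useradClick["timestamp"])

print(useradClick.dtypes)
```

With the copy in place, adclicksDF keeps its original string timestamps and only useradClick holds the parsed values.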
pd.to_datetime(useradClick['timestamp']) should just work. Also, what is the purpose of changing the format to 2016-05-26 15?
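For reference, here is a minimal sketch of the conversion and the group by the question asks for. It rebuilds the five sample rows inline rather than reading the CSV. The column name hour and the variable clicksPerHour are made up for illustration, and summing adCount is only an assumption about what the group by is for. Note that the original attempt passed the literal string 'timestamp' to pd.to_datetime, which is why the ValueError mentions time data 'timestamp'. The column itself has to be passed instead.

```python
import pandas as pd

# The five sample rows shown in the question.
useradClick = pd.DataFrame({
    "timestamp": [
        "2016-05-26 15:13:22",
        "2016-05-26 15:17:24",
        "2016-05-26 15:22:52",
        "2016-05-26 15:22:57",
        "2016-05-26 15:22:58",
    ],
    "userId": [611, 1874, 2139, 212, 1027],
    "adCategory": ["electronics", "movies", "computers", "fashion", "clothing"],
    "adCount": [1, 1, 1, 1, 1],
})

# Pass the column, not the string 'timestamp', so every value gets parsed.
useradClick["timestamp"] = pd.to_datetime(useradClick["timestamp"])

# Format each timestamp as 'YYYY-MM-DD HH', dropping minutes and seconds.
# 'hour' is a hypothetical column name.
useradClick["hour"] = useradClick["timestamp"].dt.strftime("%Y-%m-%d %H")

# Total ad clicks per hour bucket (assumed aggregation).
clicksPerHour = useradClick.groupby("hour")["adCount"].sum()
print(clicksPerHour)
```

Formatting with strftime gives string keys that match the 2016-05-26 15 form exactly. If the hour should stay a datetime, grouping on a floored timestamp would be an alternative.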