
I'm parsing my data from JSON into the following DataFrame, but I can't remove the extra characters from the readingtime column and convert it to datetime format:

                                   readingtime  deviceId
0  {u'$date': u'2014-11-04T17:27:50.000+0000'}  1224EG12

I tried using replace and lstrip/rstrip, but I can't remove the extra characters from the readingtime column:

da2['readingtime2'] = da2['readingtime'].str.replace('date', '') 


data['readingtime'] = data['readingtime'].map(lambda x: str(x)[13:])

I tried .loc as well, but I'm still getting errors.
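The cells in readingtime are Python dict objects rather than strings, which is likely why these string-based attempts don't clean the column. A minimal check, assuming the frame looks like the sample above (values copied from the question): pandas' .str methods return NaN for cells that are not strings, while a direct key lookup returns the wrapped date string.

```python
import pandas as pd

# Sample frame rebuilt from the question's output; each readingtime cell is a dict
da2 = pd.DataFrame({
    'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'}],
    'deviceId': ['1224EG12'],
})

print(type(da2['readingtime'][0]))  # <class 'dict'>

# .str.replace only operates on string cells, so dict cells come back as NaN
da2['readingtime2'] = da2['readingtime'].str.replace('date', '')
print(da2['readingtime2'].isna().all())

# Looking up the '$date' key returns the plain date string instead
print(da2['readingtime'][0]['$date'])  # 2014-11-04T17:27:50.000+0000
```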

EDIT:

I want the final readingtime to be '2014-11-04 17:27:50.000 +000', which I then want to convert to a datetime in the format yyyy-mm-dd hh:mm:ss.mils +UTC.
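For a single value that has already been pulled out of the dict, here is a minimal sketch of producing the requested display string. It assumes Python 3 (Python 2.7's strptime rejects %z). strftime has no millisecond directive, so the six-digit %f output is trimmed to three digits.

```python
from datetime import datetime

# The value stored under the '$date' key (copied from the question)
raw = '2014-11-04T17:27:50.000+0000'

# Python 3's strptime understands %z, so the result is timezone-aware (UTC)
parsed = datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S.%f%z')

# strftime has no millisecond directive: keep the first three digits of %f
millis = parsed.strftime('%f')[:3]
display = parsed.strftime('%Y-%m-%d %H:%M:%S.') + millis + parsed.strftime(' %z')
print(display)  # 2014-11-04 17:27:50.000 +0000
```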

  • Which extra characters are you talking about? Can you please update the question with the format you are currently getting? Commented Jun 20, 2015 at 6:39
  • @TessellatingHeckler, it throws the following error: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. Commented Jun 20, 2015 at 8:25
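The message quoted in the comment is pandas' SettingWithCopyWarning. It usually means da2 was created by filtering another DataFrame, and pandas can't tell whether an assignment should also reach the original. A minimal sketch, assuming a hypothetical parent frame named raw: taking an explicit .copy() makes da2 independent, so adding a column to it is unambiguous.

```python
import pandas as pd

# Hypothetical parent frame; da2 in the question was presumably derived from one like it
raw = pd.DataFrame({
    'deviceId': ['1224EG12', '1224EG13', 'XYZ'],
    'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'},
                    {'$date': '2014-11-04T17:27:50.000+0000'},
                    {'$date': '2014-11-05T08:00:00.000+0000'}],
})

# pandas tracks that a filtered result came from raw and warns on assignment;
# .copy() makes da2 an explicitly independent DataFrame
da2 = raw[raw['deviceId'].str.startswith('1224')].copy()
da2['readingtime2'] = da2['readingtime'].apply(lambda cell: cell['$date'])
print(da2)
```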

3 Answers


You can apply a lambda function to the column of the data frame, extracting the date from the dictionary via x['$date'], and then keep just the date/time portion, dropping the '+0000' offset. The result is a naive datetime object (it carries no timezone), so Python would not know what to do with the offset anyway. Use this stripped date/time string (e.g. '2014-11-04T17:27:50.000') as the input to strptime.

import datetime as dt
import pandas as pd

df = pd.DataFrame({'deviceId': {0: '1224EG12', 1: '1224EG13'},
 'readingtime': {0: {u'$date': u'2014-11-04T17:27:50.000+0000'},
  1: {u'$date': u'2014-11-04T17:27:50.000+0000'}}})

>>> df
   deviceId                                  readingtime
0  1224EG12  {u'$date': u'2014-11-04T17:27:50.000+0000'}
1  1224EG13  {u'$date': u'2014-11-04T17:27:50.000+0000'}


>>> # [:-5] drops the trailing '+0000' offset, leaving '2014-11-04T17:27:50.000'
>>> df.readingtime.apply(lambda x: dt.datetime.strptime(x['$date'][:-5], 
                                                        '%Y-%m-%dT%H:%M:%S.%f')) 
0   2014-11-04 17:27:50
1   2014-11-04 17:27:50
Name: readingtime, dtype: datetime64[ns]
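As an alternative to strptime (a different technique from the answer above), pandas.to_datetime can parse the whole column at once and keep the +0000 offset, which matches the question's goal of a UTC datetime. This sketch assumes the same sample data and that every value follows the '%Y-%m-%dT%H:%M:%S.%f%z' layout:

```python
import pandas as pd

df = pd.DataFrame({'deviceId': ['1224EG12', '1224EG13'],
                   'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'},
                                   {'$date': '2014-11-04T17:27:50.000+0000'}]})

# Pull the '$date' string out of each dict, then parse the whole column.
# %z reads the '+0000' offset; utc=True returns a timezone-aware UTC column.
dates = df['readingtime'].apply(lambda x: x['$date'])
df['readingtime'] = pd.to_datetime(dates, format='%Y-%m-%dT%H:%M:%S.%f%z', utc=True)
print(df)
```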

1 Comment

The apply call throws an error: 'DataFrame' object has no attribute 'datetime'. I checked my pandas version. Also, dt.datetime should be df.datetime (just a typo), right?
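Regarding the comment: dt in the answer is the alias created by import datetime as dt, so dt.datetime (the datetime class inside that module) is correct and df.datetime is not. The reported error is what you would see if the name dt had been reassigned to a DataFrame earlier in the session; that cause is an assumption, but the sketch below reproduces the message:

```python
import datetime as dt  # dt names the datetime *module* here
import pandas as pd

# dt.datetime is the datetime class inside that module, so this works
parsed = dt.datetime.strptime('2014-11-04T17:27:50.000', '%Y-%m-%dT%H:%M:%S.%f')
print(parsed)  # 2014-11-04 17:27:50

# Hypothetical mistake: rebinding dt to a DataFrame shadows the module
dt = pd.DataFrame({'deviceId': ['1224EG12']})
try:
    dt.datetime
except AttributeError as exc:
    message = str(exc)
print(message)
```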

Assuming that da2['reading_time'] returns a dict,

da2['reading_time']['$date']

will return the value, i.e. '2014-11-04T17:27:50.000+0000'.

Another approach could be:

text = str(da2['reading_time'])
start_index = text.index(':') + 3
end_index = text.index('}') - 1
date = text[start_index:end_index]

2 Comments

It works for a string, but I can't apply it to my DataFrame. How do you implement it for a DataFrame where one of the columns is readingtime?
I'm trying the following code; is there a more efficient way? for i in range(len(da2)): da2.iloc[i,3] = da2.iloc[i,3].__str__()[start_index:end_index] Can you suggest a better way?
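Instead of looping over positions with iloc, the same string slicing can be wrapped in a function and passed to Series.apply, which calls it once per cell. This sketch assumes Python 3 (where str() of the dict has no u'' prefixes, so the +3 offset lands on the first digit); slice_date is a hypothetical helper name, and the column is addressed by name rather than by position 3:

```python
import pandas as pd

def slice_date(cell):
    """Hypothetical helper: the answer's string slicing, applied to one cell."""
    text = str(cell)  # e.g. "{'$date': '2014-11-04T17:27:50.000+0000'}" in Python 3
    start_index = text.index(':') + 3  # skip ": '" to reach the first digit
    end_index = text.index('}') - 1    # stop before the closing quote
    return text[start_index:end_index]

da2 = pd.DataFrame({
    'deviceId': ['1224EG12', '1224EG13'],
    'readingtime': [{'$date': '2014-11-04T17:27:50.000+0000'},
                    {'$date': '2014-11-04T17:27:50.000+0000'}],
})

# apply calls slice_date once per cell, so no explicit iloc loop is needed
da2['readingtime'] = da2['readingtime'].apply(slice_date)
print(da2)
```

Since the cells are dicts, da2['readingtime'].apply(lambda cell: cell['$date']) does the same job without depending on how the dict is printed.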

Try the ast module: use ast.literal_eval() to convert each readingtime value into a dict, then look up the "$date" key in the dict you've just created.

import ast

readingtime = "{u'$date': u'2014-11-04T17:27:50.000+0000'}"
da2 = ast.literal_eval(readingtime)
dat = da2['$date']

print(dat)

dat now contains the plain date string, ready to be converted with datetime.

MarcinZ

2 Comments

It works for a string, but how do you use it on a DataFrame? The readingtime column's type is object.
Hi, you can convert the string into a datetime object with d = datetime.datetime.strptime("2014-11-04T17:27:50.000+0000", "%Y-%m-%dT%H:%M:%S.%f%z"), but the timezone directive (%z) isn't supported by strptime in Python 2.7.
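If the readingtime column (object dtype) holds the dict's text rather than real dicts, ast.literal_eval can be applied to each cell before looking up '$date'. This is a sketch assuming Python 3, whose literal_eval accepts the u'' prefixes; the frame below is hypothetical sample data built from the question's values:

```python
import ast
import pandas as pd

# Hypothetical frame whose readingtime cells are strings, not dicts
da2 = pd.DataFrame({
    'deviceId': ['1224EG12', '1224EG13'],
    'readingtime': ["{u'$date': u'2014-11-04T17:27:50.000+0000'}",
                    "{u'$date': u'2014-11-04T17:27:50.000+0000'}"],
})

# Parse each string into a dict, then keep only the '$date' value
da2['readingtime'] = da2['readingtime'].apply(lambda s: ast.literal_eval(s)['$date'])

# The plain ISO 8601 strings can then be parsed into a UTC-aware datetime column
da2['readingtime'] = pd.to_datetime(da2['readingtime'],
                                    format='%Y-%m-%dT%H:%M:%S.%f%z', utc=True)
print(da2)
```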
