0

So I have used pandas to open up my data which is in .csv format. I'd now like to reformat the dates from its current dd/mm/YYYY hh:mm:ss format to purely YYYY-mm-dd format, e.g., from 19/11/2014 15:26:13 to 2014-11-19. How would I go about doing this within a pandas data array? i.e., converting from this:

          Id  User Id           Start Time             End Time  Climb Time
0      74618    27366  19/11/2014 15:26:13  19/11/2014 15:26:18           5   
1      74632    27366  19/11/2014 15:26:18  19/11/2014 15:42:26         968   
2      74633    27366  19/11/2014 15:42:26  19/11/2014 15:42:48          22

To this:

          Id  User Id  Start Time    End Time  Climb Time
0      74618    27366  2014-11-19  2014-11-19           5   
1      74632    27366  2014-11-19  2014-11-19         968   
2      74633    27366  2014-11-19  2014-11-19          22

I've tried another of pandas ready methods, but none have seemed to work or recognise the initial data. I was wondering if anyone knew of any methods which could achieve this...

13
  • Sorry are you asking how to parse your date strings correctly or you asking about reformatting here, for the former df['Start Time'] = pd.to_datetime(df['Start Time']) will work for the latter, after conversion you can do df['Start Time'].dt.strftime('%Y-%m-%d') however, this will give you strings and not a date or datetime object which is not as useful as a proper datetime IMO Commented Nov 21, 2016 at 20:37
  • @reticentroot That example is for to.csv not from.csv. @EdChum I'd take whatever method that works - what is df? I'd like this to be explain more explicitly than just "do this - it works" - I'd like to know what is happening. Commented Nov 21, 2016 at 20:47
  • @EdChum That also gives me: TypeError: 'type' object has no attribute '__getitem__' Commented Nov 21, 2016 at 20:49
  • here df refers to a dataframe, I'm assuming you created a df using read_csv like so: df = pd.read_csv(your_file_path), you can in fact tell pandas to parse those columns: df = pd.read_csv(your_file, parse_dates=['Start Time', 'End Time']) also you didn't state clear what your current state of play is, your question asked how to do this within a pandas data array, this would imply it's already been loaded into a dataframe, if not then you should post raw data, your code and attempts to clear up any ambiguity Commented Nov 21, 2016 at 20:54
  • @EdChum Ok, this is starting to make sense. What is dt? Commented Nov 21, 2016 at 20:57

2 Answers 2

1

Let me give this a shot and see if it is what you are looking for.

Input CSV is like this (I called it test.csv for this example)

Id,User Id,Start Time,End Time,Climb Time
74618,27366,19/11/2014 15:26:13,19/11/2014 15:26:18,5   
74632,27366,19/11/2014 15:26:18,19/11/2014 15:42:26,968   
74633,27366,19/11/2014 15:42:26,19/11/2014 15:42:48,22

Now some input/output from iPython.

In [1]: import pandas as pd

In [2]: df = pd.read_csv('test.csv')

We can have a look at the conversion to date-time. Note the "type" we get at the end.

In [5]: pd.to_datetime(df['Start Time'])
Out[5]: 
0   2014-11-19 15:26:13
1   2014-11-19 15:26:18
2   2014-11-19 15:42:26
Name: Start Time, dtype: datetime64[ns]

Now you can save this back to a new column. to_datetime is pretty clever in terms of getting the dates and time parts. If it gets your dates wrong, you can specify the format for the input date-time using the format key-word

df['start_dt'] = pd.to_datetime(df['Start Time'])

Finally, you can check the dtypes for the columns in the Data Frame:

In [9]: df.dtypes
Out[9]: 
Id                     int64
User Id                int64
Start Time            object
End Time              object
Climb Time             int64
start_dt      datetime64[ns]
dtype: object

Now this new column, it is a datetime object and you can display or sort it in any way you might like to do that. From your OP, it seems you want to just get the date part. That is easy, you can do this (from this post)

In [10]: df['start_dt'].dt.date
Out[10]: 
0    2014-11-19
1    2014-11-19
2    2014-11-19
Name: start_dt, dtype: object

Now we can put this all together and do the conversion and then put it as a date in one line. It seems you want to recycle the column name, so I am doing that here, but it is not necessary. You can give the 'new' column any name you wish.

In [13]: df['Start Time'] = pd.to_datetime(df['Start Time']).dt.date

In [14]: df['End Time'] = pd.to_datetime(df['End Time']).dt.date

In [15]: df
Out[15]: 
      Id  User Id  Start Time    End Time  Climb Time    start_dt
0  74618    27366  2014-11-19  2014-11-19           5  2014-11-19
1  74632    27366  2014-11-19  2014-11-19         968  2014-11-19
2  74633    27366  2014-11-19  2014-11-19          22  2014-11-19

I think the result is what you are looking for.

Sign up to request clarification or add additional context in comments.

Comments

1

Try this:

pd.to_datetime(df['Start Time']).dt.date

or this if you want your column type to be string:

pd.to_datetime(df.StartTime).dt.strftime('%Y-%m-%d')

2 Comments

Top method works - but passing using df = df.sort('End Time') first then df['End Time'] = pandas.to_datetime(df['End Time']).dt.date the pyplot.plot(['End Time'],['Calories']) spits out ValueError: could not convert string to float: End Time...so matplotlib does not like this format. I need to be able to sort it and use it through matplotlib.
This is a different problem, can you update your original question and add this or create a new question?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.