
As an example, I'm trying to slice a field, 'date', containing dates in the format YYYYMMDD into 3 individual fields ('year', 'month', 'day').

I have an approach that assigns each value one at a time, but I assume there is a more efficient way to produce the desired result.

Current solution:

df['year'] = df['date'].astype(str).apply(lambda x: x[:4])
df['month'] = df['date'].astype(str).apply(lambda x: x[4:6])
df['day'] = df['date'].astype(str).apply(lambda x: x[6:8])

The following is an example of one of my attempts to simplify the code:

df['year'], df['month'], df['day'] = df['date'].astype(str).apply(lambda x: [x[:4], x[4:6], x[6:8]])
    Thanks everybody! All solutions work well. I selected MaxU's answer since it can be applied to any fixed width string. Commented Jun 11, 2018 at 20:57

5 Answers

2

You should always convert date columns to pandas datetime objects. You can do this in one of two ways:

  1. When you import the data:

 df = pd.read_csv('your-raw_date.csv', parse_dates=[5,7])

NOTE: 5 and 7 are the (zero-based) positions of the columns that hold the dates.

  2. Apply pd.to_datetime to an individual column. See this post for details.
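A minimal sketch of the second option, assuming a hypothetical frame whose 'date' column holds YYYYMMDD values as integers (the sample values are borrowed from another answer on this page):

```python
import pandas as pd

# Hypothetical sample data: dates stored as YYYYMMDD integers.
df = pd.DataFrame({'date': [20180131, 20180611, 20180513]})

# Cast to string first so the format is matched against the digits,
# then parse with an explicit format instead of relying on inference.
df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')

print(df.dtypes)
```

Passing an explicit format avoids ambiguity about how eight bare digits should be interpreted.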

Second, you can import the datetime module, although the .dt accessor used below is part of pandas and works without it:

import datetime as dt

Once the column has a datetime dtype, you can do things like:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

You can even do other cool things like:

df['signup_date_dayofweek'] = df['date'].dt.dayofweek

See the full documentation of the datetime module.


1

Assuming the date column is of string (object) dtype:

In [18]: df
Out[18]:
       date
0  20180131
1  20180611
2  20180513

In [19]: df.dtypes
Out[19]:
date    object
dtype: object

solution:

In [22]: df[['year','month','day']] = df.date.str.extract(r'(\d{4})(\d{2})(\d{2})').astype(int)

In [23]: df
Out[23]:
       date  year  month  day
0  20180131  2018      1   31
1  20180611  2018      6   11
2  20180513  2018      5   13

In [24]: df.dtypes
Out[24]:
date     object
year      int32
month     int32
day       int32
dtype: object

PS if date is of a numeric dtype, then I'd choose @ALollz's solution...
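Since the regex above relies only on fixed character positions, the same split can also be sketched with vectorized string slicing. This is an alternative illustration, not part of the original answer; the `fields` mapping is a made-up helper, and the column is assumed to hold strings:

```python
import pandas as pd

# Sample data as strings, matching the frame shown above.
df = pd.DataFrame({'date': ['20180131', '20180611', '20180513']})

# Hypothetical field layout: name -> (start, stop) character offsets.
fields = {'year': (0, 4), 'month': (4, 6), 'day': (6, 8)}

for name, (start, stop) in fields.items():
    # .str.slice operates on the whole column, so no per-row lambda is needed.
    df[name] = df['date'].str.slice(start, stop).astype(int)

print(df)
```

Changing the offsets in `fields` adapts the sketch to other fixed-width layouts.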


1

Convert the string to datetime and use the .dt accessor attributes of the resulting datetime column:

df.date = pd.to_datetime(df.date, format='%Y%m%d')
df['year'], df['month'], df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day

After you convert the column from string to datetime, you might find that extracting the year, month and day is not even necessary, as they are readily available through the date column's .dt attributes.
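As a rough illustration of that point, the sketch below filters and aggregates straight from the .dt accessor without creating any extra columns. The 'value' column and its numbers are invented for the example:

```python
import pandas as pd

# 'value' is a made-up column used only to have something to aggregate.
df = pd.DataFrame({'date': ['20180131', '20180611', '20180513'],
                   'value': [10, 20, 30]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

# Filter on the month without materializing a 'month' column.
june = df[df['date'].dt.month == 6]

# Aggregate by year, again keyed directly on the accessor.
totals = df.groupby(df['date'].dt.year)['value'].sum()

print(june)
print(totals)
```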


1

Since it looks like they are already integers, forget about all of the string formatting.

df['year'] = df.date//10**4
df['month'] = df.date//100%100
df['day'] = df.date%100
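To see why the arithmetic works, here is the same calculation traced on a single plain Python integer (one of the sample values used elsewhere on this page):

```python
value = 20180131  # one date stored as a YYYYMMDD integer

year = value // 10**4        # integer-divide away the last four digits (MMDD)
month = value // 100 % 100   # drop the day digits, then keep the last two
day = value % 100            # keep only the last two digits

print(year, month, day)  # prints: 2018 1 31
```

The `//` and `%` operators have equal precedence and are evaluated left to right, so `value // 100 % 100` is `(value // 100) % 100`.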


1

Here's a functional solution using operator.attrgetter:

import pandas as pd
from operator import attrgetter

df = pd.DataFrame({'Date': [20180131, 20180611, 20180513]})

df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')

parts = ('year', 'month', 'day')

df['Year'], df['Month'], df['Day'] = zip(*df['Date'].apply(attrgetter(*parts)).values)

print(df)

        Date  Year  Month  Day
0 2018-01-31  2018      1   31
1 2018-06-11  2018      6   11
2 2018-05-13  2018      5   13
