How to create two columns based on an existing column with conditions in Pandas?

Question

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'], \
                    'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018', \
                             '2018', '2018-03-11', '2018-03-10', '2018'], \
                    'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck','Franck']})
df

Out:    
    Car_ID  Date        Driver
0   B332    2018-03-15  Alex
1   B332    2018        Alex
2   B332    2018-03-12  Alex
3   C315    2018        Sara
4   C315    2018-03-16  Sara
5   C315    2018        Sara
6   C315    2018        Sara
7   C315    2018-03-11  Sara
8   F310    2018-03-10  Franck
9   F310    2018        Franck

It contains some incorrect dates (only the year is given). For this reason I want to create two new columns like this:

    Car_ID  Date        D_Min       D_Max       Driver
0   B332    2018-03-15  2018-03-15  2018-03-15  Alex
1   B332    2018        2018-03-12  2018-03-15  Alex
2   B332    2018-03-12  2018-03-12  2018-03-12  Alex
3   C315    2018        2018-03-16  2018        Sara
4   C315    2018-03-16  2018-03-16  2018-03-16  Sara
5   C315    2018        2018-03-11  2018-03-16  Sara
6   C315    2018        2018-03-11  2018-03-16  Sara
7   C315    2018-03-11  2018-03-11  2018-03-11  Sara
8   F310    2018-03-10  2018-03-10  2018-03-10  Franck
9   F310    2018        2018        2018-03-10  Franck

For D_Min, I want to replace each incorrect date with the next correct date in the same group, as in the expected output above. If there is no such correct date, I'll keep the value as it is, as in row 9: F310 2018 2018 2018-03-10 Franck. I want to do the same for D_Max, using the previous correct date instead. If the date is already correct, D_Min and D_Max should both be equal to it.

Thanks for your advice.

jezrael · Accepted Answer · 2018-07-11 13:32:55Z

3

First replace the year-only values with NaN using a boolean mask and mask. Then use groupby with bfill for backward filling and ffill for forward filling. Finally, replace the remaining NaNs with the original values using fillna:

# year-only values are purely numeric strings
mask = df['Date'].str.isnumeric()
# alternative mask: check the length of the string
#mask = df['Date'].str.len() == 4
# alternative mask: non-numeric values are converted to NaN, so test for non-NaN values
#mask = pd.to_numeric(df['Date'], errors='coerce').notna()

s = df['Date'].mask(mask)

g = s.groupby(df['Driver'])
df['D_Min'] = g.bfill().fillna(df['Date'])
df['D_Max'] = g.ffill().fillna(df['Date'])

print (df)
  Car_ID        Date  Driver       D_Min       D_Max
0   B332  2018-03-15    Alex  2018-03-15  2018-03-15
1   B332        2018    Alex  2018-03-12  2018-03-15
2   B332  2018-03-12    Alex  2018-03-12  2018-03-12
3   C315        2018    Sara  2018-03-16        2018
4   C315  2018-03-16    Sara  2018-03-16  2018-03-16
5   C315        2018    Sara  2018-03-11  2018-03-16
6   C315        2018    Sara  2018-03-11  2018-03-16
7   C315  2018-03-11    Sara  2018-03-11  2018-03-11
8   F310  2018-03-10  Franck  2018-03-10  2018-03-10
9   F310        2018  Franck        2018  2018-03-10

Detail:

print (s)
0    2018-03-15
1           NaN
2    2018-03-12
3           NaN
4    2018-03-16
5           NaN
6           NaN
7    2018-03-11
8    2018-03-10
9           NaN
Name: Date, dtype: object
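
For readers who want to run the whole thing in one go, here is a minimal self-contained sketch of the steps above. The helper name add_min_max and the regex-based mask are assumptions made for illustration and are not part of the original answer; the mask assumes that every valid date has the form YYYY-MM-DD.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018',
             '2018', '2018-03-11', '2018-03-10', '2018'],
    'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck', 'Franck'],
})


def add_min_max(frame, group_col='Driver', date_col='Date'):
    """Hypothetical helper: return a copy of frame with D_Min and D_Max added."""
    out = frame.copy()
    # Assumption: anything that is not exactly YYYY-MM-DD counts as incorrect.
    invalid = ~out[date_col].str.match(r'\d{4}-\d{2}-\d{2}$')
    # Turn incorrect values into NaN so they can be filled per group.
    s = out[date_col].mask(invalid)
    g = s.groupby(out[group_col])
    # Back fill / forward fill inside each group, then restore the
    # original value where no correct date was available.
    out['D_Min'] = g.bfill().fillna(out[date_col])
    out['D_Max'] = g.ffill().fillna(out[date_col])
    return out


result = add_min_max(df)
print(result)
```

The helper works on a copy, so the original df is left unchanged; pass a different group_col (for example 'Car_ID') to change the grouping.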

edited Jul 11, 2018 at 13:32

answered Jul 11, 2018 at 13:26

jezrael


5 Comments

M-M Over a year ago

Hello, how can I do the same thing, but grouping by two columns instead of one? Thanks @jezrael

jezrael Over a year ago

@M-M - Then change s.groupby(df['Driver']) to s.groupby([df['Driver'], df['col']])

M-M Over a year ago

It does not work. @jezrael I got an error: TypeError: unhashable type: 'list'

jezrael Over a year ago

@M-M - I just tested it in pandas 0.23.1 with the sample data and g = s.groupby([df['Car_ID'], df['Driver']]), and it works for me. Maybe you forgot the []?

M-M Over a year ago

Yes, it was the [ ] problem. Thanks
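
The two-column grouping discussed in these comments can be sketched as follows, using the sample data from the question. The grouping keys are passed as a single list, which is what the thread identified as the fix. Note that in this sample each Car_ID has exactly one Driver, so the result happens to match the single-column version; with other data the groups could differ.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018',
             '2018', '2018-03-11', '2018-03-10', '2018'],
    'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck', 'Franck'],
})

# Year-only values are purely numeric strings; replace them with NaN.
s = df['Date'].mask(df['Date'].str.isnumeric())

# Group by two keys: both Series go inside ONE list.
g = s.groupby([df['Car_ID'], df['Driver']])

df['D_Min'] = g.bfill().fillna(df['Date'])
df['D_Max'] = g.ffill().fillna(df['Date'])
print(df)
```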

mellifluous · Accepted Answer · 2018-07-11 13:36:37Z

0

df_grpd = df.groupby('Car_ID').agg({'Date': [sorted, min, max]})
print(df_grpd)

                                              Date
                                            sorted   min         max
Car_ID
B332                [2018, 2018-03-12, 2018-03-15]  2018  2018-03-15
C315    [2018, 2018, 2018, 2018-03-11, 2018-03-16]  2018  2018-03-16
F310                            [2018, 2018-03-10]  2018  2018-03-10

edited Jul 11, 2018 at 13:36

answered Jul 11, 2018 at 13:31

mellifluous
