1

I have a dataframe like this:

import pandas as pd
df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
                   'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018',
                            '2018', '2018-03-11', '2018-03-10', '2018'],
                   'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck', 'Franck']})
df

Out:    
    Car_ID  Date        Driver
0   B332    2018-03-15  Alex
1   B332    2018        Alex
2   B332    2018-03-12  Alex
3   C315    2018        Sara
4   C315    2018-03-16  Sara
5   C315    2018        Sara
6   C315    2018        Sara
7   C315    2018-03-11  Sara
8   F310    2018-03-10  Franck
9   F310    2018        Franck

It contains some incorrect dates (only the year is given). For this reason I want to create two new columns like this:

    Car_ID  Date        D_Min       D_Max       Driver
0   B332    2018-03-15  2018-03-15  2018-03-15  Alex
1   B332    2018        2018-03-12  2018-03-15  Alex
2   B332    2018-03-12  2018-03-12  2018-03-12  Alex
3   C315    2018        2018-03-16  2018        Sara
4   C315    2018-03-16  2018-03-16  2018-03-16  Sara
5   C315    2018        2018-03-11  2018-03-16  Sara
6   C315    2018        2018-03-11  2018-03-16  Sara
7   C315    2018-03-11  2018-03-11  2018-03-11  Sara
8   F310    2018-03-10  2018-03-10  2018-03-10  Franck
9   F310    2018        2018        2018-03-10  Franck

For D_Min, when a date is incorrect I want to take the nearest correct date from the following rows of the same car. If there is no such correct date, I keep the value as it is, as in row 9 (F310, 2018, 2018, 2018-03-10, Franck). I want to do the same for D_Max, but using the preceding rows. If the date is correct, D_Min and D_Max should both be equal to it.

Thanks for your advice.

2 Answers

3

First replace the year-only values with NaN using a boolean mask and Series.mask. Then group by Driver and use bfill for back filling and ffill for forward filling. Finally, replace the remaining NaNs with the original values using fillna:

# only the year-only strings are fully numeric
mask = df['Date'].str.isnumeric()
# alternative mask: check the length of the string
#mask = df['Date'].str.len() == 4
# alternative mask: non-numeric strings are coerced to NaN, so test for non-NaN
#mask = pd.to_numeric(df['Date'], errors='coerce').notna()

s = df['Date'].mask(mask)

g = s.groupby(df['Driver'])
df['D_Min'] = g.bfill().fillna(df['Date'])
df['D_Max'] = g.ffill().fillna(df['Date'])

print(df)
  Car_ID        Date  Driver       D_Min       D_Max
0   B332  2018-03-15    Alex  2018-03-15  2018-03-15
1   B332        2018    Alex  2018-03-12  2018-03-15
2   B332  2018-03-12    Alex  2018-03-12  2018-03-12
3   C315        2018    Sara  2018-03-16        2018
4   C315  2018-03-16    Sara  2018-03-16  2018-03-16
5   C315        2018    Sara  2018-03-11  2018-03-16
6   C315        2018    Sara  2018-03-11  2018-03-16
7   C315  2018-03-11    Sara  2018-03-11  2018-03-11
8   F310  2018-03-10  Franck  2018-03-10  2018-03-10
9   F310        2018  Franck        2018  2018-03-10

Detail:

print(s)
0    2018-03-15
1           NaN
2    2018-03-12
3           NaN
4    2018-03-16
5           NaN
6           NaN
7    2018-03-11
8    2018-03-10
9           NaN
Name: Date, dtype: object
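
To illustrate why the fill is done per group rather than on the whole column, here is a minimal sketch on the question's sample data. The variable names grouped_ffill and plain_ffill are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018',
             '2018', '2018-03-11', '2018-03-10', '2018'],
    'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck', 'Franck'],
})

# Year-only values become NaN so they can be filled from neighbouring rows.
s = df['Date'].mask(df['Date'].str.isnumeric())

# Forward fill within each driver's rows only.
grouped_ffill = s.groupby(df['Driver']).ffill()
# Forward fill across the whole column, ignoring group boundaries.
plain_ffill = s.ffill()

# Row 3 is Sara's first row and has no earlier valid date in her group:
# the grouped fill leaves it missing, while the plain fill borrows
# the last date from Alex's rows.
print(grouped_ffill[3])
print(plain_ffill[3])
```

With the grouped version, row 3 stays missing after ffill, which is why the final fillna(df['Date']) puts the original 2018 back into D_Max for that row.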

5 Comments

Hello, how can I do the same thing, but grouping by two columns instead of one? Thanks @jezrael
@M-M - Then change s.groupby(df['Driver']) to s.groupby([df['Driver'], df['col']])
It does not work. @jezrael I got an error: TypeError: unhashable type: 'list'
@M-M - I just tested it in pandas 0.23.1 with the sample data and g = s.groupby([df['Car_ID'], df['Driver']]), and it works for me. Maybe you forgot the []?
Yes, it was the [ ] problem. Thanks
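
For reference, the two-column variant discussed in these comments can be sketched end to end as follows. It assumes the question's sample data, where Car_ID and Driver happen to identify the same groups, so the output matches the single-key result; the name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-15', '2018', '2018-03-12', '2018', '2018-03-16', '2018',
             '2018', '2018-03-11', '2018-03-10', '2018'],
    'Driver': ['Alex', 'Alex', 'Alex', 'Sara', 'Sara', 'Sara', 'Sara', 'Sara', 'Franck', 'Franck'],
})

# Year-only values become NaN so they can be filled from neighbouring rows.
s = df['Date'].mask(df['Date'].str.isnumeric())

# Both keys go inside ONE list; the outer [] is what the comments refer to.
g = s.groupby([df['Car_ID'], df['Driver']])

# Back fill gives D_Min, forward fill gives D_Max; values that are still
# missing fall back to the original Date.
result = df.assign(
    D_Min=g.bfill().fillna(df['Date']),
    D_Max=g.ffill().fillna(df['Date']),
)
print(result)
```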
0
# Per car: the sorted Date strings plus their minimum and maximum (compared as strings)
df_grpd = df.groupby('Car_ID').agg({'Date': [sorted, min, max]})
print(df_grpd)

                                              Date
                                            sorted   min         max
Car_ID
B332                [2018, 2018-03-12, 2018-03-15]  2018  2018-03-15
C315    [2018, 2018, 2018, 2018-03-11, 2018-03-16]  2018  2018-03-16
F310                            [2018, 2018-03-10]  2018  2018-03-10
