Pandas - Extract Text from Rows

Question

Let's say I have a dataframe that looks like this:

import pandas as pd

df2 = pd.DataFrame({'A': ['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'], 'file_name': ['a.txt', 'a.txt', 'b.txt', 'a.txt', 'd.txt', 'e.txt']})

>>> df2

                                         A       file_name
0  Apple, 10/01/2016, 31/10/18, david/kate          a.txt
1                                   orange          a.txt
2                                     pear          b.txt
3                                    Apple          a.txt
4                               10/01/2016          d.txt
5                               02/20/2017          e.txt

I would like to extract just the dates from column A, so that the output looks like this:

                        A        file_name
0    10/01/2016, 31/10/18           a.txt
1    Nothing to return              a.txt
2    Nothing to return              b.txt
3    Nothing to return              a.txt
4    10/01/2016                     d.txt
5    02/20/2017                     e.txt

Does anyone have any suggestions on how to do this? I am not sure where to begin.

Edit #1:

I edited my original dataframe and output results to better reflect what I am looking for.

Please do not keep editing the question to add requirements after answers have already been posted.
– mad_, Commented Aug 24, 2018 at 15:45

Denziloe · Accepted Answer · 2018-08-24 15:32:09Z

2

This doesn't exactly match your desired output, but the structure is probably better and can easily be converted into what you want.

Basically this is a job for regex. This code should find anything of the form number/number/number:

s = df2["A"]
result = s.str.extractall(r"(\d+/\d+/\d+)")[0]
print(result)

   match
0  0        10/01/2016
   1          31/10/18
4  0        10/01/2016
5  0        02/20/2017
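
As a rough sketch of the conversion mentioned above, the matches can be collapsed back to one string per original row and written into a copy of the dataframe. The names `matches`, `per_row` and `out` are only illustrative, and the `file_name` values are taken from the table in the question.

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": ["Apple, 10/01/2016, 31/10/18, david/kate", "orange", "pear",
          "Apple", "10/01/2016", "02/20/2017"],
    "file_name": ["a.txt", "a.txt", "b.txt", "a.txt", "d.txt", "e.txt"],
})

# One row per match, indexed by (original row label, match number).
matches = df2["A"].str.extractall(r"(\d+/\d+/\d+)")[0]

# Collapse the matches back to a single string per original row.
per_row = matches.groupby(level=0).agg(", ".join)

# Rows without a match are missing from per_row; reindex restores them
# and fills them with the placeholder text from the question.
out = df2.copy()
out["A"] = per_row.reindex(df2.index, fill_value="Nothing to return")
print(out)
```

Working on a copy keeps the original strings in `df2` available in case the pattern needs adjusting later.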

answered Aug 24, 2018 at 15:32

Denziloe


1 Comment

Chicken Sandwich No Pickles Over a year ago

Edited original post, sorry!

BENY · Accepted Answer · 2018-08-24 15:41:43Z

1

Use extractall and join the matches for each row. To keep the rows that have no match, add reindex(df2.index).fillna('Nothing to return').

df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join)
Out[459]: 
0    10/01/2016,31/10/18
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object

Update

df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join).reindex(df2.index).fillna('Nothing to return')
Out[463]: 
0    10/01/2016,31/10/18
1      Nothing to return
2      Nothing to return
3      Nothing to return
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
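
Note that this pattern is looser than a plain number/number/number: the leading number and separator are optional, and `-` is accepted as well as `/`. The short sketch below compares the two on a few made-up sample strings (the samples and the names `STRICT` and `LOOSE` are illustrative, not from the question), with the redundant outer capture groups dropped so that `re.findall` returns whole matches.

```python
import re

STRICT = re.compile(r"\d+/\d+/\d+")             # number/number/number only
LOOSE = re.compile(r"(?:\d+[/-])?\d+[/-]\d+")   # optional first part, / or -

samples = ["10/01/2016", "31-10-18", "10/2016", "david/kate"]

for text in samples:
    # Show what each pattern picks up from the same input.
    print(f"{text!r}: strict={STRICT.findall(text)} loose={LOOSE.findall(text)}")
```

Whether the looser behaviour is wanted depends on the data: it also picks up two-part values such as `10/2016`, which may or may not be dates.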

edited Aug 24, 2018 at 15:41

answered Aug 24, 2018 at 15:31

BENY


13 Comments

Chicken Sandwich No Pickles Over a year ago

This works, but I made a small edit to my post, what if I wanted to keep the records that did not return anything?

Chicken Sandwich No Pickles Over a year ago

Made one last and final change. I promise I'm done editing it this time :)

BENY Over a year ago

@LunchBox no worry :-)

Chicken Sandwich No Pickles Over a year ago

Interesting, it works but I am curious as to why we did not have to escape the forward slash, "/" ?

Chicken Sandwich No Pickles Over a year ago

Although, it could have something to do with the regex editor I was using, regex101.com
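
On the escaping question: in Python the pattern is an ordinary string passed to the regex engine, so `/` has no special meaning and needs no backslash. Tools that write patterns between `/` delimiters are the likely reason an escape was requested elsewhere. A quick check, using the first row of the question as sample text, suggests both spellings behave the same:

```python
import re

text = "Apple, 10/01/2016, 31/10/18, david/kate"

# Unescaped slash: fine in Python, because "/" is not a regex metacharacter.
plain = re.findall(r"\d+/\d+/\d+", text)

# Escaped slash: also accepted, the backslash is simply redundant here.
escaped = re.findall(r"\d+\/\d+\/\d+", text)

print(plain)
print(plain == escaped)
```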


mad_ · Accepted Answer · 2018-08-24 15:57:05Z

1

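import re

# number/number/number; \d+ (not \d*) so that a bare "//" does not count as a date
DATE_RE = re.compile(r'\d+/\d+/\d+')

def my_func(row):
    # Split the cell on commas and keep the pieces that start with a date
    dates = []
    for d in row.split(","):
        match = DATE_RE.match(d.strip())
        if match:
            dates.append(match.group(0))
    # Join with ", " so the result matches the output shown below
    return ", ".join(dates) if dates else "Nothing to return"

df2.A = df2.A.apply(my_func)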

Output:

                        A        file_name
0    10/01/2016, 31/10/18           a.txt
1    Nothing to return              a.txt
2    Nothing to return              b.txt
3    Nothing to return              a.txt
4    10/01/2016                     d.txt
5    02/20/2017                     e.txt
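
For comparison, a similar result can probably be had without a row-wise function by using the vectorised string methods. This is only a sketch: unlike the function above, `str.findall` looks for the pattern anywhere in the cell rather than at the start of each comma-separated piece, and the `file_name` values are copied from the question.

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": ["Apple, 10/01/2016, 31/10/18, david/kate", "orange", "pear",
          "Apple", "10/01/2016", "02/20/2017"],
    "file_name": ["a.txt", "a.txt", "b.txt", "a.txt", "d.txt", "e.txt"],
})

# A list of all number/number/number matches for every row (empty if none).
dates = df2["A"].str.findall(r"\d+/\d+/\d+")

# Join each list into one string; an empty list becomes an empty string.
joined = dates.str.join(", ")

# Swap the empty strings for the placeholder text.
df2["A"] = joined.replace("", "Nothing to return")
print(df2)
```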

edited Aug 24, 2018 at 15:57

answered Aug 24, 2018 at 15:40

mad_


1 Comment

Chicken Sandwich No Pickles Over a year ago

Edited post, I promise I'm done editing it this time :)
