
Let's say I have a dataframe that looks like this:

import pandas as pd

df2 = pd.DataFrame({'A': ['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'],
                    'file_name': ['a.txt', 'a.txt', 'b.txt', 'a.txt', 'd.txt', 'e.txt']})

>>> df2

                                         A       file_name
0  Apple, 10/01/2016, 31/10/18, david/kate          a.txt
1                                   orange          a.txt
2                                     pear          b.txt
3                                    Apple          a.txt
4                               10/01/2016          d.txt
5                               02/20/2017          e.txt

What I would like is to extract just the dates from this dataframe, so the output would look like this:

                        A        file_name
0    10/01/2016, 31/10/18           a.txt
1    Nothing to return              a.txt
2    Nothing to return              b.txt
3    Nothing to return              a.txt
4    10/01/2016                     d.txt
5    02/20/2017                     e.txt

Does anyone have any suggestions on how to do this? I am not sure where to begin.

Edit #1:

I edited my original dataframe and output results to better reflect what I am looking for.

  • Please do not keep editing the question and adding requirements after receiving answers here. Commented Aug 24, 2018 at 15:45
  • I'm done editing, sorry, but I appreciate the help! Commented Aug 24, 2018 at 15:46

3 Answers


This doesn't exactly match your desired output, but this structure is probably better and can easily be converted into what you want.

Basically this is a job for regex. This code should find anything of the form number/number/number:

s = df2["A"]
result = s.str.extractall(r"(\d+/\d+/\d+)")[0]
print(result)

   match
0  0        10/01/2016
   1          31/10/18
4  0        10/01/2016
5  0        02/20/2017
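A minimal sketch of that conversion, assuming the two-column df2 shown in the question: the extractall result has one row per match, indexed by (original row, match number), so grouping on the first index level collapses the matches back into one string per row, and reindexing against df2.index restores the rows that had no date.

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": ["Apple, 10/01/2016, 31/10/18, david/kate", "orange", "pear",
          "Apple", "10/01/2016", "02/20/2017"],
    "file_name": ["a.txt", "a.txt", "b.txt", "a.txt", "d.txt", "e.txt"],
})

# One row per match; the index is (original row, match number)
result = df2["A"].str.extractall(r"(\d+/\d+/\d+)")[0]

# Join each original row's matches into a single string, then bring back
# the rows without any match and fill them with the placeholder text
df2["A"] = (
    result.groupby(level=0).apply(", ".join)
    .reindex(df2.index)
    .fillna("Nothing to return")
)
print(df2)
```

Joining with ", " rather than "," is a choice made here to match the separator in the desired output.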

1 Comment

Edited original post, sorry!

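Use extractall, then join each row's matches with groupby(level=0).apply(','.join). To keep the rows that contain no date, add reindex(df2.index).fillna('Nothing to return'), as in the update below.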

df2.A.str.extractall(r'((?:\d+[/-])?\d+[/-]\d+)')[0].groupby(level=0).apply(','.join)
Out[459]: 
0    10/01/2016,31/10/18
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object

Update

(df2.A.str.extractall(r'((?:\d+[/-])?\d+[/-]\d+)')[0]
    .groupby(level=0).apply(','.join)
    .reindex(df2.index)
    .fillna('Nothing to return'))
Out[463]: 
0    10/01/2016,31/10/18
1      Nothing to return
2      Nothing to return
3      Nothing to return
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
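To see what the pattern in this answer accepts, here is a sketch that rewrites it with re.VERBOSE so each part can carry a comment. Because the first part is optional and the separator class is [/-], it also matches two-part, dash-separated values such as the hypothetical "12-2018", which the simpler \d+/\d+/\d+ pattern would not.

```python
import re

# The pattern from the answer (without the extra capture groups),
# written in verbose mode so each piece can be commented
DATE_RE = re.compile(r"""
    (?:\d+[/-])?   # optional leading number followed by / or -
    \d+[/-]\d+     # two more numbers separated by / or -
""", re.VERBOSE)

for text in ["Apple, 10/01/2016, 31/10/18, david/kate", "12-2018", "orange"]:
    print(text, "->", DATE_RE.findall(text))
```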

13 Comments

This works, but I made a small edit to my post: what if I wanted to keep the records that did not return anything?
Made one last and final change. I promise I'm done editing it this time :)
@LunchBox no worries :-)
Interesting, it works, but I am curious why we did not have to escape the forward slash "/"?
Although it could have something to do with the regex editor I was using, regex101.com.
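Most likely it is the editor: "/" is not a metacharacter in Python's re module, so escaping it is optional there, while regex101's PCRE and JavaScript flavors display the pattern between "/" delimiters, which is presumably why the editor asks for "\/". A small sketch showing that both spellings behave the same in Python:

```python
import re

text = "Apple, 10/01/2016, 31/10/18, david/kate"

# "/" has no special meaning in Python regexes, and escaping a punctuation
# character with a backslash is also allowed, so both patterns are equivalent
plain = re.findall(r"\d+/\d+/\d+", text)
escaped = re.findall(r"\d+\/\d+\/\d+", text)
print(plain == escaped, plain)
```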
import re

def my_func(row):
    # Split the cell on commas and keep every piece that starts with a date
    dates = []
    for d in row.split(","):
        match = re.match(r'(\d+/\d+/\d+)', d.strip())
        if match:
            dates.append(match.group(0))
    # Join the dates found, or fall back to the placeholder text
    return ", ".join(dates) if dates else "Nothing to return"

df2.A = df2.A.apply(my_func)

Output:

                        A        file_name
0    10/01/2016, 31/10/18           a.txt
1    Nothing to return              a.txt
2    Nothing to return              b.txt
3    Nothing to return              a.txt
4    10/01/2016                     d.txt
5    02/20/2017                     e.txt
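One design point worth noting in this answer: re.match only tries the pattern at the start of each comma-separated piece, so a date preceded by other text in the same piece is skipped. A sketch using a hypothetical piece "due 10/01/2016" (not from the question's data) shows the difference from re.search, which scans the whole string:

```python
import re

piece = "due 10/01/2016"  # hypothetical input, not from the question
pattern = r"\d+/\d+/\d+"

# re.match only tries the pattern at position 0, so the date is missed
at_start = re.match(pattern, piece)
# re.search scans the whole string and finds the date wherever it is
anywhere = re.search(pattern, piece)

print(at_start)           # None
print(anywhere.group(0))  # 10/01/2016
```

For the question's data every date begins its own comma-separated piece, so re.match is enough there.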

1 Comment

Edited post, I promise I'm done editing it this time :)
