
I have the following string, which contains date-time values along with \n characters and row numbers. I want only the date-time values. Input:

str1 = '1    2016-04-30 00:30:00\n2    2016-04-30 02:00:00\n3    2016-04-30 02:00:00\n4    2016-04-30 03:16:00\n5    2016-04-30 08:27:18\n6    2016-04-30 10:10:00\n7    2016-04-30 10:27:00\n8    2016-04-30 13:00:00\n9    2016-04-30 14:00:00\n10   2016-04-30 16:00:00\n11   2016-04-30 16:30:00\n12   2016-04-30 16:30:00\n13   2016-04-30 17:18:00\n14   2016-04-30 19:00:00\n15   2016-04-30 19:30:00\n16   2016-04-30 22:00:00\n17   2016-04-30 23:12:00\n18   2016-04-30 23:30:00\n19   2016-04-30 23:50:00\n20   2016-04-30 23:50:00\n21   2016-04-30 23:50:00\nName: CrimeDate, dtype: datetime64[ns]'

Desired output:

'2016-04-30 00:30:00,2016-04-30 02:00:00,2016-04-30 02:00:00,2016-04-30 03:16:00,2016-04-30 08:27:18,2016-04-30 10:10:00,2016-04-30 10:27:00,2016-04-30 13:00:00,2016-04-30 14:00:00,2016-04-30 16:00:00,2016-04-30 16:30:00,2016-04-30 16:30:00,2016-04-30 17:18:00,2016-04-30 19:00:00,2016-04-30 19:30:00,2016-04-30 22:00:00,2016-04-30 23:12:00,2016-04-30 23:30:00,2016-04-30 23:50:00,2016-04-30 23:50:00,2016-04-30 23:50:00'

I have tried the following ways to fix the problem:

str1 = str1.split(',')[0]
ini_string=' '.join(str1.split())[0:-16]
res = ini_string.replace(' ', ',')

But this is not working. Is there a better way to get the desired result? I am using Python 3.

    You appear to have a pandas.DataFrame, so you should try to use pandas operations to perform the conversion. See my suggestion for an example. Commented Mar 5, 2021 at 6:00

3 Answers


I would keep it simple here and just use re.findall:

import re

str1 = '1    2016-04-30 00:30:00\n2    2016-04-30 02:00:00\n3    2016-04-30 02:00:00\n4    2016-04-30 03:16:00\n5    2016-04-30 08:27:18\n6    2016-04-30 10:10:00\n7    2016-04-30 10:27:00\n8    2016-04-30 13:00:00\n9    2016-04-30 14:00:00\n10   2016-04-30 16:00:00\n11   2016-04-30 16:30:00\n12   2016-04-30 16:30:00\n13   2016-04-30 17:18:00\n14   2016-04-30 19:00:00\n15   2016-04-30 19:30:00\n16   2016-04-30 22:00:00\n17   2016-04-30 23:12:00\n18   2016-04-30 23:30:00\n19   2016-04-30 23:50:00\n20   2016-04-30 23:50:00\n21   2016-04-30 23:50:00\nName: CrimeDate, dtype: datetime64[ns]'
# Match every "YYYY-MM-DD HH:MM:SS" timestamp; row numbers and the footer are ignored
matches = re.findall(r'\b\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\b', str1)
print(matches)

This prints:

['2016-04-30 00:30:00', '2016-04-30 02:00:00', '2016-04-30 02:00:00', '2016-04-30 03:16:00',
 '2016-04-30 08:27:18', '2016-04-30 10:10:00', '2016-04-30 10:27:00', '2016-04-30 13:00:00',
 '2016-04-30 14:00:00', '2016-04-30 16:00:00', '2016-04-30 16:30:00', '2016-04-30 16:30:00',
 '2016-04-30 17:18:00', '2016-04-30 19:00:00', '2016-04-30 19:30:00', '2016-04-30 22:00:00',
 '2016-04-30 23:12:00', '2016-04-30 23:30:00', '2016-04-30 23:50:00', '2016-04-30 23:50:00',
 '2016-04-30 23:50:00']
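The question asks for a single comma-separated string rather than a list, so the matches only need to be joined with ','. A minimal sketch; the shortened sample string is an assumption made to keep the example small, but it has the same layout as str1:

```python
import re

# Shortened sample in the same format as str1 (assumed for brevity)
str1 = ('1    2016-04-30 00:30:00\n'
        '2    2016-04-30 02:00:00\n'
        '10   2016-04-30 16:00:00\n'
        'Name: CrimeDate, dtype: datetime64[ns]')

# Find every "YYYY-MM-DD HH:MM:SS" timestamp, then join them with commas
matches = re.findall(r'\b\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\b', str1)
result = ','.join(matches)
print(result)  # prints 2016-04-30 00:30:00,2016-04-30 02:00:00,2016-04-30 16:00:00
```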

3 Comments

Good answer; I was going to use re.sub, but this is better.
@TimRoberts re.split might also be viable, but there is some junk at the end of the input which would probably interfere.
Thanks, I was trying re.split, and the junk at the end was obviously causing some issues. re.findall is a good approach. Thank you.
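For completeness, re.split can also work if the trailing "Name: ..., dtype: ..." footer is cut off first. This is a sketch, not taken from any of the answers; it assumes every data row starts with an integer index followed by whitespace, and that the footer begins with "\nName:":

```python
import re

# Shortened sample in the same format as str1 (assumed for brevity)
str1 = ('1    2016-04-30 00:30:00\n'
        '2    2016-04-30 02:00:00\n'
        '10   2016-04-30 16:00:00\n'
        'Name: CrimeDate, dtype: datetime64[ns]')

# Drop the trailing "Name: ..., dtype: ..." footer so it cannot interfere
body = str1.rsplit('\nName:', 1)[0]

# Split on the row index at the start of the string or right after a newline
parts = re.split(r'(?:^|\n)\d+\s+', body)

# The first element is the empty string before the first index; skip empties
result = ','.join(p for p in parts if p)
print(result)
```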

Your data is a pandas Series (a DataFrame column) that has been converted to a string. For example, you might have done this:

>>> str(df['CrimeDate'])
'0    2016-04-30 00:30:00\n1    2016-04-30 02:00:00\n2    2016-04-30 02:00:00\n3    2016-04-30 03:16:00\n4    2016-04-30 08:27:18\n5    2016-04-30 10:10:00\n6    2016-04-30 10:27:00\n7    2016-04-30 13:00:00\n8    2016-04-30 14:00:00\n9    2016-04-30 16:00:00\n10   2016-04-30 16:30:00\n11   2016-04-30 16:30:00\n12   2016-04-30 17:18:00\n13   2016-04-30 19:00:00\n14   2016-04-30 19:30:00\n15   2016-04-30 22:00:00\n16   2016-04-30 23:12:00\n17   2016-04-30 23:30:00\n18   2016-04-30 23:50:00\n19   2016-04-30 23:50:00\n20   2016-04-30 23:50:00\nName: CrimeDate, dtype: datetime64[ns]'

Assuming that you have access to the DataFrame, you could convert the column to a comma-separated string like this:

>>> df['CrimeDate'].to_csv(header=False, index=False, lineterminator=',')[:-1]
'2016-04-30 00:30:00,2016-04-30 02:00:00,2016-04-30 02:00:00,2016-04-30 03:16:00,2016-04-30 08:27:18,2016-04-30 10:10:00,2016-04-30 10:27:00,2016-04-30 13:00:00,2016-04-30 14:00:00,2016-04-30 16:00:00,2016-04-30 16:30:00,2016-04-30 16:30:00,2016-04-30 17:18:00,2016-04-30 19:00:00,2016-04-30 19:30:00,2016-04-30 22:00:00,2016-04-30 23:12:00,2016-04-30 23:30:00,2016-04-30 23:50:00,2016-04-30 23:50:00,2016-04-30 23:50:00'

The [:-1] removes the trailing comma added by to_csv(). Before pandas 1.5 the parameter was spelled line_terminator; that spelling was removed in pandas 2.0.

Another way would be to use str.join():

>>> ','.join(str(dt) for dt in df['CrimeDate'])

but the first method avoids iteration over the DataFrame column, keeping the processing in pandas.
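A variant of the join approach (not from the original answer) lets pandas format the timestamps with the .dt.strftime accessor before joining. This is a sketch: the small DataFrame below is a hypothetical stand-in for the real df, and the format string is chosen to match the desired output:

```python
import pandas as pd

# Hypothetical stand-in for the real df: a datetime64[ns] CrimeDate column
df = pd.DataFrame({'CrimeDate': pd.to_datetime([
    '2016-04-30 00:30:00',
    '2016-04-30 02:00:00',
    '2016-04-30 08:27:18',
])})

# Format every timestamp in one vectorized call, then join the strings with commas
result = ','.join(df['CrimeDate'].dt.strftime('%Y-%m-%d %H:%M:%S'))
print(result)
```

The formatting step stays in pandas; only the final join iterates over the resulting strings in Python.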

without_strip = str1.replace("\n", "")

2016-04-30 00:30:002 2016-04-30 02:00:003 2016-04-30 02:00:004 2016-04-30 03:16:005 2016-04-30 08:27:186 2016-04-30 10:10:007 2016-04-30 10:27:008 2016-04-30 13:00:009 2016-04-30 14:00:0010 2016-04-30 16:00:0011 2016-04-30 16:30:0012 2016-04-30 16:30:0013 2016-04-30 17:18:0014 2016-04-30 19:00:0015 2016-04-30 19:30:0016 2016-04-30 22:00:0017 2016-04-30 23:12:0018 2016-04-30 23:30:0019 2016-04-30 23:50:0020 2016-04-30 23:50:0021 2016-04-30 23:50:00Name: CrimeDate, dtype: datetime64[ns]

Or, to also remove the 1 at the beginning:

listToStr = ' '.join(map(str, without_strip.split()[1:]))
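Removing the newlines first is what fuses each row number onto the preceding time (for example "00:30:002"). A sketch that keeps the line structure instead: split into lines, drop the leading index token, and skip the footer. It assumes, as in the question, that every data row starts with an integer index followed by whitespace:

```python
# Shortened sample in the same format as str1 (assumed for brevity)
str1 = ('1    2016-04-30 00:30:00\n'
        '2    2016-04-30 02:00:00\n'
        '10   2016-04-30 16:00:00\n'
        'Name: CrimeDate, dtype: datetime64[ns]')

values = []
for line in str1.splitlines():
    # Split off the first whitespace-delimited token (the row index)
    parts = line.split(None, 1)
    # Keep only data rows: an integer index followed by a value;
    # this skips the "Name: CrimeDate, ..." footer line
    if len(parts) == 2 and parts[0].isdigit():
        values.append(parts[1].strip())

result = ','.join(values)
print(result)
```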
