Consider the input below:

[['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]   

My desired output is:

[[2016-12-31, 2015-12-31, 2014-12-31],
 [2016-03-31, 2015-03-31, 2014-03-31]]

Basically, I want to convert elements 1-3 of each nested list into datetime objects, taking the month and day from element 0 of that list.

I can think of a manually intensive solution, but I am looking for the most efficient method (speed-wise) to achieve this. The actual data has many thousands of such rows.

2 Answers

You can use str.extract to pull the month and day out of the first column, prepend that string to each year with radd, and convert the result with to_datetime:

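import numpy as np
import pandas as pd

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]

a = np.array(L)
# capture the month abbreviation and the day from the header string
pat = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'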
d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')
print (d)
0    Dec-31-
1    Mar-31-
dtype: object

L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
print (L1)
[['2016-12-31' '2015-12-31' '2014-12-31']
 ['2016-03-31' '2015-03-31' '2014-03-31']]
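
To see what the pattern captures, here is a minimal standard-library sketch that applies the same regular expression to a single header string with re.search (str.extract does the equivalent for every element of the Series). The names header and prefix are only for illustration and are not part of the code above.

```python
import re

# Same pattern as above: month abbreviation, whitespace, 1-2 digit day.
pat = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'

header = 'Fiscal data as of Dec 31 2016'  # illustrative input
month, day = re.search(pat, header).groups()

# Mirrors .apply('-'.join, 1).add('-') for one row.
prefix = '-'.join([month, day]) + '-'
print(prefix)            # Dec-31-

# Mirrors radd: the prefix goes to the left of each year string.
print(prefix + '2015')   # Dec-31-2015
```

Strings such as 'Dec-31-2015' are what pd.to_datetime then parses in the last step.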

If performance matters, use a dictionary to map the month abbreviations to numbers:

d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06', 
     'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}

L2 = []
for l in L:
    a = l[0].split()[-3:-1]
    a = '-'.join([d[a[0]], a[1]])
    L2.append([x + '-' + a for x in l[1:]])

print (L2)

[['2016-12-31', '2015-12-31', '2014-12-31'],
 ['2016-03-31', '2015-03-31', '2014-03-31']]
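
The loop above returns date strings. If real date objects are needed, as the question asks, one possible variation is sketched below. It assumes English month abbreviations and the same header layout ('... Dec 31 2016'); MONTH_NUMBERS and rows_to_dates are hypothetical names introduced here. Note that date() raises ValueError for a day that does not exist in a given year (for example Feb 29 in 2015), which this sketch does not handle.

```python
from datetime import date

# Month abbreviation -> month number (English abbreviations assumed).
MONTH_NUMBERS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
                 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

def rows_to_dates(rows):
    """Hypothetical helper: build datetime.date objects for each year in a row."""
    out = []
    for row in rows:
        # The header ends with '<Mon> <day> <year>'; take month and day only.
        month_name, day = row[0].split()[-3:-1]
        month, day = MONTH_NUMBERS[month_name], int(day)
        out.append([date(int(year), month, day) for year in row[1:]])
    return out

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]

dates = rows_to_dates(L)
print(dates[1][0])   # 2016-03-31
```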

Finally, if you need a NumPy array, convert the list:

print (np.array(L2))
[['2016-12-31' '2015-12-31' '2014-12-31']
 ['2016-03-31' '2015-03-31' '2014-03-31']]

Timings:

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 10000  


In [262]: %%timeit
     ...: d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06', 
     ...:      'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
     ...: 
     ...: L2 = []
     ...: for l in L:
     ...:     a = l[0].split()[-3:-1]
     ...:     a = '-'.join([d.get(a[0]), a[1]])
     ...:     L2.append([x + '-' + a for x in l[1:]])
     ...: 
10 loops, best of 3: 44.3 ms per loop

In [263]: %%timeit
     ...: out_list=[]
     ...: for l in L:
     ...:     l_date = datetime.strptime((" ").join(l[0].split()[-3:]), '%b %d %Y')
     ...:     out_list.append([("-").join([str(l_year),str(l_date.month),str(l_date.day)])
     ...:             for l_year in l[-3:]])
     ...: 
1 loop, best of 3: 303 ms per loop

In [264]: %%timeit
     ...: a = np.array(L)
     ...: pat = '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'
     ...: d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')
     ...: L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
     ...: 
1 loop, best of 3: 7.46 s per loop
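
The %%timeit cells above need IPython. As a rough way to repeat the comparison of the two pure-Python loops in a plain script, here is a sketch using the standard timeit module. The function names with_dict and with_strptime are made up for this example, and the numbers will differ from machine to machine, so no expected timings are shown.

```python
import timeit
from datetime import datetime

MONTHS = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
          'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

rows = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
        ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 10000

def with_dict(rows):
    # Dictionary lookup for the month, plain string concatenation.
    out = []
    for row in rows:
        month, day = row[0].split()[-3:-1]
        suffix = '-'.join([MONTHS[month], day])
        out.append([year + '-' + suffix for year in row[1:]])
    return out

def with_strptime(rows):
    # Parse the header with strptime, then rebuild strings (not zero-padded).
    out = []
    for row in rows:
        parsed = datetime.strptime(' '.join(row[0].split()[-3:]), '%b %d %Y')
        out.append(['-'.join([str(year), str(parsed.month), str(parsed.day)])
                    for year in row[-3:]])
    return out

for fn in (with_dict, with_strptime):
    # Best of 3 single runs, similar in spirit to "best of 3" above.
    best = min(timeit.repeat(lambda: fn(rows), number=1, repeat=3))
    print('{}: {:.1f} ms'.format(fn.__name__, best * 1000))
```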
1 Comment

I like your numpy solution, it is also more intuitive. Thanks

This creates the desired output as a nested list of date strings (month and day are not zero-padded, so March comes out as '2016-3-31'):

from datetime import datetime

in_list = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]

out_list=[]
for l in in_list:
    l_date = datetime.strptime((" ").join(l[0].split()[-3:]), '%b %d %Y')
    out_list.append([("-").join([str(l_year),str(l_date.month),str(l_date.day)])
            for l_year in l[-3:]])
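
Because the loop above formats month and day with str(), March 31 comes out as '2016-3-31'. If zero-padded ISO-style strings are preferred, a small variation with str.format is sketched below. It assumes the default C/English locale for %b; the variable names row, parsed and padded_list are illustrative.

```python
from datetime import datetime

in_list = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
           ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]

padded_list = []
for row in in_list:
    # Parse 'Dec 31 2016' once per row to get the month and day numbers.
    parsed = datetime.strptime(' '.join(row[0].split()[-3:]), '%b %d %Y')
    # {:02d} pads month and day to two digits.
    padded_list.append(['{}-{:02d}-{:02d}'.format(year, parsed.month, parsed.day)
                        for year in row[-3:]])

print(padded_list[1])   # ['2016-03-31', '2015-03-31', '2014-03-31']
```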
