
I have a pandas Series named 'rpt_date':

>>> rpt_date
STK_ID
000002    [u'20060331', u'20060630']
000005    [u'20061231', u'20070331', u'20070630']
>>> type(rpt_date)
<class 'pandas.core.series.Series'>
>>> 

How can I create a MultiIndex object (pandas.core.index.MultiIndex) from it with a call like this:

'my_index = gen_index_by_series(rpt_date)'

so that 'my_index' looks like this:

>>> my_index
MultiIndex
[('000002', '20060331') ('000002', '20060630') ('000005', '20061231')
 ('000005', '20070331') ('000005', '20070630')]
>>> type(my_index)
<class 'pandas.core.index.MultiIndex'>
>>> 

So how should I write 'gen_index_by_series(series)'?

1 Answer

To pair the first element with each of the others, you can use itertools.repeat and zip, like this:

>>> import itertools as it
>>> L = [['000002', [u'20060331', u'20060630']],
...      ['000005', [u'20061231', u'20070331', u'20070630']]]
>>> # list() keeps each group a real list, so this also works where zip returns an iterator
>>> couples = [list(zip(it.repeat(key), rest)) for key, rest in L]
>>> couples
[[('000002', u'20060331'), ('000002', u'20060630')],
[('000005', u'20061231'), ('000005', u'20070331'), ('000005', u'20070630')]]

It shouldn't be too hard to obtain a list like L from the Series object.

To create a MultiIndex, I believe you have to use the from_tuples method:

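from pandas import MultiIndex
# sum(couples, []) flattens the list of per-key lists into one list of tuples
MultiIndex.from_tuples(sum(couples, []), names=('first', 'second'))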

Since I'm not a pandas user, I can't help much with the remaining tasks, though they are probably easy: it's a matter of iterating over the Series in the correct way.
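The remaining step, iterating over the Series and feeding the pairs to from_tuples, could be sketched as follows. This is a minimal sketch rather than a definitive implementation: the function name 'gen_index_by_series' is taken from the question, while the 'names' parameter and the level names 'STK_ID' and 'RPT_Date' are assumptions made for illustration. It also assumes every value in the Series is a list of date strings.

```python
import pandas as pd


def gen_index_by_series(series, names=("STK_ID", "RPT_Date")):
    """Build a MultiIndex from a Series whose values are lists.

    The level names are assumed defaults, not something the question fixes.
    """
    # Pair each index label with every element of its list value.
    tuples = [(key, value) for key, values in series.items() for value in values]
    return pd.MultiIndex.from_tuples(tuples, names=list(names))


# Sample data mirroring the Series shown in the question.
rpt_date = pd.Series(
    [["20060331", "20060630"], ["20061231", "20070331", "20070630"]],
    index=pd.Index(["000002", "000005"], name="STK_ID"),
)

my_index = gen_index_by_series(rpt_date)
print(my_index)
```

The nested comprehension replaces the zip/repeat step and the sum(couples, []) flattening in one pass, but it is still a plain Python loop, so it is not vectorized.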


3 Comments

I tried it and it works, thanks. But it is not fast, and it is not 'vectorized'. Is there a more vectorized method, or a built-in pandas function for this?
The original way the data is stored (as a Series of lists) is very inefficient (since getting the data out requires expensive iteration / unboxing of the values). Is there any way for you to change it?
'rpt_date' comes from 'rpt_date = ori_rpt.groupby('STK_ID').RPT_Date.apply(makeup_rpt_date_list)'. I have an 'ori_rpt' DataFrame containing cumulative financial report data in which some report dates are missing; 'makeup_rpt_date_list' fills in the date list accordingly. I then build a multilevel index object 'full_rpt_idx' and call 'rpt = ori_rpt.reindex(index=full_rpt_idx)' to add the missing rows (with NaN in every column). After that I can safely use 'rpt - rpt.shift(1)' to get the quarterly data. Yes, all of this is sluggish, but I can't find a more pandas-like way to improve it.
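On the request for something without an explicit Python loop: newer pandas releases (0.25 and later, which postdate this thread) provide Series.explode, which emits one row per list element and repeats the index label for each. The sketch below assumes such a version is installed and uses sample data in place of the real 'rpt_date'; the level name 'RPT_Date' is an assumption based on the column mentioned above. No claim is made here about how much faster it is on real data.

```python
import pandas as pd

# Sample stand-in for the Series of lists produced by the groupby/apply step.
rpt_date = pd.Series(
    [["20060331", "20060630"], ["20061231", "20070331", "20070630"]],
    index=pd.Index(["000002", "000005"], name="STK_ID"),
    name="RPT_Date",
)

# One row per list element; the STK_ID label is repeated for each date.
exploded = rpt_date.explode()

# Combine the repeated labels and the flattened dates into a two-level index.
full_rpt_idx = pd.MultiIndex.from_arrays(
    [exploded.index, exploded.to_numpy()],
    names=["STK_ID", "RPT_Date"],
)
print(full_rpt_idx)
```

The resulting 'full_rpt_idx' could then be passed to 'ori_rpt.reindex(index=full_rpt_idx)' as described above, provided 'ori_rpt' is indexed by the same two levels.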
