92

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I do s.apply(f) for such a function? What is the recommended way to do this kind of operation? I expect to get a new Series with the same MultiIndex, holding the result of the function applied to each row.

2
  • 4
    See this discussion, seems like x.name is what you are looking for stackoverflow.com/questions/26658240/… Commented Dec 3, 2015 at 17:13
  • @PabloJadzinsky That discussion is about DataFrame not for Series I think Commented Apr 20, 2020 at 7:33

8 Answers 8

61

I don't believe apply has access to the index; it passes each value to the function as a NumPy scalar, not as a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)

Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterating with s.items(), which is likely to be slower -- perhaps depending on exactly what f does.
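A minimal sketch of the reset_index approach, for illustration only. The data mirrors the question's Series; the level names "a" and "b", the Series name "values", and the rule inside f are assumptions, not part of the original post.

```python
import pandas as pd

# Hypothetical data mirroring the Series in the question.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")


def f(row):
    # Hypothetical rule: add both index levels to the value.
    # After reset_index(), the levels are ordinary columns "a" and "b".
    return row["a"] + row["b"] + row["values"]


# Apply row-wise on the flattened frame, then restore the original MultiIndex.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
print(result)
```

Going through .values drops the temporary integer index produced by reset_index(), so the result lines up with s.index by position.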


6 Comments

Also worth noting that vectorising f, and using & | etc., may also be faster.
Currently I use the reset_index approach, will hold a little to see if someone proposes a cleaner solution.
+1 For getting rid of the MultiIndex. While these are occasionally useful, more and more I find myself turning my indices into columns.
In my case (a DataFrame, with axis=1), x.name returns the value of the index when I apply a function lambda x: x ...
Which is totally moronic behaviour but ye, what you say is completely right, however your solution is not ideal, for most use cases Jeff's answer DataFrame(s).apply(x) is much more straightforward and should be the accepted answer IMHO!
20

Make it a frame, return scalars if you want (so the result is a series)

Setup

In [11]: s = pd.Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
    print(type(x), x)
    return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return the scalars (access the index via the name attribute)

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64

2 Comments

so when calling apply on DataFrame its index will be accessible through name of each series? I see this also is true for DateTimeIndex but it is a little weird to use something similar to x.name == Time(2015-06-27 20:08:32.097333+00:00)
This should be the answer, adopting x.name is the cleanest and most flexible way of addressing the problem.
15

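Convert the Series to a DataFrame and apply along the rows (axis=1). Inside f, x is now a one-element Series and x.name holds the index label (a tuple for a MultiIndex). The trailing [0] selects column 0, which assumes f returns the row as a Series; drop it if f returns a scalar.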

s.to_frame(0).apply(f, axis=1)[0]
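A small sketch of the x.name approach on a MultiIndex, assuming f returns a scalar (so no trailing column selection is needed). The data, the column name "value", and the rule inside f are hypothetical.

```python
import pandas as pd

# Hypothetical data mirroring the Series in the question.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(x):
    # x is a one-element Series for the current row;
    # x.name is that row's index label, a tuple for a MultiIndex.
    a, b = x.name
    return a + b + x.iloc[0]


# A scalar return value per row yields a Series with the original index.
result = s.to_frame("value").apply(f, axis=1)
print(result)
```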


3

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also use NumPy-style logic and functions in any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing for speed, since the gain over apply will depend on the function. That said, I find apply more readable...
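The same idea can be carried over to the MultiIndex from the question by pulling each level out with get_level_values. This is only a sketch: the data, the level names, and the rule (double the value where a < b, negate it otherwise) are made up for illustration.

```python
import pandas as pd

# Hypothetical data mirroring the Series in the question.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Each level comes back as a flat Index aligned with the rows of s.
a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# Hypothetical rule: keep 2 * s where a < b, otherwise use -s.
result = (2 * s).where(a < b, -s)
print(result)
```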

5 Comments

Hm. Now I wonder if there should be a Series.eval/query method...I'll bring this up over at pandas.
@PhillipCloud, +1, I need to use indices a lot(add/subs, aligns and missing data) and this would be great to have.
I'm finding increasingly more often that if I convert my MultiIndexes to columns I'm much happier and life is easier. There's so much more you can do with columns in a DataFrame than a Series with a MultiIndex, in fact they are essentially the same thing, except queries will be faster in the DataFrame columns than in the Series-with-MultiIndex.
@PhillipCloud I'm the same, they should really be first class citizens (rather than the opposite).
This doesn't answer the question "Access index in pandas.Series.apply"
0

You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    if row['N1']==1:
        return 0
    else:
        return 1

import pandas as pd
import numpy as np
df4 = pd.DataFrame(np.random.rand(6,1), columns=list('I'))
df4['N1']=df4.apply(f1, axis=1)
df4['N2']=df4.apply(f2, axis=1)
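With axis=1 the row's index label is also available as row.name. A minimal sketch, using a hypothetical frame, labels, and function name:

```python
import pandas as pd

# Hypothetical frame with string labels so the index is easy to spot.
df = pd.DataFrame({"I": [0.2, 0.9, 0.4]}, index=["x", "y", "z"])


def label_row(row):
    # row.name is the index label of the row being processed.
    flag = 0 if row["I"] < 0.5 else 1
    return "{}:{}".format(row.name, flag)


df["tag"] = df.apply(label_row, axis=1)
print(df)
```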


0

Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

With a Singly Indexed Series

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a Multi-Indexed Series

Same concept here, but you'll need to access the index values as row['level_*'] because that's where they're placed by Series.reset_index().

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your series or indexes have names, you will need to adjust accordingly.
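As a sketch of that adjustment: when the index levels and the Series are named, reset_index() uses those names for the columns instead of 'level_*' and 0. The names "outer", "inner", and "payload" below are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("x", "p"), ("y", "q")], names=["outer", "inner"]
)
s = pd.Series(["val1", "val2"], index=index, name="payload")

# The level names and the Series name become the column names.
columns = s.reset_index().columns.tolist()
print(columns)


def use_index_and_value(row):
    return "{},{} -> {}".format(row["outer"], row["inner"], row["payload"])


s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace the auto index with the index from the original Series.
s2.index = s.index
print(s2)
```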


0

Series implements the items() method, which enables the use of list comprehensions to map keys (i.e. index values) and values.

Given a series:

In[1]: seriesA = pd.Series([4, 2, 3, 7, 9], name="A")
In[2]: seriesA
Out[2]:
0    4
1    2
2    3
3    7
4    9
Name: A, dtype: int64

Now, assume function f that takes a key and a value:

def f(key, value):
    return key + value

We can now create a new series by using a list comprehension:

In[3]: pd.Series(data=[f(k,v) for k, v in seriesA.items()], index=seriesA.index)
Out[3]:
0     4
1     3
2     5
3    10
4    13
dtype: int64

Of course this doesn't take advantage of any NumPy performance goodness, but for some operations it makes sense.
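The same pattern should carry over to the MultiIndex case from the question, where each key yielded by items() is a tuple of level values. A sketch with made-up data and a made-up rule:

```python
import pandas as pd

# Hypothetical data mirroring the Series in the question.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(key, value):
    # For a MultiIndex, key is a tuple with one entry per level.
    a, b = key
    return a * b + value


result = pd.Series([f(k, v) for k, v in s.items()], index=s.index)
print(result)
```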


0

Another, rather dirty, solution is to use regular expressions.

First, reset the index to create a DataFrame.

df = s.reset_index()

df
a   b   values
0   1   2   0.1
1   3   6   0.3
2   4   4   0.7

Then build a single string column that concatenates the index columns and the values. Use separators that are easy to match and that cannot occur in the data; in my case, I use 'first_wall' and 'second_wall'.

concatenated_series = df['a'].astype(str)+'first_wall'+df['b'].astype(str)+'second_wall'+df['values'].astype(str)

concatenated_series

0    1first_wall2second_wall0.1
1    3first_wall6second_wall0.3
2    4first_wall4second_wall0.7
dtype: object

Then create the function:

import re

def f(x):
   # pull each part back out of the concatenated string
   first_index = int(re.search(r'^(.+)first_wall', x).group(1))
   second_index = int(re.search(r'first_wall(.+)second_wall', x).group(1))
   value = float(re.search(r'second_wall(.+)$', x).group(1))
   # do whatever you like with the parts
   return first_index + second_index + value

Then apply it to the concatenated series.

concatenated_series.apply(f)

0    3.1
1    9.3
2    8.7
dtype: float64

Cheers!

