Access index in pandas.Series.apply

Question

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the values resulting from this function applied to each row, keeping the same MultiIndex.

See this discussion; it seems like x.name is what you are looking for: stackoverflow.com/questions/26658240/…
– Pablo Jadzinsky, Commented Dec 3, 2015 at 17:13
@PabloJadzinsky I think that discussion is about DataFrame, not Series.
– vishalv2050, Commented Apr 20, 2020 at 7:33

Dan Allan · Accepted Answer · 2013-08-19 14:52:38Z

61

I don't believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

Series(s.reset_index().apply(f, axis=1).values, index=s.index)

Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterate with s.items(), which is likely to be slower, perhaps depending on exactly what f does.
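As a minimal sketch of the reset_index workaround described above, assuming a Series shaped like the one in the question: the function f and its rule are hypothetical stand-ins for whatever computation needs the index, and the level names "a" and "b" and the Series name "values" are taken from the question's printout.

```python
import pandas as pd

# Sample data modelled on the question's Series (levels "a" and "b").
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")

def f(row):
    # Hypothetical rule: scale the value by the sum of both index levels
    # when level "a" is odd, otherwise return the value unchanged.
    if row["a"] % 2 == 1:
        return row["values"] * (row["a"] + row["b"])
    return row["values"]

# Promote the index levels to columns, apply row-wise, then restore the
# original MultiIndex on the result.
result = pd.Series(s.reset_index().apply(f, axis=1).to_numpy(), index=s.index)
print(result)
```

Because the levels are named, reset_index() produces columns "a", "b" and "values", so the function can refer to them directly.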

6 Comments

Andy Hayden Over a year ago

Also worth noting that vectorising f, and using & | etc., may also be faster.

elyase Over a year ago

Currently I use the reset_index approach, will hold a little to see if someone proposes a cleaner solution.

Phillip Cloud Over a year ago

+1 For getting rid of the MultiIndex. While these are occasionally useful, more and more I find myself turning my indices into columns.

Christophe Over a year ago

In my case (a DataFrame, with axis=1), x.name returns the value of the index when I apply a function lambda x: x ...

meow Over a year ago

Which is totally moronic behaviour but ye, what you say is completely right, however your solution is not ideal, for most use cases Jeff's answer DataFrame(s).apply(x) is much more straightforward and should be the accepted answer IMHO!

Jeff · Answer · 2013-08-19 15:04:10Z

20

Make it a frame, return scalars if you want (so the result is a series)

Setup

In [11]: s = Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return the scalars (access the index via the name attribute)

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64

2 Comments

dashesy Over a year ago

So when calling apply on a DataFrame, its index is accessible through the name of each Series? I see this is also true for a DatetimeIndex, but it is a little weird to use something like x.name == Time(2015-06-27 20:08:32.097333+00:00)

Thomas Kimber Over a year ago

This should be the answer, adopting x.name is the cleanest and most flexible way of addressing the problem.

nehz · Answer · 2017-12-05 03:54:31Z

15

Convert the Series to a DataFrame and apply along the rows (axis=1). Inside the function, the index label is available as x.name, and x itself is now a Series holding a single value.

s.to_frame(0).apply(f, axis=1)[0]
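A rough illustration of this approach with the MultiIndex from the question, where x.name arrives as a tuple of the level values. The function f and its rule are hypothetical. Note that in this sketch f returns a scalar, so apply already yields a Series and the trailing [0] is not needed; that selection applies when f returns the one-element row itself.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(x):
    # x is a one-element Series; x.name holds the row's index label,
    # which is a tuple (a, b) for a two-level MultiIndex.
    a, b = x.name
    value = x.iloc[0]
    # Hypothetical rule: add both index levels to the value when a < b.
    if a < b:
        return value + a + b
    return value

# f returns scalars, so the result is a Series with the original MultiIndex.
result = s.to_frame(0).apply(f, axis=1)
print(result)
```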

Andy Hayden · Answer · 2013-08-19 15:51:26Z

3

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also apply numpy-style logic and functions to any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing for speed (as efficiency against apply will depend on the function). Although, I find that applys are more readable...
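The same vectorised idea can be sketched for the MultiIndex Series from the question by pulling each level out with s.index.get_level_values. The rule used here (keep the value where a < b, otherwise add both levels) is a made-up example, not something from the question.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Extract each index level as a plain NumPy array.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()

# Hypothetical rule: keep the value where a < b; elsewhere replace it
# with the value plus the sum of the two index levels.
result = s.where(a < b, s + a + b)
print(result)
```

No Python-level function is called per row here, which is where any speed advantage over apply would come from.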

5 Comments

Phillip Cloud Over a year ago

Hm. Now I wonder if there should be a Series.eval/query method...I'll bring this up over at pandas.

elyase Over a year ago

@PhillipCloud, +1, I need to use indices a lot(add/subs, aligns and missing data) and this would be great to have.

Phillip Cloud Over a year ago

I'm finding increasingly more often that if I convert my MultiIndexes to columns I'm much happier and life is easier. There's so much more you can do with columns in a DataFrame than a Series with a MultiIndex, in fact they are essentially the same thing, except queries will be faster in the DataFrame columns than in the Series-with-MultiIndex.

Andy Hayden Over a year ago

@PhillipCloud I'm the same, they should really be first class citizens (rather than the opposite).

user1201614 Over a year ago

This doesn't answer the question "Access index in pandas.Series.apply"

Vladimir Leontiev · Answer · 2015-06-16 23:22:00Z

0

You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    if row['N1']==1:
        return 0
    else:
        return 1

import pandas as pd
import numpy as np
df4 = pd.DataFrame(np.random.rand(6,1), columns=list('I'))
df4['N1']=df4.apply(f1, axis=1)
df4['N2']=df4.apply(f2, axis=1)

waterproof · Answer · 2019-01-03 16:31:27Z

Use reset_index() to convert the Series to a DataFrame and the index to a column, and then apply your function to the DataFrame.

The tricky part is knowing how reset_index() names the columns, so here are a couple of examples.

With a Singly Indexed Series

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

With a Multi-Indexed Series

Same concept here, but you'll need to access the index values as row['level_*'] because that's where they're placed by Series.reset_index().

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your series or indexes have names, you will need to adjust accordingly.
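To illustrate that adjustment, here is a small sketch assuming the index levels are named "a" and "b" and the Series is named "values", as in the question. With names present, reset_index() uses them as column labels instead of level_0, level_1 and 0. For an unnamed Series, the name argument of Series.reset_index can supply the value column's label.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6)], names=["a", "b"])
s = pd.Series([0.1, 0.3], index=index, name="values")

# Named levels and a named Series become the column labels.
df = s.reset_index()
print(list(df.columns))

def use_index_and_value(row):
    # Each row is upcast to float, so cast the levels back for display.
    return "index: {},{} & value: {}".format(
        int(row["a"]), int(row["b"]), row["values"]
    )

s2 = df.apply(use_index_and_value, axis=1)
s2.index = s.index  # restore the original MultiIndex

# For an unnamed Series, name= sets the label of the value column.
unnamed = pd.Series([0.1, 0.3], index=index)
df_unnamed = unnamed.reset_index(name="value")
print(list(df_unnamed.columns))
```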

Felix Leipold · Answer · 2022-09-08 14:54:14Z

0

Series implements the items() method, which enables the use of list comprehensions to map keys (i.e. index values) and values.

Given a series:

In[1]: seriesA = pd.Series([4, 2, 3, 7, 9], name="A")
In[2]: seriesA
Out[2]:
0    4
1    2
2    3
3    7
4    9
Name: A, dtype: int64

Now, assume function f that takes a key and a value:

def f(key, value):
    return key + value

We can now create a new series by using a list comprehension:

In[3]: pd.Series(data=[f(k,v) for k, v in seriesA.items()], index=seriesA.index)
Out[3]:
0     4
1     3
2     5
3    10
4    13
dtype: int64

Of course this doesn't take advantage of any numpy performance goodness, but for some operations it makes sense.
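The same pattern carries over to the MultiIndex Series from the question, where each key yielded by items() is a tuple of level values. This is only a sketch; the body of f is a hypothetical computation.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(key, value):
    # For a MultiIndex, key is a tuple with one entry per level.
    a, b = key
    # Hypothetical computation using both levels and the value.
    return a * b + value

# Build the new Series from (key, value) pairs, reusing the original index.
result = pd.Series([f(k, v) for k, v in s.items()], index=s.index)
print(result)
```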

Lazy-Panda-Bear · Answer · 2023-09-30 03:09:51Z

Another dirty solution is to use regex.

First, reset the index to create a DataFrame.

df = s.reset_index()

df
   a  b  values
0  1  2     0.1
1  3  6     0.3
2  4  4     0.7

Then build a Series that concatenates the index columns and the values as strings. Make sure to use separators that cannot occur in the data, so the pattern matching can split the parts reliably. In my case, I use 'first_wall' and 'second_wall'.

concatenated_series = df['a'].astype(str)+'first_wall'+df['b'].astype(str)+'second_wall'+df['values'].astype(str)

concatenated_series

0    1first_wall2second_wall0.1
1    3first_wall6second_wall0.3
2    4first_wall4second_wall0.7
dtype: object

Then create the function:

import re

def f(x):
   # Recover the two index levels and the value from the concatenated string
   first_index = int(re.search('^(.+)first_wall', x).group(1))
   second_index = int(re.search('first_wall(.+)second_wall', x).group(1))
   value = float(re.search(r'second_wall(.+)$',x).group(1))
   #do something and whatever you like
   return first_index + second_index + value

Then apply it to the concatenated series.

concatenated_series.apply(f)

0    3.1
1    9.3
2    8.7
dtype: float64

Cheers!
