Python Pandas: Finding the index of a pandas series with an array value

Question

I have a pandas Series with an array as the value for each row, like so:

             'Node'
    ..        ....
    97     [355.0, 296.0]
    98      [53.0, 177.0]
    99      [294.0, 14.0]
    100     [330.0, 15.0]
    101    [100.0, 160.0]
    102     [10.0, 220.0]
    103    [330.0, 290.0]

I want to find the index of all the rows that contain the value 330.0, which would be 100 and 103.

What I have tried until now is:

vals = [item for item in df.Node if item[0] == 330.0]

which gives me [array([ 330., 15.]), array([ 330., 290.])]

and then:

for val in vals:
    id = pd.Index(df.Node).get_loc(val)

This throws an error saying TypeError: '[ 330. 15.]' is an invalid key

How do I solve this and get the row index of the value?

Edit: Here's a sample of the data with far fewer rows.

0     [139.0, 105.0]
1     [290.0, 200.0]
2     [257.0, 243.0]
3       [235.0, 7.0]
4      [12.0, 115.0]
5     [168.0, 135.0]
6     [105.0, 258.0]
7      [339.0, 64.0]
8       [6.0, 148.0]
9      [33.0, 286.0]
10      [62.0, 26.0]
11    [307.0, 185.0]
12     [34.0, 269.0]
13     [206.0, 60.0]
14    [327.0, 127.0]
15    [127.0, 202.0]
16     [297.0, 48.0]
17    [131.0, 151.0]
18      [326.0, 1.0]
19     [304.0, 35.0]
20     [329.0, 23.0]
21    [314.0, 287.0]
22      [1.0, 233.0]
23    [260.0, 280.0]
24     [313.0, 56.0]
25     [294.0, 33.0]
26    [243.0, 256.0]
27    [151.0, 174.0]
28    [271.0, 295.0]
29    [141.0, 184.0]
30    [105.0, 157.0]
31    [288.0, 269.0]
32    [118.0, 210.0]
33     [38.0, 194.0]
34     [49.0, 154.0]
35     [40.0, 204.0]
36     [317.0, 27.0]
37     [359.0, 33.0]
38     [56.0, 184.0]
39     [359.0, 39.0]
40     [48.0, 170.0]
41     [314.0, 51.0]
42    [175.0, 184.0]
43     [28.0, 200.0]
44     [35.0, 169.0]
45     [330.0, 15.0]
46    [100.0, 160.0]
47     [10.0, 220.0]
48    [330.0, 290.0]
Name: Node, dtype: object

Could you provide a well-formatted sample of your DataFrame so we can test some code? It looks like your problem is due to the list format. – MattR, Commented Sep 12, 2017 at 16:12
You get the error because lists (and NumPy arrays) are not hashable in Python. – Menglong Li, Commented Sep 12, 2017 at 16:12
@MattR updating the question now with a DataFrame with fewer rows. – PVasish, Commented Sep 12, 2017 at 16:18
I would strongly consider using @Alexander's answer. It's important to put your data into the correct format for data analysis. Keeping multiple values in a single cell is not an optimal way to store data. Convert to a proper DataFrame with two columns first, and then do normal boolean selection. – Ted Petrou, Commented Sep 12, 2017 at 18:48
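To illustrate the hashability point raised in the comments: pandas label lookups such as get_loc are hash-based, so the key has to be hashable. A small sketch (the helper is_hashable is hypothetical, written only for this illustration) checks the value types involved:

```python
import numpy as np


def is_hashable(obj):
    """Hypothetical helper: True if obj can be hashed (usable as a lookup key)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Lists and NumPy arrays are mutable, so Python refuses to hash them.
print(is_hashable([330.0, 15.0]))            # False
print(is_hashable(np.array([330.0, 15.0])))  # False
# A tuple of floats is immutable and therefore hashable.
print(is_hashable((330.0, 15.0)))            # True
```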

Vaishali · Accepted Answer · 2017-09-12 16:24:19Z

3

One more:)

df.index[df['Node'].apply(lambda x: 330.0 in x )].tolist()

You get

[100, 103]

This one also seems to be the fastest

%timeit df.index[df['Node'].apply(lambda x: 330.0 in x )].tolist()
1000 loops, best of 3: 262 µs per loop

%timeit df[df.Node.apply(lambda x: True if 330.0 in x else False)].index 
1000 loops, best of 3: 704 µs per loop

%timeit df.loc[(df['x'] == 330) | (df['y'] == 330), 'Node']
1000 loops, best of 3: 1.3 ms per loop
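For anyone who wants to run the mask approach end to end, here is a self-contained sketch on the question's seven-row sample (the names mask and matches are just for illustration). It separates the two steps: build a boolean mask with apply, then use it to pick labels out of df.index. The `in` test works whether each cell is a list or a NumPy array.

```python
import pandas as pd

# Seven-row sample from the question; each cell holds a two-element list.
df = pd.DataFrame({'Node': [
    [355.0, 296.0],
    [53.0, 177.0],
    [294.0, 14.0],
    [330.0, 15.0],
    [100.0, 160.0],
    [10.0, 220.0],
    [330.0, 290.0]]}, index=range(97, 104))

# Boolean mask: True where the cell contains 330.0 in any position.
mask = df['Node'].apply(lambda x: 330.0 in x)

# Use the mask to select the matching row labels.
matches = df.index[mask].tolist()
print(matches)  # [100, 103]
```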

answered Sep 12, 2017 at 16:24

Vaishali


2 Comments

PVasish Over a year ago

Thanks, this seems to be the solution to what I want!

MattR Over a year ago

Just for future readers' reference: if Node ever contains a value that is not a list, this raises TypeError: argument of type 'int' is not iterable. This is basically the opposite of the OP's original issue. As it stands, this solution only works for a column of lists.
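One way to handle the mixed-type case described in that comment, sketched under the assumption that a non-container cell should count as a match only when it equals the target itself. The helper contains_value is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd


def contains_value(cell, target=330.0):
    """Hypothetical helper: membership test that tolerates non-list cells.

    Lists, tuples and NumPy arrays are searched element-wise; any other
    value (e.g. a bare int or float) is compared to the target directly.
    """
    if isinstance(cell, (list, tuple, np.ndarray)):
        return target in cell
    return cell == target


# Mixed column: lists, a NumPy array and plain scalars.
node = pd.Series([[330.0, 15.0], 7, np.array([10.0, 330.0]), 330.0, [1.0, 2.0]],
                 index=[100, 101, 102, 103, 104])

result = node.index[node.apply(contains_value)].tolist()
print(result)  # [100, 102, 103]
```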

Alexander · 2017-09-12 16:31:06Z

2

A key question is why the column contains a list of lists in the first place. Such a column is stored with the object dtype, your least efficient option. You should probably split your lists into two separate columns (which would be np.float64 given your sample data) and then check the values.

import pandas as pd

df = pd.DataFrame({'Node': [
    [355., 296.],
    [53., 177.],
    [294., 14.],
    [330., 15.],
    [100., 160.],
    [10., 220.],
    [330., 290.]]}, index=range(97, 104))

# Expand each two-element list into its own column
df[['x', 'y']] = df.Node.apply(pd.Series)
>>> df.loc[(df['x'] == 330) | (df['y'] == 330), 'Node']
100     [330.0, 15.0]
103    [330.0, 290.0]
Name: Node, dtype: object
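A variation on the same idea, sketched as an alternative rather than Alexander's exact code: build the new columns directly from df['Node'].tolist() instead of apply(pd.Series), then test every column at once with eq(...).any(axis=1). Assuming all lists have the same length, this would also work if they had more than two elements. The names coords and hits are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Node': [
    [355.0, 296.0],
    [53.0, 177.0],
    [294.0, 14.0],
    [330.0, 15.0],
    [100.0, 160.0],
    [10.0, 220.0],
    [330.0, 290.0]]}, index=range(97, 104))

# Expand the lists into real float64 columns, keeping the original row labels.
coords = pd.DataFrame(df['Node'].tolist(), index=df.index, columns=['x', 'y'])

# Vectorised test: does any column in the row equal 330.0?
hits = coords.eq(330.0).any(axis=1)
print(hits[hits].index.tolist())  # [100, 103]
```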

edited Sep 12, 2017 at 16:31

answered Sep 12, 2017 at 16:18

Alexander


Mohamed Ali JAMAOUI · 2017-09-12 16:42:39Z

You can get what you want with

df[df.Node.apply(lambda x: True if 330.0 in x else False)].index

Full Example:

>>> import pandas as pd 
>>> df = pd.DataFrame({'Node': [
...     [355., 296.], 
...     [53., 177.], 
...     [294., 14.], 
...     [330., 15.], 
...     [100., 160.],
...     [10., 220.],
...     [330., 290.]]}, index=range(97, 104))
>>> df
               Node
97   [355.0, 296.0]
98    [53.0, 177.0]
99    [294.0, 14.0]
100   [330.0, 15.0]
101  [100.0, 160.0]
102   [10.0, 220.0]
103  [330.0, 290.0]
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)]
               Node
100   [330.0, 15.0]
103  [330.0, 290.0]
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)].index 
Int64Index([100, 103], dtype='int64')
>>> 
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)].index.tolist()  
[100, 103]
>>>

Menglong Li · 2017-09-12 16:16:37Z

0

How about this:

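import pandas as pd

df = pd.DataFrame()
df['Node'] = [[1, 2], [1, 3], [330.0, 5]]

# Series.items() yields (index label, value) pairs, so this prints the row
# label even when the index is not 0..n-1 (enumerate() would give positions).
for idx, value in df['Node'].items():
    if 330.0 in value:
        print(idx)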
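Note that enumerate() yields 0-based positions, which coincide with the labels only for a default index; with the question's index (97 to 103) they differ. A sketch comparing enumerate() with Series.items(), which yields the actual labels:

```python
import pandas as pd

# Non-default index, as in the question (labels 97..103).
node = pd.Series([
    [355.0, 296.0], [53.0, 177.0], [294.0, 14.0], [330.0, 15.0],
    [100.0, 160.0], [10.0, 220.0], [330.0, 290.0]], index=range(97, 104))

# enumerate() counts from 0, so it yields positions.
positions = [i for i, value in enumerate(node) if 330.0 in value]

# Series.items() yields (label, value) pairs, so it yields labels.
labels = [label for label, value in node.items() if 330.0 in value]

print(positions)  # [3, 6]
print(labels)     # [100, 103]
```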

answered Sep 12, 2017 at 16:16

Menglong Li


MattR · 2017-09-12 16:48:31Z

0

Avoid loops in pandas; use .loc instead.

An example:

df.loc[df['Node'] == 330.0].index.tolist()

This will give you a list of the indexes where 'Node' is equal to 330. You may need to adjust it a bit; look at this SO answer to learn how to use lambda expressions with pandas to work with lists.

EDIT:

I left a comment stating that unless every value in the Node column is a list, the accepted answer will fail. A dirty workaround is to convert the values to strings and use str.contains. You can try something like:

df.loc[df['Node'].astype(str).str.contains('330.0', regex=False)].index.tolist()

This converts each list to a string, so you can check whether it contains the substring '330.0'. Passing regex=False makes the '.' a literal dot rather than a regex wildcard.
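A sketch of the caveats of the string workaround on made-up values (chosen only to trigger the edge cases). With the default regex=True, '.' matches any character, so '33000.0' also matches. Even a literal substring test matches '1330.0'. A direct membership test avoids both false positives. Also, if the cells hold NumPy arrays rather than lists, as the question's output suggests, str() may render them without the trailing 0 (e.g. '330.' instead of '330.0'), so the substring might not match at all.

```python
import pandas as pd

node = pd.Series([[330.0, 15.0], [33000.0, 1.0], [1330.0, 2.0], [5.0, 6.0]])
as_text = node.astype(str)  # each cell becomes e.g. '[330.0, 15.0]'

# Default regex=True: '.' is a wildcard, so '33000.0' matches as well.
regex_hits = node.index[as_text.str.contains('330.0')].tolist()

# regex=False is a literal substring test, but '1330.0' still matches.
literal_hits = node.index[as_text.str.contains('330.0', regex=False)].tolist()

# Testing list membership directly avoids both false positives.
exact_hits = node.index[node.apply(lambda x: 330.0 in x)].tolist()

print(regex_hits)    # [0, 1, 2]
print(literal_hits)  # [0, 2]
print(exact_hits)    # [0]
```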

edited Sep 12, 2017 at 16:48

answered Sep 12, 2017 at 16:18

MattR


4 Comments

PVasish Over a year ago

Thanks for the answer, but I get an error saying ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

PVasish Over a year ago

Thanks, that SO link helped!

MattR Over a year ago

My pleasure, I was going to update my answer similar to the one you already accepted. It's the best answer.

MattR Over a year ago

@PVasish, I updated my answer in case you ever run into an issue where Node contains anything other than a list. I've hit this problem before, so I gave you a simple way to work around it.
