
I have a pandas Series with an array as the value of each row, like so:

             'Node'
    ..        ....
    97     [355.0, 296.0]
    98      [53.0, 177.0]
    99      [294.0, 14.0]
    100     [330.0, 15.0]
    101    [100.0, 160.0]
    102     [10.0, 220.0]
    103    [330.0, 290.0]

I want to find the index of all the rows that contain the value 330.0, which would be 100 and 103.

What I have tried until now is:

vals = [item for item in df.Node if item[0] == 330.0]

which gives me [array([ 330., 15.]), array([ 330., 290.])]

and then:

for val in vals:
    id = pd.Index(df.Node).get_loc(val)

This throws an error saying TypeError: '[ 330. 15.]' is an invalid key

How do I solve this and get the row index of the value?

Edit: Here's a sample dataframe with far fewer rows.

0     [139.0, 105.0]
1     [290.0, 200.0]
2     [257.0, 243.0]
3       [235.0, 7.0]
4      [12.0, 115.0]
5     [168.0, 135.0]
6     [105.0, 258.0]
7      [339.0, 64.0]
8       [6.0, 148.0]
9      [33.0, 286.0]
10      [62.0, 26.0]
11    [307.0, 185.0]
12     [34.0, 269.0]
13     [206.0, 60.0]
14    [327.0, 127.0]
15    [127.0, 202.0]
16     [297.0, 48.0]
17    [131.0, 151.0]
18      [326.0, 1.0]
19     [304.0, 35.0]
20     [329.0, 23.0]
21    [314.0, 287.0]
22      [1.0, 233.0]
23    [260.0, 280.0]
24     [313.0, 56.0]
25     [294.0, 33.0]
26    [243.0, 256.0]
27    [151.0, 174.0]
28    [271.0, 295.0]
29    [141.0, 184.0]
30    [105.0, 157.0]
31    [288.0, 269.0]
32    [118.0, 210.0]
33     [38.0, 194.0]
34     [49.0, 154.0]
35     [40.0, 204.0]
36     [317.0, 27.0]
37     [359.0, 33.0]
38     [56.0, 184.0]
39     [359.0, 39.0]
40     [48.0, 170.0]
41     [314.0, 51.0]
42    [175.0, 184.0]
43     [28.0, 200.0]
44     [35.0, 169.0]
45     [330.0, 15.0]
46    [100.0, 160.0]
47     [10.0, 220.0]
48    [330.0, 290.0]
Name: Node, dtype: object
  • Could you provide a well-formatted sample of your dataframe so we can test some code? It looks like your problem is due to the list format. Commented Sep 12, 2017 at 16:12
  • The reason you get the error is that a list is not hashable in Python. Commented Sep 12, 2017 at 16:12
  • @MattR updating the question now with a dataframe with fewer rows. Commented Sep 12, 2017 at 16:18
  • I would strongly consider using @Alexander's answer. It's important to put your data into the correct format for data analysis. Keeping multiple values in a single cell is not really an optimal way to store data. Convert to a proper DataFrame with two columns first and then do normal boolean selection. Commented Sep 12, 2017 at 18:48

5 Answers

3

One more:)

df.index[df['Node'].apply(lambda x: 330.0 in x )].tolist()

You get

[100, 103]

This one also seems to be the fastest

%timeit df.index[df['Node'].apply(lambda x: 330.0 in x )].tolist()
1000 loops, best of 3: 262 µs per loop

%timeit df[df.Node.apply(lambda x: True if 330.0 in x else False)].index 
1000 loops, best of 3: 704 µs per loop

%timeit df.loc[(df['x'] == 330) | (df['y'] == 330), 'Node']
1000 loops, best of 3: 1.3 ms per loop

2 Comments

Thanks, this seems to be the solution to what I want!
Just for future readers' reference: if Node ever contains a value that is not a list, this raises a TypeError, for example TypeError: argument of type 'int' is not iterable. This is basically the opposite of the OP's original issue. This solution, as it currently stands, only works for a column of lists.
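One way to guard against that case, as a minimal sketch: `contains_value` below is a hypothetical helper (not part of pandas) that compares scalar cells directly and only uses `in` for list-like cells. The mixed sample Series is made up for illustration.

```python
import numpy as np
import pandas as pd


def contains_value(cell, target):
    """Return True if `cell` holds `target`; scalar cells are compared directly."""
    if isinstance(cell, (list, tuple, np.ndarray)):
        return target in cell
    return cell == target


# Hypothetical data: a mix of lists and plain numbers in one column.
s = pd.Series([[330.0, 15.0], 330, [100.0, 160.0], 7],
              index=[100, 101, 102, 103], name="Node")

mask = s.apply(contains_value, args=(330.0,))
result = s.index[mask].tolist()
print(result)  # [100, 101]
```

Whether a bare 330 should count as a match is a design choice; drop the final `return cell == target` in favour of `return False` if only list-like cells should match.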
2

A key question is why each cell of the column contains a list in the first place. The column is stored with the object data type, your least efficient option. You should probably split your lists into two separate columns (which would be np.float64 given your sample data) and then check the values.

import pandas as pd

df = pd.DataFrame({'Node': [
    [355., 296.], 
    [53., 177.], 
    [294., 14.], 
    [330., 15.], 
    [100., 160.],
    [10., 220.],
    [330., 290.]]}, index=range(97, 104))

df[['x', 'y']] = df.Node.apply(pd.Series)
>>> df.loc[(df['x'] == 330) | (df['y'] == 330), 'Node']
100     [330.0, 15.0]
103    [330.0, 290.0]
Name: Node, dtype: object
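An alternative way to do the split, sketched under the assumption that every cell holds exactly two values: build the new frame from `tolist()` instead of `apply(pd.Series)`, which avoids constructing a Series per row. The column names `x` and `y` follow the snippet above; `xy`, `mask` and `matches` are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Node': [
    [355., 296.], [53., 177.], [294., 14.], [330., 15.],
    [100., 160.], [10., 220.], [330., 290.]]}, index=range(97, 104))

# Assumes every cell is a 2-element sequence; ragged cells would break this.
xy = pd.DataFrame(df['Node'].tolist(), index=df.index, columns=['x', 'y'])

# True for rows where either coordinate equals 330.
mask = xy.eq(330.0).any(axis=1)
matches = xy.index[mask].tolist()
print(matches)  # [100, 103]
```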


1

You can get what you want with

df[df.Node.apply(lambda x: True if 330.0 in x else False)].index 

Full Example:

>>> import pandas as pd 
>>> df = pd.DataFrame({'Node': [
...     [355., 296.], 
...     [53., 177.], 
...     [294., 14.], 
...     [330., 15.], 
...     [100., 160.],
...     [10., 220.],
...     [330., 290.]]}, index=range(97, 104))
>>> df
               Node
97   [355.0, 296.0]
98    [53.0, 177.0]
99    [294.0, 14.0]
100   [330.0, 15.0]
101  [100.0, 160.0]
102   [10.0, 220.0]
103  [330.0, 290.0]
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)]
               Node
100   [330.0, 15.0]
103  [330.0, 290.0]
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)].index 
Int64Index([100, 103], dtype='int64')
>>> 
>>> df[df.Node.apply(lambda x: True if 330.0 in x else False)].index.tolist()  
[100, 103]
>>> 


0

How about this:

import pandas as pd

df = pd.DataFrame()
df['Node'] = [[1, 2], [1, 3], [330.0, 5]]

for idx, value in enumerate(df['Node']):
    if 330.0 in value:
        print(idx)
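Note that `enumerate` yields positions, not index labels; the two only coincide here because the frame uses the default 0..n-1 index. A small sketch of the difference, using a made-up non-default index and `Series.items()` to get the labels:

```python
import pandas as pd

# Same data as above, but with a non-default index (hypothetical labels).
df = pd.DataFrame({'Node': [[1, 2], [1, 3], [330.0, 5]]}, index=[97, 98, 99])

# items() yields (index label, value) pairs.
labels = [label for label, value in df['Node'].items() if 330.0 in value]

# enumerate() yields (position, value) pairs.
positions = [pos for pos, value in enumerate(df['Node']) if 330.0 in value]

print(labels)     # [99]
print(positions)  # [2]
```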


0

Avoid loops in pandas; use .loc instead.

An example:

df.loc[df['Node'] == 330.0].index.tolist()

This will give you a list of the indexes where 'Node' is equal to 330. You may need to change it a bit. Look at this SO answer to see how to use lambda expressions with pandas to help you with lists.

EDIT:

I left a comment stating that the accepted answer will fail unless every value in the Node column is a list. A dirty workaround is to convert the values to strings and use contains. You can try something like:

df.loc[df['Node'].astype(str).str.contains('330.0', regex=False)].index.tolist()

This converts each list to a string, and then you can check whether it contains the string 330.0.
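One caveat with the string approach: a plain substring test also matches values such as 1330.0. The sketch below shows that, along with a stricter regular expression of my own (an assumption, not part of the original answer) that refuses to match when a digit or dot sits directly before 330.0 or a digit directly after it. It assumes the cells are Python lists of floats; NumPy arrays print differently (for example `[330.  15.]`), so neither pattern would work on them unchanged.

```python
import pandas as pd

# Hypothetical data, including a value (1330.0) that merely contains "330.0".
s = pd.Series([[330.0, 15.0], [1330.0, 5.0], [10.0, 330.0]],
              index=[45, 46, 47], name='Node')
as_text = s.astype(str)  # e.g. '[330.0, 15.0]'

# Plain substring match: also picks up 1330.0.
loose = s.index[as_text.str.contains('330.0', regex=False)].tolist()

# Stricter match: no digit or dot right before, no digit right after.
strict = s.index[as_text.str.contains(r'(?<![\d.])330\.0(?!\d)')].tolist()

print(loose)   # [45, 46, 47]
print(strict)  # [45, 47]
```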

4 Comments

Thanks for the answer but I get an error saying ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Thanks, that SO link helped!
My pleasure, I was going to update my answer similar to the one you already accepted. It's the best answer.
@PVasish, I updated my answer just in case you ever have an issue of where Node contains anything except a list. I've run into this problem before and gave you a simple way of working around it
