
I use a pandas DataFrame with a hierarchical index, and in one particular case it is indexed by float values.

Here is an example:

import pandas as pd

example_data = [
    {'a': 1.2, 'b':30, 'v':123},
    {'a': 1.2, 'b':60, 'v':1234},
    {'a': 3, 'b':30, 'v':12345},
    {'a': 3, 'b':60, 'v':123456},
]
frame = pd.DataFrame(example_data)
frame.set_index(['a', 'b'])

Now I'd like to use partial indexing to select the part of the frame with a == 1.2 and then display it. The documentation shows how to do this for a string index, but that approach doesn't seem to work for floats: when I try frame.loc[1.2], I get an error about 1.2 being an improper indexer for Int64Index, which is obviously true since I am indexing with a float.

Is there any way to work with a float index in pandas? How can I fix my hierarchical index?

Actual error message was:

TypeError: the label [1.2] is not a proper indexer for this index type (Int64Index)
  • @TonyHopkinson unless you actually do math on them floats are as safe as it gets for labeling stuff. In this case these floats are labeling inherent physical properties of the process that generated results in this particular data frame. Commented Jul 4, 2014 at 15:20
  • Really? So Hash[1 - (1/3*3)] != Hash[0] was my point, but even without arithmetic, there will be a huge range of values for the keys that will give potentially unfortunate results. I'd avoid this at all costs personally. If precision is to a decimal place, I'd multiply it by 10 and truncate maybe. Commented Jul 4, 2014 at 15:27
  • As I've said: I don't do actual math on these floats, they are just labels. Truncating them will be done when I'm displaying results. Commented Jul 4, 2014 at 15:37
  • All I can say is if I was reviewing your code, you'd get it back. Too many ways it could go wrong on you. Commented Jul 4, 2014 at 15:45
  • If this was a commercial project this would have been the case, but it is not. See: academia.stackexchange.com/questions/21276/… Commented Jul 4, 2014 at 16:19

2 Answers

Pandas has no issue if the index has a single level, i.e. it is not a MultiIndex:

In [178]:

frame = frame.set_index(['a'])
frame.loc[1.2]
Out[178]:
      b     v
a            
1.2  30   123
1.2  60  1234

If you do have a MultiIndex, you can generate a mask from the values of index level 0 (the first level) and use it to select the rows:

In [180]:

mask = frame.index.get_level_values(0)
frame.loc[mask == 1.2]
Out[180]:
           v
a   b       
1.2 30   123
    60  1234

The mask variable itself holds the level 0 value for each row; comparing it with 1.2 gives the boolean selector:

In [181]:

mask
Out[181]:
Float64Index([1.2, 1.2, 3.0, 3.0], dtype='float64')

It is better and more explicit to specify the level using the name:

mask = frame.index.get_level_values('a')
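
Since comparing floats for exact equality can be fragile, the same level-values approach can be combined with a tolerance. The following is only a minimal sketch, assuming NumPy is available and that the default tolerances of np.isclose suit your labels; the names label, level_a, close and selected are made up for this illustration.

```python
import numpy as np
import pandas as pd

example_data = [
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]
frame = pd.DataFrame(example_data).set_index(['a', 'b'])

# A label that was computed rather than typed in; it may differ
# from the stored 1.2 in the last few bits.
label = 0.4 * 3

# Values of level 'a' for every row, compared with a tolerance
# instead of ==.
level_a = frame.index.get_level_values('a')
close = np.isclose(level_a, label)

# Boolean selection keeps both index levels in the result.
selected = frame.loc[close]
print(selected)
```

If the labels are known to a fixed number of decimals, rounding them before calling set_index is another option, at the cost of changing the stored values.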

3 Comments

As mentioned in earlier comments, the reason is likely floating point precision. A possible workaround is to use a string representation instead. Truncation or rounding will probably be needed to do this properly, though, and that may pose a problem when attempting to get data out of the data frame.
@Ethan sure, in principle it should work for values like 1.0, 2.0, etc. It may fail for others, like you say; testing floating-point values for equality and comparison is always a little tricky. A string representation would work, but I'd question the need for a float index in the first place.
I don't understand why MultiIndex is a special case. For a MultiIndex, you could definitely use loc directly. If the level you're indexing is the first one, the following are equivalent: frame.loc[1.2] or frame.loc[(1.2,)] or frame.loc[(1.2, slice(None))]. Then the only potential problem is float precision. Am I missing something?

Came across this while trying something similar and it worked without issue. Either the pandas library has improved, or you are missing inplace (or assignment) in set_index.

import pandas as pd

example_data = [
    {'a': 1.2, 'b':30, 'v':123},
    {'a': 1.2, 'b':60, 'v':1234},
    {'a': 3, 'b':30, 'v':12345},
    {'a': 3, 'b':60, 'v':123456},
]
frame = pd.DataFrame(example_data)
f2 = frame.set_index(['a', 'b']) # <<<<<<<<<
print(f2)
             v
a   b         
1.2 30     123
    60    1234
3.0 30   12345
    60  123456

Now f2.loc[1.2] works.

print(f2.loc[1.2])
       v
b       
30   123
60  1234
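
To make the difference explicit, here is a small sketch (checked only against this toy data) showing that set_index returns a new frame and leaves the original untouched, plus an alternative selection with xs, which names the level being matched. The variable names by_loc and by_xs are just for illustration.

```python
import pandas as pd

example_data = [
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]
frame = pd.DataFrame(example_data)

# Without assignment the result is discarded: frame still has its
# default single-level integer index, so frame.loc[1.2] cannot work.
frame.set_index(['a', 'b'])
print(frame.index.nlevels)

# Keep the returned frame (or pass inplace=True instead).
f2 = frame.set_index(['a', 'b'])

# Partial indexing on the first level, two ways.
by_loc = f2.loc[1.2]
by_xs = f2.xs(1.2, level='a')
print(by_loc)
print(by_xs)
```

Both selections drop the matched level 'a' and leave a frame indexed by 'b'.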
