1

I'm trying to pass a user-defined operation into a function and am having trouble finding a good way to do it. All I can think of is to hard-code every option into the function, like this:

def DoThings(Conditions):
    import re
    import pandas as pd
    d = {'time' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth' : pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print df

    for Condition in Conditions:
        # Split the condition into two parts
        SplitCondition = re.split('<=|>=|!=|<|>|=',Condition)

        # If the right side of the conditional statement is a number convert it to a float
        if SplitCondition[1].isdigit():
            SplitCondition[1] = float(SplitCondition[1])

        # Perform the condition specified
        if "<=" in Condition:
            df = df[df[SplitCondition[0]]<=SplitCondition[1]]
            print "one"
        elif ">=" in Condition:
            df = df[df[SplitCondition[0]]>=SplitCondition[1]]
            print "two"
        elif "!=" in Condition:
            df = df[df[SplitCondition[0]]!=SplitCondition[1]]
            print "three"
        elif "<" in Condition:
            df = df[df[SplitCondition[0]]<=SplitCondition[1]]
            print "four"
        elif ">" in Condition:
            df = df[df[SplitCondition[0]]>=SplitCondition[1]]
            print "five"
        elif "=" in Condition:
            df = df[df[SplitCondition[0]]==SplitCondition[1]]
            print "six"
    return df

# Specify the conditions
Conditions = ["time>2","legnth<=6"]
df = DoThings(Conditions)   # Call the function

print df

Which results in this:

   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
five
one
   legnth  time
c       6     3

This is all well and good, but I'm wondering whether there is a better or more efficient way of passing conditions into a function without writing out every possible if statement. Any ideas?

SOLUTION:

def DoThings(Conditions):
    import operator
    import re
    import pandas as pd
    d = {'time' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
         'legnth' : pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print df

    # Map each operator symbol to the matching function in the operator module
    ops = {'<=': operator.le, '>=': operator.ge, '!=': operator.ne,
           '<': operator.lt, '>': operator.gt, '=': operator.eq}

    for Condition in Conditions:
        # Split the condition into two parts
        SplitCondition = re.split('<=|>=|!=|<|>|=',Condition)

        # If the right side of the conditional statement is a number convert it to a float
        if SplitCondition[1].isdigit():
            SplitCondition[1] = float(SplitCondition[1])

        # Find the operator symbol and apply the matching comparison
        cond = re.findall(r'<=|>=|!=|<|>|=', Condition)
        df = df[ops[cond[0]](df[SplitCondition[0]],SplitCondition[1])]

    return df



# Specify the conditions
Conditions = ["time>2","legnth<=6"]
df = DoThings(Conditions)   # Call the function

print df

Output:

   legnth  time
a       4     1
b       5     2
c       6     3
d       7     4
   legnth  time
c       6     3

3 Answers

4

You can access the built-in operators via the operator module, and then build a table mapping your operator names to the built-in ones, like in this cut-down example:

import operator
ops = {'<=': operator.le, '>=': operator.ge}

In [3]: ops['>='](2, 1)
Out[3]: True
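
To make the cut-down table above more concrete, here is a minimal sketch of how it could replace an if/elif chain without depending on pandas. The names OPS, CONDITION_RE and parse_condition are made up for this illustration, and the "column, operator, value" string format is assumed from the question.

```python
import operator
import re

# Map each comparison symbol to the matching function in the operator module.
OPS = {
    '<=': operator.le,
    '>=': operator.ge,
    '!=': operator.ne,
    '<': operator.lt,
    '>': operator.gt,
    '=': operator.eq,
}

# Two-character symbols come first so that '<=' is not matched as '<'.
CONDITION_RE = re.compile(r'^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*$')


def parse_condition(condition):
    """Split 'name<op>value' into (name, comparison function, value)."""
    match = CONDITION_RE.match(condition)
    if match is None:
        raise ValueError('cannot parse condition: %r' % condition)
    name, symbol, raw = match.groups()
    try:
        value = float(raw)   # also handles decimals such as '2.5'
    except ValueError:
        value = raw          # leave non-numeric values as strings
    return name, OPS[symbol], value


name, compare, value = parse_condition('time>2')
is_match = compare(3.0, value)   # True, because 3.0 > 2.0
```

Listing the two-character symbols first in the pattern matters: otherwise 'legnth<=6' would be read as '<' followed by the value '=6'. The returned function works on plain numbers and, since it is an ordinary comparison, on pandas columns as well.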

4 Comments

This is exactly what I was looking for. Thanks. Answer implemented in question.
Although this answers how the OP (or anyone else who stumbles here) could do this... they really shouldn't be rolling out their own (less efficient) numpy masking.
@AndyHayden: "Anyone else who stumbles here" might not be using numpy - they might just be looking for a succinct way to implement a set of operators. Harsh downvote IMHO.
I take that back. I still think the OP creating a DSL to do this is a bad idea though...

2

You can use masking to do this kind of operation (you will find it a lot faster):

In [21]: df[(df.legnth <= 6) & (df.time > 2)]
Out[21]:
   legnth  time
c       6     3

In [22]: df[(df.legnth <= 6) & (df.time >= 2)]
Out[22]:
   legnth  time
b       5     2
c       6     3

Note: there's a bug in your implementation: the "<" and ">" branches apply <= and >=, so "time>2" also keeps b, which should not be included in your query.

You can also do or (using |) operations, which work as you would expect:

In [23]: df[(df.legnth == 4) | (df.time == 4)]
Out[23]:
   legnth  time
a       4     1
d       7     4
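
If the conditions arrive as a list rather than being typed inline, the masks can still be combined in one step. The following is only a sketch: the (column, comparison function, value) triple format and the variable names are assumptions made for this example, not something pandas prescribes.

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({'time': [1., 2., 3., 4.],
                   'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

# Hypothetical input format: (column, comparison function, value) triples.
conditions = [('time', operator.gt, 2), ('legnth', operator.le, 6)]

# Build one boolean Series per condition, then AND them together so the
# frame is indexed only once instead of once per condition.
masks = [compare(df[column], value) for column, compare, value in conditions]
combined = reduce(operator.and_, masks)

result = df[combined]   # same rows as df[(df.time > 2) & (df.legnth <= 6)]
```

Swapping operator.and_ for operator.or_ in the reduce call gives the "or" behaviour shown with | above.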

3 Comments

And how would you implement this in the example above?
In pandas 0.13 you'll be able to do df.query('legnth == 4 or time == 4') courtesy of the query method on DataFrame objects and pd.eval, a top-level evaluator that uses numexpr under the hood.
@cpcloud you should definitely add a new answer with this. :) very awesome!

0

In pandas 0.13 and later you can do the following, all of which are equivalent:

res = df.query('(legnth == 4) | (time == 4)')
res = df.query('legnth == 4 | time == 4')
res = df.query('legnth == 4 or time == 4')

The development version at the time also accepted the same expression through __getitem__, which was my personal favorite:

res = df['legnth == 4 or time == 4']

As far as I know that form was dropped before the release, so in released versions df['...'] is an ordinary column lookup and only query evaluates expressions.

query accepts an arbitrary boolean expression and resolves each bare name in it against the columns of the calling frame (local variables can be used as well; released versions reference them with an @ prefix). This 1) lets you express queries a bit more succinctly than typing df. in front of everything, 2) lets you use syntax that, let's face it, looks better than ugly bitwise operators, 3) is potentially much faster than the "pure" Python equivalent if you have huge frames and a very complex expression, and 4) allows you to pass the same query (after all, it is a string) to multiple frames with a subset of columns in common.
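
Applied to the question, a list of condition strings could be joined into a single expression and handed to query. This is a sketch under two assumptions: the conditions are already written in query's Python-like syntax (so equality is spelled == rather than =), and the variable names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'time': [1., 2., 3., 4.],
                   'legnth': [4., 5., 6., 7.]},
                  index=['a', 'b', 'c', 'd'])

# User-supplied conditions, one comparison per string.
conditions = ['time > 2', 'legnth <= 6']

# Join into one expression; the parentheses keep each condition separate.
expression = ' and '.join('(%s)' % c for c in conditions)
result = df.query(expression)

# A local variable is referenced inside the expression with an '@' prefix.
max_legnth = 6
same_result = df.query('time > 2 and legnth <= @max_legnth')
```

Because the expression is just a string, the same conditions list can be reused against any other frame that has time and legnth columns.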
