
I have a DataFrame, foo = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [6,7,8]}), and a list of its columns, list_of_columns = ['a','b'].

The columns in list_of_columns are selected dynamically by the user, so the list can be ['a','b'], but it can also be ['a','c'], ['c'], ['a','b','c'], etc.

For every column in list_of_columns, I would like to create a (nested) for loop and query the DataFrame in the following way:

If list_of_columns = ['a','b'], the loop would look like this:

for a in foo.a.unique():
    for b in foo.b.unique():
        print(foo.query(f'a=={a} and b=={b}'))

If list_of_columns = ['a'], the loop would look like this:

for a in foo.a.unique():
    print(foo.query(f'a=={a}'))

If list_of_columns = ['a','b','c'], the loop would look like this:

for a in foo.a.unique():
    for b in foo.b.unique():
        for c in foo.c.unique():
            print(foo.query(f'a=={a} and b=={b} and c=={c}'))

Is there a way to achieve this programmatically in Python?

  • Your input DataFrame is invalid, do you have a NaN? Commented Jul 20, 2022 at 8:46
  • Use a recursive function with a single loop: remove 'a' from the list and call the function recursively with the smaller list for every element in 'a'. When the list is empty, print. Commented Jul 20, 2022 at 8:47
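
The recursive idea from the comment above could be sketched roughly as follows. This is only one possible reading of that comment: the names nested_query, subset and chosen are hypothetical, and the unique values are taken from the full DataFrame at every level so that the result matches the nested loops in the question (including the empty results).

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [6, 7, 8]})


def nested_query(df, columns, subset=None, chosen=()):
    """Yield (values, rows) for every combination of unique column values.

    nested_query, subset and chosen are hypothetical names used for illustration.
    """
    if subset is None:
        subset = df
    if not columns:
        # No columns left: this is the body of the innermost loop.
        yield chosen, subset
        return
    col, rest = columns[0], columns[1:]
    # A single loop per call; the recursion supplies the nesting.
    for value in df[col].unique():
        narrowed = subset[subset[col] == value]
        yield from nested_query(df, rest, narrowed, chosen + (value,))


for values, rows in nested_query(foo, ['a', 'b']):
    print(values)
    print(rows)
```

Each level of recursion handles one column, so the depth equals the length of the column list.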

2 Answers


One approach is to use itertools.product to handle the "nested" loops:

import pandas as pd
from itertools import product

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [6, 7, 8]})

list_of_columns = ['a', 'b']
for p in product(*(foo[col].unique() for col in list_of_columns)):
    query = " and ".join(f"{c}=={u}" for c, u in zip(list_of_columns, p))
    print(foo.query(query))
    print("--")

Output

   a  b  c
0  1  4  6
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
   a  b  c
1  2  5  7
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
Empty DataFrame
Columns: [a, b, c]
Index: []
--
   a  b  c
2  3  6  8
--
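
One caveat with building the query string as f"{c}=={u}": if a column holds strings, the value is inserted without quotes and query treats it as a column or variable name. A possible variant, sketched below, builds a boolean mask instead of a query string so that it does not depend on the dtype. The function name subsets_by_product and the sample data with a string column are assumptions made for illustration only.

```python
from itertools import product

import pandas as pd

# Hypothetical data with a string column, not the DataFrame from the question.
bar = pd.DataFrame({'a': ['x', 'y'], 'b': [4, 5], 'c': [6.5, 7.5]})


def subsets_by_product(df, columns):
    """Yield (combo, rows) for every combination of unique values in columns."""
    uniques = [df[col].unique() for col in columns]
    for combo in product(*uniques):
        # Start with every row selected, then narrow down one column at a time.
        mask = pd.Series(True, index=df.index)
        for col, value in zip(columns, combo):
            mask &= df[col] == value
        yield combo, df[mask]


for combo, rows in subsets_by_product(bar, ['a', 'b']):
    print(combo)
    print(rows)
    print("--")
```

As with the query-based version, combinations that never occur together still come back as empty DataFrames.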


Essentially, it looks like you want to loop over the unique combinations?

However, because you loop over every combination of unique values and then query for it, combinations that never occur together in the data return empty DataFrames. If you do not need those, a much simpler and more efficient version would be:

for _,g in foo.groupby(list_of_columns):
    print('---')
    print(g)

output:

---
   a  b  c
0  1  4  6
---
   a  b  c
1  2  5  7
---
   a  b  c
2  3  6  8
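
If the values that identify each group are also needed (the equivalent of a, b, c in the nested loops), the group key that groupby yields can be used rather than discarded. The sketch below assumes a helper named labelled_groups, which is hypothetical. The key is normalised to a tuple because, depending on the pandas version, grouping by a one-element list may yield either a scalar or a one-element tuple.

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [6, 7, 8]})


def labelled_groups(df, columns):
    """Return a list of (label, rows), where label maps column name to value.

    labelled_groups is a hypothetical helper used for illustration.
    """
    result = []
    for keys, group in df.groupby(columns):
        # Normalise the key so single-column grouping is handled uniformly.
        if not isinstance(keys, tuple):
            keys = (keys,)
        result.append((dict(zip(columns, keys)), group))
    return result


for label, rows in labelled_groups(foo, ['a', 'b']):
    print(label)
    print(rows)
```

Only combinations that actually occur in the data appear, so no empty DataFrames are produced.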

In comparison, the output of your nested loop:

---
   a  b  c
0  1  4  6
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
   a  b  c
1  2  5  7
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
Empty DataFrame
Columns: [a, b, c]
Index: []
---
   a  b  c
2  3  6  8
