How can I check if all values of a polars DataFrame, containing only boolean columns, are True?
Example df:

import polars as pl

df = pl.DataFrame({"a": [True, True, None],
                   "b": [True, True, True],
                   })

The reason for my question is that sometimes I want to check if all values of a df fulfill a condition, like in the following:

df = pl.DataFrame({"a": [1, 2, None],
                   "b": [4, 5, 6],
}).select(pl.all() >= 1)

By the way, I didn't expect .select(pl.all() >= 1) to keep the null (None) in the last row of column "a"; maybe that's worth noting.

2 Answers

4

As of the date of this edit, I found the following code to be the most idiomatic for polars (also in terms of performance):

df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)

Note that this code can return True, False or None. None (which the REPL prints as nothing) is returned when the df contains no False values but at least one null.

Example with the df from the question:


>>> df = pl.DataFrame({"a": [True, True, None],
...                    "b": [True, True, True],
...                    })
>>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)  # Nothing is printed: the result is None because of the null in the df.
>>> df = pl.DataFrame({"a": [True, True, True],
...                    "b": [True, True, True],
...                    })
>>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)  # True is returned.
True

If no null values exist in df, one could omit ignore_nulls=False.




To finish off, here is the second-best answer. It is less straightforward and a bit slower:
df.mean_horizontal(ignore_nulls=False).eq_missing(1).all()

However, its advantage is that it can only return True or False (never None).
It works because the mean of a row with only True values is always 1. With ignore_nulls=False, a row containing a null gets a null mean, which eq_missing(1) treats as not equal to 1.


2

A more explicit approach could look as follows.

If null values can be ignored:

is_all_true = pl.all_horizontal(pl.all().all())
df.select(is_all_true).item()
True

Explanation. If df is of shape (n, c), then:

  • using pl.all().all() will give a boolean dataframe of shape (1, c) indicating for each column whether it only contains true values;
  • using pl.all_horizontal(pl.all().all()) will give a boolean dataframe of shape (1, 1) indicating whether all values in df are True;
  • finally, .item() is used to pick the literal value from the dataframe of shape (1, 1).

If null values cannot be ignored:

Here, pl.Expr.fill_null is used to explicitly set null values to False before performing the logic above.

is_all_true = pl.all_horizontal(pl.all().fill_null(False).all())
df.select(is_all_true).item()
False

See this answer for more details in the context of checking for null values.

2 Comments

Note. One could also use the shorter form pl.select(pl.all_horizontal(df).all()).item(). However, this uses less common functions (e.g. pl.select instead of pl.DataFrame.select and pl.all_horizontal with a dataframe argument instead of an expression argument).
@jqurious I've opened a feature request here for corresponding frame-level methods. Do you know what needs to happen for the request to get accepted? If accepted, I'd love to use it as an opportunity to contribute to polars.
