I'm wondering how to check that every row in a DataFrame contains all of a particular set of values.
For example:
import pandas as pd

VALUES = [1, 2]

df_no = pd.DataFrame(
    {
        "a": [1],
        "b": [1],
    }
)
df_yes = pd.DataFrame(
    {
        "a": [1],
        "b": [2],
        "c": [3],
    }
)
Here, not every row of df_no contains all the values in VALUES (2 is missing), whereas every row of df_yes does.
One approach is the following:
# check df_no
all(
    [
        all(value in row for value in VALUES)
        for row in df_no.apply(lambda x: x.unique(), axis=1)
    ]
)
# returns False

# check df_yes
all(
    [
        all(value in row for value in VALUES)
        for row in df_yes.apply(lambda x: x.unique(), axis=1)
    ]
)
# returns True
I feel that this approach isn't very clear, and that there might be a more idiomatic way of going about it.
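
For comparison, here is a minimal sketch of one possible vectorized formulation of the same check. The helper name `rows_contain_all` is hypothetical (it is not a pandas function), and the sketch assumes the values can be compared with `==` against the cells. It is one candidate, not necessarily the most idiomatic option.

```python
import pandas as pd

VALUES = [1, 2]


def rows_contain_all(df, values):
    """Return True if every row of df contains every value in values.

    Hypothetical helper for illustration; not part of pandas.
    """
    # df.eq(value).any(axis=1) flags, per row, whether the value occurs in it;
    # .all() then requires that flag to hold for every row.
    return all(df.eq(value).any(axis=1).all() for value in values)


df_no = pd.DataFrame({"a": [1], "b": [1]})
df_yes = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

print(rows_contain_all(df_no, VALUES))   # False
print(rows_contain_all(df_yes, VALUES))  # True
```

This replaces the row-wise `apply` with one column-wise comparison per value, so the amount of Python-level looping depends on the length of VALUES rather than on the number of rows.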