Is there a function to filter a dataframe by a column value and produce a dictionary?

Question

In pandas, is there a built-in function that does the same as df_to_dict below? Speed is of the essence, as my dataframe can have thousands of rows. Basically, it should return a dictionary whose keys are the distinct values in df[column] and whose values are the rows of the dataframe where df[column] == value, with that column removed.


def df_to_dict(df, column):
    def _filter(df, column, value):
        df_return = df[df[column] == value]
        del df_return[column]
        return df_return

    return {value: _filter(df, column, value) for value in set(df[column].values)}

Nick ODell · Accepted Answer · 2025-09-16 18:27:50Z

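The most straightforward idea I can think of is converting a groupby to a dict. In the benchmark below, df_to_dict_original is the df_to_dict function from the question and df_to_dict_groupby is this version.

def df_to_dict_groupby(df, column):
    # Iterating a groupby yields (key, sub-dataframe) pairs; drop the grouping column from each piece
    return {key: value.drop(columns=[column]) for key, value in df.groupby(column)}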
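To make the shape of the result concrete, here is a minimal sketch of the same groupby-to-dict idea on a tiny dataframe. The name split_by_column and the toy data are hypothetical and used only for illustration.

```python
import pandas as pd


def split_by_column(df, column):
    # groupby yields (key, sub-frame) pairs; drop the grouping column from each piece
    return {key: group.drop(columns=[column]) for key, group in df.groupby(column)}


# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "Class": ["a", "b", "a", "c"],
    "x": [1, 2, 3, 4],
    "y": [10.0, 20.0, 30.0, 40.0],
})

parts = split_by_column(toy, "Class")
print(sorted(parts))  # the distinct values of toy["Class"]
print(parts["a"])     # the rows where Class == "a", without the "Class" column
```

Two defaults are worth keeping in mind: groupby sorts the group keys (sort=True), and it skips rows whose key is NaN (dropna=True). If either matters for your data, df.groupby(column, sort=False) or df.groupby(column, dropna=False) may be worth trying.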

Benchmark:

import pandas as pd
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
iris = pd.read_csv(csv_url, names = col_names)
# Create dataframe with 15000 rows
iris_big = pd.concat([iris]*100)

%timeit as_dict = df_to_dict_original(iris, "Class")
# 1.13 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit as_dict = df_to_dict_groupby(iris, "Class")
# 1.18 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit as_dict = df_to_dict_original(iris_big, "Class")
# 7.73 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit as_dict = df_to_dict_groupby(iris_big, "Class")
# 2.82 ms ± 8.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It appears to be about the same speed as your original idea on the small dataframe, and roughly 2.7x faster (2.82 ms vs. 7.73 ms) on the 15,000-row one. Your mileage may vary: results depend on the cardinality of the column and the size of the dataframe.
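If you want to see how cardinality affects this on your own machine, something along these lines may help. It is only a sketch: by_mask is a mask-based variant of the question's approach (not the exact function), by_groupby mirrors the groupby version, and the synthetic data, column names, and key counts are assumptions chosen for illustration. No timings are quoted here because they depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def by_mask(df, column):
    # One boolean-mask pass over the whole frame per distinct value
    return {
        value: df[df[column] == value].drop(columns=[column])
        for value in df[column].unique()
    }


def by_groupby(df, column):
    # A single grouping pass, then one sub-frame per group
    return {key: group.drop(columns=[column]) for key, group in df.groupby(column)}


rng = np.random.default_rng(0)
n_rows = 15_000  # same row count as iris_big in the benchmark above

for n_keys in (3, 30, 300):
    # Synthetic frame with roughly n_keys distinct values in "key"
    df = pd.DataFrame({
        "key": rng.integers(0, n_keys, size=n_rows),
        "value": rng.random(n_rows),
    })
    # Best of 3 repeats, 3 calls each, reported as time per call
    t_mask = min(timeit.repeat(lambda: by_mask(df, "key"), number=3, repeat=3)) / 3
    t_group = min(timeit.repeat(lambda: by_groupby(df, "key"), number=3, repeat=3)) / 3
    print(f"{n_keys:>4} keys: mask {t_mask * 1e3:.2f} ms, groupby {t_group * 1e3:.2f} ms")
```

The mask version rescans the full column once per distinct value, so its cost should grow with the number of keys, while groupby partitions the rows once. That is the likely reason the gap widens on larger or higher-cardinality data, but measure before relying on it.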
