In pandas, is there a built-in function that does the same as df_to_dict below? Speed is of the essence, as my dataframe can have thousands of rows. It should return a dictionary whose keys are the unique values in df[column] and whose values are the rows of the dataframe where df[column] == value, with that column removed.


def df_to_dict(df, column):
    def _filter(df, column, value):
        df_return = df[df[column] == value]
        del df_return[column]
        return df_return

    return {value: _filter(df, column, value) for value in set(df[column].values)}

1 Answer

The most straightforward approach I can think of is to build a dict from a groupby, which yields (key, sub-dataframe) pairs when iterated.

def df_to_dict_groupby(df, column):
    return {key: value.drop(columns=[column]) for key, value in df.groupby(column)}
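
As a quick illustration of what the groupby approach returns, here is a minimal sketch on a toy dataframe. The frame `toy` and its column names are made up for the example, and the function is named df_to_dict_groupby to match the benchmark.

```python
import pandas as pd

def df_to_dict_groupby(df, column):
    # Iterating over a GroupBy yields (key, sub-dataframe) pairs;
    # drop the grouping column from each sub-dataframe.
    return {key: group.drop(columns=[column]) for key, group in df.groupby(column)}

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({"Class": ["a", "b", "a"], "x": [1, 2, 3]})
result = df_to_dict_groupby(toy, "Class")

keys = sorted(result)                  # one key per distinct value in "Class"
x_for_a = result["a"]["x"].tolist()    # rows where Class == "a", original order kept
```

Each value keeps the original index of its rows, so the sub-dataframes can still be aligned with the source dataframe if needed.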

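Benchmark, comparing the question's function (as df_to_dict_original) with the groupby version (as df_to_dict_groupby):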

import pandas as pd
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
iris = pd.read_csv(csv_url, names = col_names)
# Create a dataframe with 15,000 rows (150 rows repeated 100 times)
iris_big = pd.concat([iris]*100)

%timeit as_dict = df_to_dict_original(iris, "Class")
# 1.13 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit as_dict = df_to_dict_groupby(iris, "Class")
# 1.18 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit as_dict = df_to_dict_original(iris_big, "Class")
# 7.73 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit as_dict = df_to_dict_groupby(iris_big, "Class")
# 2.82 ms ± 8.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On the small dataframe it is about the same speed as your original function; on the big dataframe it is roughly 2.7x faster (2.82 ms vs. 7.73 ms). Your mileage may vary, depending on the cardinality of the column and the size of the dataframe.
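
The %timeit lines above only work in IPython or Jupyter. Outside of those, a comparison along the same lines can be sketched with the standard timeit module. This is a rough sketch under assumptions: the data is a synthetic stand-in for iris_big (15,000 rows, three classes) rather than the real dataset, df_to_dict_original is a compact restatement of the question's function, and the timings it prints will differ from the numbers quoted above.

```python
import timeit

import numpy as np
import pandas as pd

def df_to_dict_original(df, column):
    # Boolean-mask approach from the question, restated compactly
    return {v: df[df[column] == v].drop(columns=[column]) for v in set(df[column].values)}

def df_to_dict_groupby(df, column):
    # Groupby approach from this answer
    return {k: g.drop(columns=[column]) for k, g in df.groupby(column)}

# Synthetic stand-in data (an assumption, not the iris dataset)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Class": rng.choice(["setosa", "versicolor", "virginica"], size=15_000),
    "value": rng.random(15_000),
})

# Check that both functions produce the same dictionary before timing them
a = df_to_dict_original(df, "Class")
b = df_to_dict_groupby(df, "Class")
same = a.keys() == b.keys() and all(a[k].equals(b[k]) for k in a)

t_original = timeit.timeit(lambda: df_to_dict_original(df, "Class"), number=20)
t_groupby = timeit.timeit(lambda: df_to_dict_groupby(df, "Class"), number=20)
print(f"same result: {same}")
print(f"original: {t_original / 20 * 1e3:.2f} ms per call")
print(f"groupby:  {t_groupby / 20 * 1e3:.2f} ms per call")
```

Checking equality first guards against comparing the speed of two functions that do not actually return the same thing.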
