Efficiently converting pandas dataframe to scipy sparse matrix

Question

I'm trying to convert a pandas DataFrame to a scipy sparse matrix so I can work efficiently with a large number of features.

However, I haven't found an efficient way to access the values in the DataFrame, so I always run out of memory during the conversion. I tried the two approaches below and neither works. I've researched this a lot but haven't found anything better. If anyone has a suggestion, I'd be happy to test it.

from scipy import sparse

sparse_array = sparse.csc_matrix(df.values)      # attempt 1
sparse_array = sparse.csc_matrix(df.to_numpy())  # attempt 2

CJR · Accepted Answer · 2020-10-18 21:45:55Z

If your dataframe is very sparse you could convert it column-wise and then stack:

from scipy import sparse

sparse_array = sparse.hstack([sparse.csc_matrix(df[i].values.reshape(-1, 1)) for i in df.columns])
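As a minimal sketch of the column-wise approach on a small frame (the column names and values here are made up for illustration), each column becomes an (n_rows, 1) sparse block and the blocks are stacked side by side:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy frame: mostly zeros, a single float dtype.
df = pd.DataFrame({
    "a": [0.0, 1.0, 0.0, 0.0],
    "b": [0.0, 0.0, 0.0, 2.0],
    "c": [0.0, 0.0, 0.0, 0.0],
})

# Convert one column at a time instead of building one dense 2-D array
# for the whole frame.
column_blocks = [
    sparse.csc_matrix(df[name].to_numpy().reshape(-1, 1)) for name in df.columns
]

# Stack the (n_rows, 1) blocks horizontally into one CSC matrix.
stacked = sparse.hstack(column_blocks, format="csc")

print(stacked.shape, stacked.nnz)
```

Only the non-zero entries are stored, so the all-zero column "c" contributes nothing to the result.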

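But it is probably best to convert it to a sparse DataFrame first:

import pandas as pd

# fill_value=0 so that zeros (not NaN, the float default) are the entries left unstored
for i in df.columns:
    df[i] = df[i].astype(pd.SparseDtype(df[i].dtype, fill_value=0))

sparse_array = sparse.csc_matrix(df.sparse.to_coo())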
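One detail worth checking in this conversion is the fill value. For float columns, pd.SparseDtype defaults to a fill value of NaN, so zeros are still stored explicitly, and as far as I know recent pandas versions expect a fill value of 0 in to_coo(). The sketch below compares the two on a made-up Series; the variable names are illustrative only:

```python
import pandas as pd
from scipy import sparse

# Hypothetical column: three zeros and one non-zero value.
dense = pd.Series([0.0, 1.0, 0.0, 0.0])

# Default fill value for floats is NaN, so every zero is stored explicitly.
nan_filled = dense.astype(pd.SparseDtype("float64"))
# With fill_value=0 only the non-zero entry is stored.
zero_filled = dense.astype(pd.SparseDtype("float64", fill_value=0))

# Density is the fraction of entries that are actually stored.
nan_density = nan_filled.sparse.density
zero_density = zero_filled.sparse.density

# A frame whose columns all use a zero fill value converts cleanly.
frame = pd.DataFrame({"a": zero_filled, "b": zero_filled})
csc = sparse.csc_matrix(frame.sparse.to_coo())

print(nan_density, zero_density, csc.nnz)
```

If the density of a converted column stays close to 1.0, the sparse dtype is not saving any memory and the fill value is the first thing to check.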

(Note that there may be an issue if the dtypes are not homogeneous across the DataFrame.)

2 Comments

Diego Pesco Alcalde Over a year ago

Hey CJR, thanks for the reply. I tested it and it does seem to work. When you say "not homogeneous", do you mean I could have an issue if I have both floats and integers, for example? If so, what sort of issue could I run into?

CJR Over a year ago

If you keep it as a sparse DataFrame there's no issue, but a scipy sparse matrix has a single dtype. If you have floats and ints, one will have to be cast to the other if you want a matrix. (A column of strings is even worse: now it's a matrix of Python objects, but it'll probably crash, so good news there.)
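To make the dtype point concrete, here is a small sketch with one integer column and one float column (hypothetical names and data). It casts both to their common NumPy dtype before converting, so the upcast is explicit rather than implicit:

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical frame mixing an int column and a float column.
df = pd.DataFrame({"count": [0, 3, 0, 0], "weight": [0.0, 0.0, 1.5, 0.0]})

# int64 and float64 promote to float64.
common_dtype = np.result_type(*df.dtypes)

# Cast explicitly, then convert to a zero-filled sparse frame and on to CSC.
uniform = df.astype(common_dtype)
sparse_df = uniform.astype(pd.SparseDtype(common_dtype, fill_value=0))
matrix = sparse.csc_matrix(sparse_df.sparse.to_coo())

# Adding a string column leaves only the generic object dtype as a common type.
with_text = pd.DataFrame({"count": [0, 3], "label": ["x", "y"]})
object_dtype = with_text.to_numpy().dtype

print(common_dtype, matrix.dtype, object_dtype)
```

Integer values that do not fit exactly in a float64 would lose precision in such a cast, so it is worth deciding on the target dtype deliberately.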
