SciPy, NumPy and scikit-learn: create a sparse matrix

Question

I'm currently trying to classify text. My dataset is too big and, as suggested here, I need to use a sparse matrix. My question now is: what is the right way to add an element to a sparse matrix? Let's say, for example, that I have a matrix X which is my input.

import numpy as np
X = np.random.randint(2, size=(6, 100))

Now this matrix X looks like an ndarray of an ndarray (or something like that).

If I do

from scipy.sparse import csr_matrix
X2 = csr_matrix(X)

I have the sparse matrix, but how can I add another element to it? For example, given this dense element: [1,0,0,0,1,1,1,0,...,0,1,0], how do I turn it into a sparse vector and add it to the sparse input matrix?

(By the way, I'm very new to Python, SciPy, NumPy, scikit-learn... everything.)

You should really read this: scikit-learn.org/dev/auto_examples/…
– zenpoy, Commented Dec 6, 2012 at 11:11
This is my second day working with Python; that's a bit over the top to read on a second day. I found that too, by the way.
– Olivier_s_j, Commented Dec 6, 2012 at 11:13
Some things simply take time. Maybe you should invest some time in doing some tutorials on Python, NumPy, and SciPy. For example, in the answer to the other question I pointed you to some links, and zenpoy gave you another one. I assume you didn't read those links, since you posted this question mere minutes after I answered the other one.
– HerrKaputt, Commented Dec 6, 2012 at 11:16
I did read those; I even made a dummy example which works. But how to update the sparse matrix was not something I could find. If you don't know, I don't expect you to answer.
– Olivier_s_j, Commented Dec 6, 2012 at 11:20
@Ojtwist - You tagged your question under sklearn, so these are the answers you get. If you had just asked how to concatenate two csr_matrix objects, you would get totally different answers...
– zenpoy, Commented Dec 6, 2012 at 12:01

zenpoy · Accepted Answer · 2019-05-15 12:40:25Z

14

Scikit-learn has great documentation, with great tutorials that you really should read before trying to invent this yourself. This one is the first one to read: it explains how to classify text, step by step. And this one is a detailed example of text classification using a sparse representation.
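To illustrate the sparse representation those tutorials rely on, here is a minimal sketch, assuming scikit-learn is installed. The three documents are made up for the illustration; `CountVectorizer` is one of scikit-learn's text vectorizers, and its `fit_transform` returns a SciPy sparse matrix directly, so there is no need to build a dense array first.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy corpus, only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "cats and dogs",
]

# Learn the vocabulary and build the document-term count matrix.
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(docs)

# The result is already sparse: one row per document, one column per word.
print(issparse(X_text), X_text.shape)
```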

Pay extra attention to the parts where they talk about sparse representations, in this section. In general, if you want to use an SVM with a linear kernel and you have a large amount of data, LinearSVC (which is based on LIBLINEAR) is better.
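As a rough sketch of what that looks like, assuming scikit-learn is installed: LinearSVC accepts a SciPy sparse matrix as input, so the csr_matrix from the question can be passed to `fit` as-is. The labels `y` below are made up purely so the example runs.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import LinearSVC

# Random 0/1 features, as in the question, stored as a sparse matrix.
rng = np.random.default_rng(0)
X_sparse = csr_matrix(rng.integers(0, 2, size=(6, 100)))

# Made-up class labels, one per row (an assumption for this sketch).
y = np.array([0, 1, 0, 1, 0, 1])

# Fit and predict directly on the sparse input; no dense conversion needed.
clf = LinearSVC()
clf.fit(X_sparse, y)
pred = clf.predict(X_sparse)
```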

Regarding your question: I'm sure there are many ways to concatenate two sparse matrices (by the way, this is what you should search for on Google to find other ways of doing it). Here is one, but you'll have to convert from csr_matrix to coo_matrix, which is another type of sparse matrix: Is there an efficient way of concatenating scipy.sparse matrices?
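One other way, as a minimal sketch rather than the method from the linked answer: `scipy.sparse.vstack` stacks sparse matrices row-wise, so the dense row from the question can be wrapped in a one-row csr_matrix and appended. The names `new_row` and `X3` are made up for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

rng = np.random.default_rng(0)

# The sparse input matrix from the question: 6 samples, 100 features.
X2 = csr_matrix(rng.integers(0, 2, size=(6, 100)))

# A new dense sample with the same number of features (shape 1 x 100).
new_row = rng.integers(0, 2, size=(1, 100))

# Convert the row to sparse and stack it under the existing matrix.
X3 = vstack([X2, csr_matrix(new_row)], format="csr")
print(X3.shape)
```

Each call builds a new matrix, so if many rows arrive over time it is usually cheaper to collect them in a list and stack once at the end.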

EDIT: When concatenating two matrices (or a matrix and an array, which is a one-dimensional matrix), the general idea is to concatenate X1.data and X2.data and manipulate their indices and indptr arrays (or row and col in the case of coo_matrix) so that they point to the correct places. Some sparse representations are better for specific operations and more complex for others, so you should read about csr_matrix and see if it is the best representation for your case. But I really urge you to start from the tutorials I posted above.
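The idea in the EDIT can be sketched for the CSR case as follows. `csr_vstack_manual` is a hypothetical helper written for this illustration, not a SciPy function, and it assumes both inputs are csr_matrix objects with the same number of columns.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_vstack_manual(a, b):
    """Stack CSR matrix b under CSR matrix a (illustrative helper)."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices need the same number of columns")
    # Stored values and their column indices are simply appended.
    data = np.concatenate([a.data, b.data])
    indices = np.concatenate([a.indices, b.indices])
    # b.indptr holds offsets into b.data, so shift them by the number of
    # values stored in a. b.indptr[0] is 0 and duplicates a.indptr[-1].
    indptr = np.concatenate([a.indptr, b.indptr[1:] + a.indptr[-1]])
    shape = (a.shape[0] + b.shape[0], a.shape[1])
    return csr_matrix((data, indices, indptr), shape=shape)

A = csr_matrix(np.array([[1, 0, 2], [0, 0, 3]]))
B = csr_matrix(np.array([[0, 4, 0]]))
C = csr_vstack_manual(A, B)
```

In everyday code `scipy.sparse.vstack` does this bookkeeping for you; the manual version is mainly useful for seeing how the three CSR arrays fit together.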

edited May 15, 2019 at 12:40

answered Dec 6, 2012 at 11:14

zenpoy


3 Comments

Fred Foo Over a year ago

If you want to fit an SVM to a really large set of data, then SGDClassifier is even better. Under default settings, it approximates a linear SVM.

Olga Botvinnik Over a year ago

Link update for the first paragraph: scikit-learn.org/stable/auto_examples/text/…

Rhubbarb Over a year ago

The second link from zenpoy and the update from @olga-botvinnik are both broken now. Try: scikit-learn.org/0.19/auto_examples/text/…
