I am currently using this method to convert a set of integers (the variable words) to a numpy array:
wordMask = np.asarray( [ int(x not in words) for x in xrange(0,nwords) ] ).reshape(nwords,1)
Here nwords can be as large as 10000.
Instead of recomputing wordMask every time, I could keep it as a separate variable and make the corresponding change to it whenever I add or remove an element from words. Still, I'm wondering whether there is a reasonably efficient way to recompute wordMask.
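For reference, the bookkeeping alternative could look roughly like the sketch below. The helper names add_word and remove_word are hypothetical, it uses Python 3 syntax, and it assumes every element is an integer in the range 0 to nwords - 1.

```python
import numpy as np

nwords = 10                                 # small value for illustration
words = set()
wordMask = np.ones((nwords, 1), dtype=int)  # 1 means "not in words"

def add_word(x):
    # hypothetical helper: keep the set and the mask in sync on insert
    words.add(x)
    wordMask[x, 0] = 0

def remove_word(x):
    # hypothetical helper: keep the set and the mask in sync on removal
    words.discard(x)
    wordMask[x, 0] = 1

add_word(3)
add_word(6)
remove_word(3)
```

Each update then touches a single array element, at the cost of having to route every change to words through these helpers.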
Edit: My main concern is that the list comprehension:
[ x not in words for x in xrange(0,nwords) ]
will be slow, and I'm looking for a faster way to perform that iteration
in the context of creating a numpy array.
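One possible way to avoid the Python-level loop is sketched below; it is not benchmarked here. It starts from an array of ones and zeroes out the positions listed in words with integer-array indexing. The function name make_word_mask is hypothetical, the code uses Python 3 syntax, and it assumes every element of words lies in the range 0 to nwords - 1 (the list comprehension silently ignores out-of-range elements, while this version would not).

```python
import numpy as np

def make_word_mask(words, nwords):
    """Return an (nwords, 1) int column: 0 where the index is in words, else 1."""
    mask = np.ones(nwords, dtype=int)
    if words:
        # Build an index array straight from the set, then zero those positions.
        idx = np.fromiter(words, dtype=np.intp, count=len(words))
        mask[idx] = 0
    return mask.reshape(nwords, 1)

words = {3, 6}
wordMask = make_word_mask(words, 10)
```

The work done in Python here scales with len(words) rather than nwords, which should help when words is much smaller than nwords.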
What does words look like? Why do you need .reshape()? Any reason not to use a numpy.ma.array?
x is an integer. What is words? x not in words will just be False for everything, right?
words is a set() of integers. I could use the intset module for better performance I guess, but I'm not doing that at the moment. e.g. words = set(); words.add(3); words.add(6)
... numpy.ma.array. I need a column vector for wordMask. AFAIK, .reshape() is very performant in that it doesn't make a copy of the array.