I was trying to generate an array of trigrams (i.e. combinations of three consecutive letters) from a very long char array:
# data is actually loaded from a source file
a = np.random.randint(0, 256, 2**28, 'B').view('c')
Since making a copy is inefficient (and it creates problems such as cache misses), I generated the trigrams directly using stride tricks:
tri = np.lib.stride_tricks.as_strided(a, (len(a) - 2, 3), a.strides * 2)
This generates a trigram array with shape (2**28 - 2, 3) where each row is a trigram. Now I want to convert the trigrams to an array of strings (i.e. dtype S3) so that numpy displays them more "reasonably" (instead of as individual chars).
tri = tri.view('S3')
It gives the exception:
ValueError: To change to a dtype of a different size, the array must be C-contiguous
I understand that data generally has to be contiguous in order to create a meaningful view, but this data is contiguous "where it should be": the three elements of each row are contiguous.
So I'm wondering how to view the contiguous parts of a non-contiguous np.ndarray as a dtype of a bigger size. A more "standard" way would be better, but hackish ways are also welcome. It seems that I can set shape and strides freely with np.lib.stride_tricks.as_strided, but I can't force the dtype to be something else, which is the problem here.
EDIT
A non-contiguous array can also be made by simple slicing. For example:
np.empty((8, 4), 'uint32')[:, :2].view('uint64')
will throw the same exception as above (while from a memory point of view I should be able to do this). This case is much more common than my example above.
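For the slicing case, one possible workaround is to reorder the two steps: view the whole C-contiguous parent as the wider dtype first, then slice in units of that dtype. The sketch below assumes the parent array is still available and that the slice boundaries fall on uint64 boundaries; the names x, wide and left are illustrative only.

```python
import numpy as np

# Small C-contiguous stand-in for np.empty((8, 4), 'uint32')
x = np.arange(32, dtype=np.uint32).reshape(8, 4)

# View the whole contiguous array first: shape (8, 4) uint32 becomes (8, 2) uint64
wide = x.view(np.uint64)

# Slice in uint64 units: uint64 column 0 covers the original uint32 columns 0 and 1
left = wide[:, :1]

print(left.shape)                 # (8, 1)
print(np.shares_memory(left, x))  # True, no copy was made
```

This only helps when the wanted columns line up with the wider dtype; a slice such as [:, 1:3] would straddle two uint64 elements and cannot be expressed this way.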
np.ascontiguousarray(tri).view('S3')?
In the [:,:2] case there are 2 elements, then a gap, 2 more elements, etc. Look at the flags. Evidently view isn't going the extra step of verifying that the 8 bytes it needs for each uint64 are contiguous.
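A copy-free alternative to np.ascontiguousarray for the trigram case might look like the sketch below. It assumes it is acceptable to change the dtype on a small contiguous slice first (a[:3] viewed as one S3 element) and only then apply as_strided with a one-byte stride. The short byte string stands in for the real data, and the names seed and n are illustrative. Note that as_strided does no bounds checking, so the shape has to be computed carefully.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Small writable stand-in for the 2**28-element char array
a = np.frombuffer(b"abcdefgh", dtype="S1").copy()

n = len(a) - 2             # number of trigrams
seed = a[:3].view("S3")    # contiguous 3-byte slice seen as one S3 element, shape (1,)

# Step one byte at a time while reading three bytes per element.
# writeable=False guards against accidental writes through overlapping elements.
tri = as_strided(seed, shape=(n,), strides=(1,), writeable=False)

print(tri.tolist())  # [b'abc', b'bcd', b'cde', b'def', b'efg', b'fgh']
```

Because tri overlaps itself in memory, it is best treated as read-only; operations that need a contiguous buffer will still copy it.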