This post (https://stackoverflow.com/a/5541452/6394617) suggests a way to make a NumPy array immutable, using .flags.writeable = False.

However, when I test this:

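import numpy as np

arr = np.arange(20).reshape((4, 5))
arr.flags.writeable = False  # mark the array as read-only
arr

# shuffle each column in place; arr[:, i] is a one-dimensional view
for i in range(5):
    np.random.shuffle(arr[:, i])

arr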

The array is shuffled in place, without even a warning.

QUESTION: Is there a way to make the array immutable?

BACKGROUND:

For context, I'm doing machine learning, and I have feature arrays, X, which are floats, and label arrays, y, which are ints.

I'm new to Scikit-learn, but from what I've read, it seems like the fit methods shuffle the arrays in place. That said, when I created two arrays, fit a model to the data, and inspected the arrays afterwards, they were in the original order. So I'm just not familiar with how Scikit-learn shuffles, and haven't been able to find an easy explanation to that online yet.

I'm using many different models, and doing some preprocessing in between, and I'm worried that at some point my two arrays may get shuffled so that the rows no longer correspond appropriately.

It would give me peace of mind if I could make the arrays immutable. I'm sure I could switch to tuples instead of NumPy arrays, but I suspect that would be more complicated to code and slower.
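As a rough sketch of the concern above: I can't confirm from anything in this post how scikit-learn shuffles internally, but if a library shuffles by building a permutation of row indices and indexing with it, the result is a copy and the read-only originals are never written to. That would be consistent with fit raising no error on a read-only array. The toy X and y below are made up for illustration.

```python
import numpy as np

# Toy data (hypothetical): row i of X is [2*i, 2*i + 1] and its label is y[i] = i.
X = np.arange(12, dtype=float).reshape((6, 2))
y = np.arange(6)
X.flags.writeable = False
y.flags.writeable = False

X_before, y_before = X.copy(), y.copy()

rng = np.random.default_rng(0)
perm = rng.permutation(len(y))  # a new index array; X and y are not touched

# Indexing with an integer array returns copies, so nothing is written to X or y.
X_shuffled = X[perm]
y_shuffled = y[perm]

originals_unchanged = np.array_equal(X, X_before) and np.array_equal(y, y_before)
rows_still_aligned = np.array_equal(X_shuffled[:, 0], 2.0 * y_shuffled)
print(originals_unchanged, rows_still_aligned)
```

Using the same perm for both arrays is what keeps each feature row paired with its label.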

  • I'm going to mess up the terminology, but arr[:, i] returns something like a "view" of the data, not the array itself. np.random.shuffle(x) will throw an error Commented Dec 17, 2021 at 14:57
  • scikit-learn's fit shouldn't shuffle the columns. If it shuffles anything, it should do the whole row. Commented Dec 17, 2021 at 15:26
  • @QuangHoang, I know that scikit-learn shuffles by default (rows, not columns), but I was surprised that calling X.flags.writeable = False before clf.fit(X,y) did not cause any errors, since it seemed to me that fit would try to shuffle the data in place and should not have been able to. So I'm not sure how the scikit-learn library shuffles the data. I haven't dug through every line of source code, and don't really have time to, which is why I was hoping there was some way to just lock the array so that nothing could change it. Commented Dec 17, 2021 at 16:22
  • The problem is not that arr[:, i] is a view, but that it is a one-dimensional array. It looks like the shuffle method does not respect the writeable flag when the input is a 1-d array. E.g. x = np.arange(5); x.flags.writeable = False; np.random.shuffle(x) succeeds. This might be a bug in the shuffle method. Commented Dec 17, 2021 at 16:50
  • @WarrenWeckesser, that's great, thanks! Do you want to post that as an answer, so that if anyone has this question in the future, they will see that they just need to make sure to have the latest version of NumPy? Commented Dec 21, 2021 at 14:46

1 Answer

This is a bug in numpy.random.shuffle in numpy versions 1.22 and earlier. The function does not respect the writeable flag of the input array when the array is one-dimensional.

numpy.random.Generator.shuffle has the same issue, and numpy.random.Generator.permuted fails to respect the writeable flag for arrays of any dimension.

This has been fixed in the main development branch of NumPy, so NumPy versions 1.23.0 and later will not have this bug. Note that NumPy 1.22.0 has not been released yet, but is available as a release candidate. The fix occurred after the branching of 1.22, so the fix will not be in 1.22.0.
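A quick way to check how an installed NumPy behaves is sketched below. The helper try_inplace is a hypothetical name, not part of NumPy, and it catches any exception because I'm not certain the exception type is the same across versions. On versions with the fix, I'd expect each call to raise and leave the array unchanged.

```python
import numpy as np

def try_inplace(op, arr):
    """Hypothetical helper: run op(arr) and return (raised, changed)."""
    before = arr.copy()
    try:
        op(arr)
        raised = False
    except Exception:  # exact exception type may differ between NumPy versions
        raised = True
    return raised, not np.array_equal(before, arr)

rng = np.random.default_rng(0)

x1 = np.arange(20)                      # one-dimensional, read-only
x1.flags.writeable = False
x2 = np.arange(20).reshape((4, 5))      # two-dimensional, read-only
x2.flags.writeable = False

results = {
    "np.random.shuffle, 1-d": try_inplace(np.random.shuffle, x1),
    "np.random.shuffle, 2-d": try_inplace(np.random.shuffle, x2),
    "Generator.shuffle, 1-d": try_inplace(rng.shuffle, x1),
    "Generator.permuted(out=...), 2-d": try_inplace(
        lambda a: rng.permuted(a, out=a), x2
    ),
}

print("NumPy", np.__version__)
for name, (raised, changed) in results.items():
    print(f"{name}: raised={raised}, changed={changed}")
```

If any line reports changed=True, the installed version still has the bug for that call.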
