I am trying to return a subset of a multi-dimensional array while keeping the exact structure of its dimensions, but something strange is happening. Please take a look:

space = [  [ [1],[2],[3] ],  [ [4],[5],[6] ],  [ [70],[8],[9] ]  ]

space_subset = space[(1..2)].collect { |y| y[1] }

=> [[5], [8]] 

Let's break it down:

space[(1..2)]

=> [  [ [4], [5], [6] ], [ [70], [8], [9] ]  ]

so now I can be sure what I am calling .collect on

in fact:

[  [ [4], [5], [6] ], [ [70], [8], [9] ]  ].collect { |y| y[1] }

=> [[5], [8]]

Then... (for the real question)...

If now space_subset is [[5], [8]]

and I try to modify it like this:

space_subset[1].delete(8)

then, as expected, space_subset becomes: => [[5], []]

Why does this also modify the original "space" array from which I extracted the subset array?

If now I do:

space

=> [[[1], [2], [3]], [[4], [5], [6]], [[70], [], [9]]]

"8" is missing, the same value I deleted from the space_subset

I am looking at the Ruby Array API docs, and from what I am reading my code should work without surprises, but it doesn't.

Can you help me figure out what I'm doing wrong or misunderstanding here?

Thanks to everyone who takes the time to answer

  • By the way, 70 is in place of 7 for no particular reason; I apologize if it creates confusion. Commented Nov 14, 2012 at 1:52
  • space_subset contains references to the [5] and [8] arrays, not copies of their values. So if you alter one of them through space_subset, the change will affect space as well. Commented Nov 14, 2012 at 1:55
  • Wait, isn't space_subset a new object that is equal to what I am assigning to it? So why should it contain references and not its own values? Bringing it to the simplest case: a = 10; b = a; b = 20; a => 10. "a" is still 10, because b is another object with its own value, not a reference to "a". So I don't see why you say it should work differently in the example from my question. Commented Nov 14, 2012 at 2:02
  • Array does not work the same way as Fixnum. Commented Nov 14, 2012 at 2:05
  • Can you elaborate or provide documentation for your statements? Commented Nov 14, 2012 at 2:07

2 Answers

4

Remember that in Ruby not only is everything an object, but a variable is always a reference to an object. You're expecting a copy to be made here, when what you're getting instead is a reference to the original single-element array.
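A minimal sketch of that distinction, using the hypothetical names `a`, `b`, `inner`, and `other_name`: reassigning a variable only rebinds that one name, whereas calling a mutating method changes the single object that every reference points to.

```ruby
# Reassignment: b is rebound to a different object; a is untouched.
a = 10
b = a
b = 20
p a                      # => 10

# Mutation: both names refer to the same Array object.
inner = [8]
other_name = inner
other_name.delete(8)     # modifies the shared object in place
p inner                  # => []
p inner.equal?(other_name)  # => true (same object, not just equal contents)
```

This is why the `a = 10; b = a; b = 20` example from the comments behaves differently from `delete`: the first rebinds a name, the second alters an object.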

This is why there are clone and dup methods on many objects. If you intend to modify something before using it, but do not want to mangle the original, make a copy and work with that.
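As a small illustration (the variable names `original` and `copy` are made up for this sketch), calling `dup` on a flat array gives a separate Array object, so in-place changes to the copy do not reach the original:

```ruby
original = [8]
copy = original.dup        # a new Array object holding the same elements

copy.delete(8)             # in-place change, but only on the copy
p original                 # => [8]
p copy                     # => []
p original.equal?(copy)    # => false (two distinct objects)
```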

An easy way to do this is to avoid in-place modifiers like delete and instead use a method that returns a new array, like reject:

space_subset[1] = space_subset[1].reject { |v| v == 8 }

This removes every element equal to 8 and returns a new array, leaving the original untouched. This isn't necessarily the best way to go about it, though. A better approach might be to simply "subtract" the elements you don't want, as that also returns a copy:

space_subset[1] -= [ 8 ]

In general you must be wary of using in-place modifiers on data you don't "own". To be safe, you should use operations that produce a modified copy.
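A short sketch of the difference, reusing the `space` data from the question: the `-=` form builds a new array and stores it only in `space_subset`, while `delete` changes the inner array that both structures share.

```ruby
space = [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]
space_subset = space[1..2].collect { |y| y[1] }

# Copy-producing: space_subset[1] now points at a brand-new array.
space_subset[1] -= [8]
p space_subset     # => [[5], []]
p space[2][1]      # => [8]   (original untouched)

# In-place: space_subset[0] is still the same object as space[1][1].
space_subset[0].delete(5)
p space[1][1]      # => []    (original changed too)
```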


3

This is a difference between reference and value. In your code, you create new references to the inner arrays, but both references point to the same objects. You can confirm this by calling Object#object_id on both arrays (as if changing the value via one reference and seeing it modified through the other reference weren't confirmation enough!).

space = [  [ [1],[2],[3] ],  [ [4],[5],[6] ],  [ [70],[8],[9] ]  ]
=> [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]] 
space[2][1].object_id
=> 70329700053380 
space_subset = space[(1..2)].collect { |y| y[1] }
=> [[5], [8]] 
space_subset[1].object_id
=> 70329700053380

Unfortunately, Array#dup and Array#clone only make "shallow" copies of objects, so you have to use a bit of a workaround to get a copy of space to work with. One easy trick to get a deep copy is:

Marshal.load(Marshal.dump(space))

You can also write a recursive function to take space and manually copy it into a new array.

And just to prove it:

space = [  [ [1],[2],[3] ],  [ [4],[5],[6] ],  [ [70],[8],[9] ]  ]
=> [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]
space[2][1].object_id
=> 70329700053380
space_subset = Marshal.load(Marshal.dump(space))
=> [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]
space_subset = space_subset[(1..2)].collect { |y| y[1] }
=> [[5], [8]]
space_subset[1].object_id
=> 70329695297500
space_subset[1].delete(8)
=> 8
space
=> [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]
space_subset
=> [[5], []]
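The recursive alternative mentioned earlier could look roughly like this. `deep_copy` is a hypothetical helper name, and the sketch assumes the structure contains only nested arrays and immutable values such as integers; other mutable elements (strings, hashes) would still be shared.

```ruby
# Hypothetical helper: recursively copies nested arrays.
# Non-array elements are returned as-is, which is safe for integers.
def deep_copy(value)
  return value unless value.is_a?(Array)

  value.map { |element| deep_copy(element) }
end

space = [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]
copy = deep_copy(space)

copy[2][1].delete(8)
p space[2][1]   # => [8]
p copy[2][1]    # => []
```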

Hope that helps!

3 Comments

OK, thanks for the detailed answer, but the Array API doc for the .collect method says: "Invokes block once for each element of self. Creates a new array containing the values returned by the block." So shouldn't space_subset be a totally new copy of space?
collect does create a new array, but the items contained within that array (the values returned by the block with y[1]) are references to the original [5] and [8] arrays. Another way to attack your problem is to call dup inside your collect block (since you are dealing with the lowest level of nesting, the shallow copy is no longer an issue): space_subset = space[(1..2)].collect { |y| y[1].dup }. That will return a new array with copies of the [5] and [8] arrays that are independent of space. Clear as mud?
Yeah, after reading this and many other answers to pass-by-value/reference questions, the mud is starting to get cleaner :) By the way, I haven't thanked you enough for your comprehensive answer! I really appreciate it! Thank you!
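The dup-inside-collect suggestion from the comment above can be checked with a short sketch. It relies on the inner arrays holding only integers, so one level of dup is enough here:

```ruby
space = [[[1], [2], [3]], [[4], [5], [6]], [[70], [8], [9]]]

# dup each selected inner array so the subset owns its own copies.
space_subset = space[1..2].collect { |y| y[1].dup }

space_subset[1].delete(8)
p space_subset                          # => [[5], []]
p space[2][1]                           # => [8]
p space_subset[1].equal?(space[2][1])   # => false
```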
