
I have an ndarray subclass which implements loading/saving of one or more records into a flat binary file. After the records are loaded, I can access them in the normal NumPy fashion.

My question is about what happens when I slice the result (or indeed, any NumPy array). This normally produces a 'view', i.e. an array that refers to the same buffer as the parent array.

Once I have this view, is there any way to determine the position of the view V in the array A? More precisely, I would like to know the byte offset (from the start of A's data buffer) at which V begins. This would allow me to write the slice back onto disk at the right offset.

Here's some example code to show the situation:

import numpy as np

# Imagine a as consisting of 4 4-byte records...
a = np.arange(16, dtype='B').reshape(4, 4)

# I select the first record
v = a[0]

print(v)

# [0 1 2 3]

# I can determine that v is a subarray:

is_subarray = v.base is not None

# I can determine which dimension the slice spans..

whichdim = v.base.strides.index(v.strides[-1])

# But not its position along that dimension.
  • Why don't you store the information you need (dim + index) along the view in a custom class ? Commented Sep 14, 2012 at 11:00
  • @NicolasBarbey Sure, I could do that... OTOH NumPy knows the location of that slice already. It seems silly to duplicate that information (isn't there some way of getting that info from NumPy?) Commented Sep 14, 2012 at 11:33
  • Aren't memmaps more useful to you anyway? Sure, it's possible to get it... but nicely, I'm not sure. Commented Sep 14, 2012 at 11:49
  • @Sebastian Memmaps are good, but they currently have a few problems that make me want to stay away from them. The main one is that the array subclass can't be guaranteed to be preserved, because a[0]['x'] and a['x'][0] do not both return a conventional array (one of them returns a 'numpy.void'; the end result is inconsistent behaviour WRT returning scalar values). This has been the source of much frustration. I want to subclass something that behaves itself (like ndarray). Commented Sep 14, 2012 at 11:53

1 Answer


The information is exposed through array.__array_interface__ (and maybe somewhere better too); however, I think you should probably just use memmaps to begin with and not mess around with this. For an example, check the NumPy source code of the np.may_share_memory function (or, more directly, np.byte_bounds).
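As a minimal sketch of the __array_interface__ route: the 'data' entry of that dictionary is a tuple whose first item is the address of the array's first element, so subtracting the parent's address from the view's gives a byte offset. The helper name byte_offset is made up for this illustration and is not part of NumPy, and the sketch assumes both arrays really share one buffer.

```python
import numpy as np

def byte_offset(view, parent):
    """Byte distance from parent's first element to view's first element.

    Hypothetical helper: assumes `view` was sliced out of `parent`,
    so both addresses point into the same buffer.
    """
    view_addr = view.__array_interface__['data'][0]
    parent_addr = parent.__array_interface__['data'][0]
    return view_addr - parent_addr

# Same layout as in the question: 4 records of 4 bytes each.
a = np.arange(16, dtype='B').reshape(4, 4)
v = a[2]

offset = byte_offset(v, a)
print(offset)                  # 8
print(offset // a.strides[0])  # 2, the record index
```

Note that this measures the position of the view's first element, which for a negatively strided view is not the lowest address it touches.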


7 Comments

Thank you!!!! Especially for giving such a detailed answer... Believe me, I wish I could continue to use memmaps. I've given them quite a long try (years) because I thought they were the thing to use here, and like I said, they're nice when they work, and bewildering when they don't. I'll give this a go instead.
Actually, np.byte_bounds is ideal. np.byte_bounds(V)[0] - np.byte_bounds(V.base)[0] gives the byte offset of V into A, which can easily be converted into a record-based offset by inspecting itemsize and shape.
A small warning though. In different NumPy versions, ndarray.base may point to different arrays if you make a view based on a view, i.e. you may have to follow .base multiple times to get to the original, I believe (I think this may change in the next versions to always point directly to the original).
yes, that seems to be the case in 1.6.2, thanks for the heads-up :)
@kampu what was your problem with memmaps? They propagate oddly (x+x gives a new memmap but it actually isn't one), but I think there is a simple fix for that if that's one of your issues?
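The two points raised in the comments above (using byte_bounds, and .base possibly needing several hops) can be combined in a rough sketch. The helper names root_array and offset_in_root are invented for this example. The import fallback is there because byte_bounds lives in numpy.lib.array_utils in NumPy 2.x but in the top-level namespace in 1.x. The sketch also assumes the chain of .base objects ends at an ndarray that owns the buffer; an array built on top of some other buffer object would stop one step earlier.

```python
import numpy as np

try:
    from numpy.lib.array_utils import byte_bounds  # NumPy 2.x location
except ImportError:
    from numpy import byte_bounds                  # NumPy 1.x location

def root_array(arr):
    """Follow .base until the next base is no longer an ndarray."""
    root = arr
    while isinstance(root.base, np.ndarray):
        root = root.base
    return root

def offset_in_root(view):
    """Byte offset of view's lowest address from the root's lowest address."""
    root = root_array(view)
    return byte_bounds(view)[0] - byte_bounds(root)[0]

a = np.arange(16, dtype='B').reshape(4, 4)
v = a[1:][1]                    # a view of a view: record 2 of a

offset = offset_in_root(v)
print(offset)                   # 8
print(offset // a.strides[0])   # 2, the record index
```

Walking to the root rather than using V.base directly keeps the result the same whether or not the installed NumPy version collapses .base to the original array.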
