Pointer to empty array

Hello all,

I’ve been recently dealing with pointers a lot and have come up with a potential issue/question for the community about the best approach here.

Assume we have a function returning a pointer (view only - it NEVER allocates stuff) to data in a derived type:

type :: my_array
   integer :: n
   integer :: short(16)
   integer, allocatable :: long(:)
end type my_array

! Return a view of my array for the upstream scope
function view(a) result(data)
    type(my_array), intent(inout), target :: a
    integer, pointer :: data(:)
    if (a%n>16) then 
       data => a%long(1:n)
    elseif (array%n>0) then 
       data => a%short(1:n)
    else
       nullify(data)
    endif
end function view

Now the problem with this approach is that returning a .not.associated pointer is always a pain, because we always have to check for its association status. On the other hand:

allocate(data(0))

would certainly cause memory leaks. So I believe the best approach is to have a global target variable representing an empty array, and pointing to it

function view(array) result(data)
    type(my_array), intent(inout), target :: array
    integer, pointer :: data(:)
    integer, target, save :: NO_DATA(0) ! static empty array
    if (array%n>16) then 
       data => array%long(1:n)
    elseif (array%n>0) then 
       data => array%short(1:n)
    else
       data => NO_DATA
    endif
end function view

so that the pointer is always associated and can be used safely in functions like

print *, view(array)

Now I would like to discuss with you if this is a viable/appropriate approach, or if there are better alternatives out there that I haven’t considered yet. Thank you!

I won’t directly answer as I don’t know if this code is legal or not (I think it is, but I’m not sure). However:

  • allocate( data(0) ) is not not supposed to effectively allocate some memory (I mean, the compiler doesn’t have to), so there should not be a memory leak
  • short being a static component that always exist, I would simply write:
elseif (array%n >= 0) then 
       data => array%short(1:n)

In case n == 0, data will simply point to an empty slice (1:0).

1 Like

The compiler must allocate the pointer metadata even if the array itself has zero size. The metadata includes e.g. rank, stride, and bounds information. In the case of a pointer function result, I’m unsure if that is done for every function reference or if it is just done once and then reused. If it is done anew for every function reference, and if that pointer is not later deallocated, then that dangling metadata would be a memory leak.

1 Like

As soon as a Fortran pointer is declared, a descriptor (metadata) is created, whether the pointer is allocated or not. The descriptor contains a C pointer, which allocates the memory for the content.

My assumption was that the compiler doesn’t have to allocate any memory via the C pointer if the Fortran pointer is zero sized. But actually, “doesn’t have to” is not the same as “doesn’t”…

1 Like

Thank you @PierU @RonShepard - interesting points. I would avoid using the allocate statement with pointer anyways, just as a general coding rule to never use pointers with the allocate statement; however, both approaches seem to work well with all major compilers:

program point_to_0
  integer :: n
  integer, target :: data(10) = [(n,n=1,10)]
  integer, pointer :: ptr(:)  

  n = 2
  ptr => data(1:n)
  print *, associated(ptr), size(ptr)

  n = 0
  ptr => data(1:n)
  print *, associated(ptr), size(ptr)

  n = -1
  ptr => data(1:n)
  print *, associated(ptr), size(ptr)

  allocate(ptr(0))
  allocate(ptr(0))
  allocate(ptr(0))
  allocate(ptr(0))
  allocate(ptr(0))
  print *, associated(ptr), size(ptr)

end  

I was thinking of the situations where the pointer function result was used in an expression, or maybe in a statement like

call sub( f(a1), f(a2), f(a3) )

It seems like there must be multiple descriptors allocated simultaneously in these cases, and if the allocated arrays have nonzero size, then this would result in memory leaks. The question then is whether allocating zero-size arrays would also result in memory leaks due to the metadata itself.

As the programmers we have absolutely no control on how the descriptors (metadata) are managed, this is the task of the compiler only. If there’s a memory leak because of the descriptors, this is a flaw/bug in the compiler.

As far as I remember, AIX does not allow allocate zero-length arrays, so allocate(ptr(0)) may be not the best solution for really portable code.

1 Like

If AIX does not allow zero-length arrays to be allocated, then the compiler is not fully compatible with the standard :slight_smile: .

But in any case, this concerns allocatables that are being allocated and then, modulo compiler errors, you cannot have memory leaks, only memory that is not very useful but should disappear when the variable goes out of scope. And perhaps that is of concern here: you might get “dangling” pointers this way. The solution where you point to small(1:0) or a saved array of size zero is likely to be better.

2 Likes

Can you show something in the standard that supports this?

I would think normally that if memory is allocated through a pointer function, and it is never deallocated subsequently, then it could result in a memory leak, even if those arrays are zero size, because of the associated metadata.

In the example above with the subroutine call, maybe the metadata for the three anonymous pointers is destroyed after the call statement? Or perhaps a subsequent deallocate() is required somewhere within sub() to avoid the memory leak? It would be good if some relevant standard text could be found to answer this one way or the other.

How the descriptors are managed, and even the existence of the descriptors, is beyond the scope of the standard. These are implementation detail, and the standard does not specify implementation details.

I am expecting any decent compiler to release the descriptor once the the Fortran pointer goes out of scope, regardless its association status. If it doesn’t, then it should be reported as a quality of implementation issue.

As written above, I am indeed expecting any decent compiler to do so.

A deallocate() doesn’t do anything on the descriptor itself.

I guess that - as @PierU suggests - zero-sized allocation is really just equivalent to zeroing out the pointer descriptor’s fields, as this does not crash:

allocate(ptr(0))
allocate(ptr(0))
allocate(ptr(0))
...

(btw - is it Standard conforming to be allowed to allocate the same variable multiple times?)

the pointer metadata uses between ~48 and 72 bytes depending on the compiler, it’s likely always going to be a stack variable imho.

9.7.1.4 Allocation of pointer targets says It is not an error to allocate a pointer that is already associated with a target. It is an error to allocate an already allocated allocatable.

Indeed, repeated ALLOCATE on a pointer like above is the shortest way to create garbage (real Fortran garbage, not just memory leaks in the runtime support system).

In the case of a regular pointer, the question is whether the sequence

allocate(ptr(0))
allocate(ptr(0))
allocate(ptr(0))

is different from the sequence

allocate(ptr(0))
deallocate(ptr)
allocate(ptr(0))
deallocate(ptr)
allocate(ptr(0))

Specifically can the first code result in memory leaks? I think a single pointer descriptor is reused in this case, so it is not a question of generating new descriptors, but rather a question of what else happens when allocate() is executed and what happens when a zero-sized array is deallocated. I’m fairly certain that the second code is not supposed to leak memory, regardless of the underlying memory management that occurs.

In the case of a function pointer that allocates memory, there is also the question about the pointer descriptors. It is possible that multiple descriptors are generated, and if they do not evaporate after they are generated (i.e. after the expression in which they are used is executed, or after the call statement in which they are used completes), then do they also represent a memory leak?

As @PierU stated, this low-level stuff is probably outside the scope of the standard, but there may be some of it covered by the standard, such as whether a deallocate(ptr) is always supposed to occur after an allocate(ptr).

Yes, this is a normal thing. There can be multiple pointers associated with that same memory. However, that anonymous memory can only be accessed, or deallocated, if another pointer was assigned to it. If no other pointers point to that memory, then it cannot be accessed by the program. That is a memory leak.

It is allowed to allocate a pointer that is already allocated. 9.7.1.4 (Allocation of pointer targets) says, in part:

It is not an error to allocate a pointer that is already associated with a target. In this case, a new pointer target is created as required by the attributes of the pointer and any array bounds, type, and type parameters specified by the ALLOCATE statement. The pointer is then associated with this new target. Any previous association of the pointer with a target is broken. If the previous target had been created by allocation, it becomes inaccessible unless other pointers are associated with it.

Therefore, a series of ALLOCATEstatements on the same pointer is allowed per the standard.

Additionally, a zero-sized target is valid and is not the same as a disassociated pointer.

One question we have in this case is perhaps outside of the standard, perhaps not. Is it a memory leak to allocate zero-size arrays using the same pointer without corresponding deallocate() statements for each of the allocations? Also, is the answer to that question different for regular declared pointers and for pointer functions (where the zero-size array allocation occurs within the funtion)?

1 Like

To me this clearly outside of the standard, and you can imagine implementations that have or don’t have a memory leak in this case. For instance the (overly simplified) descriptor of a 1D pointer would be in C:

typedef struct {
   size_t n;
   void* ptr;
} fpointer1d;

fpointer1d* x;

allocate( x(n) ) on the Fortran side, for n>0 would be on the C side x->ptr = malloc( x->n).

Now, calling malloc() with a zero size is allowed, but the result is implementation defined (processor dependent as we say in Fortran). Assigning x.ptr = NULL is a solution, but this raises a problem: the Fortran compiler has to be able to differentiate 2 different zero-size Fortran pointers:

integer, pointer :: p(:), q(:)
allocate( p(0), q(0) ) 
print*, associated( p(0), q(0) )   ! Has to be F(alse)

And this is not possible if the two C pointers are NULL. Two solutions then:

  • the compiler allocates at least 1 byte anyway:
x->ptr = malloc( x->n > 0 ? x->n : 1 )

Then associated() can compare the C pointers to determine if the Fortran pointers are associated or not. But there is an actual memory allocation, and thus a memory leak if the pointer is not explicitly deallocated at some point.

  • The compiler uses a self-reference on the parent object:
x->ptr = ( x->n > 0 ? malloc(x->n) : (void*)x )

This ensures the C pointers are different and that associated() can make a meaningful test. And there’s no possible memory leak.

2 Likes

I agree with @PierU here.

2 Likes

Going back to the original example:

function view(array) result(data)
    type(my_array), intent(inout), target :: array
    integer, pointer :: data(:)
    if (array%n>16) then 
       data => array%long(1:n)
    elseif (array%n>=0) then 
       data => array%short(1:n)
    endif
end function view

is the intent(inout) enough a guarantee that the association status of data limited to the scope immediately upstream of the call to view is retained?
In other words, if the actual argument corresponding to array has neither target nor pointer attributes in the upstream scope, is there a chance that view returns an invalid pointer?

I believe it should be safe for the allocatable %long variable, but am not sure about the %short array, that may fall outside of the “pointer retention” rules in case the upstream variable is not target/pointer.

This code is not legal if the actual argument doesn’t have the pointer or target attribute.