
There are some interesting shenanigans going on with name references:

#!/bin/bash

read -r -d '' payload << EOF
apple: vinegar
orange: juice
EOF

populate() {
    declare -n _aa=$1

    declare data="this is not even an array"

    while IFS=": " read -r key value; do
        _aa["$key"]="$value"
    done <<< "$payload"

    echo "~~~ populate about to finish, data now: ${_aa[*]@K}"
}

process() {
    declare -A data=( ["mango"]="lassi" )

    populate data
    echo "~~~ process about to terminate, data now: ${data[*]@K}"
}

process

Note that there are two variables named data, one in the process function and another in populate. They have nothing to do with each other: they are of different kinds and each is local to its own function. Yet the code above does not work as intended:

~~~ populate about to finish, data now: 0 "juice"
~~~ process about to terminate, data now: mango "lassi" 

The nameref inside populate apparently resolves to the data variable declared inside populate, not the one in process. Once populate returns, the variable in process is intact. All it takes to get the intended result is to use a different name for one of the data variables:

~~~ populate about to finish, data now: orange "juice" apple "vinegar" mango "lassi" 
~~~ process about to terminate, data now: orange "juice" apple "vinegar" mango "lassi"

I am aware of the "circular reference" issue, where the referenced variable cannot have the same name as the nameref declared inside the function. But how can one reasonably prevent the above? Does one have to predict each and every variable name, even though they are local to each function in this case? Is there any explanation for this? Is declare -n just repackaged old-fashioned indirection?


NOTE I found this in the Bash reference:

Local variables "shadow" variables with the same name declared at previous scopes. For instance, a local variable declared in a function hides a global variable of the same name: references and assignments refer to the local variable, leaving the global variable unmodified. When the function returns, the global variable is once again visible.

The experience above gives it a whole new meaning: a reference resolves to the local variable no matter what. I could find very little on how references behave when both variables are local.


I have since found Greg's wiki take on references:

if you have declare -n ref=$1 and try to use $ref, the shell will look for whatever variable of that name is visible by the current function. There is no way to override this. You cannot say "this nameref points to the variable foo in the caller's scope" unambiguously.

That is not exactly true either. The reference clearly works across two functions where each has that variable as a local, so the target is not simply the variable of that name "visible by the current function".

My concern is that a reference may work up to some point in the function and then silently become a reference to something else, once a local declaration shadows its target.
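
The concern can be reproduced in a few lines. This is a minimal sketch, assuming Bash 4.3 or later; the function names callee and caller are made up for illustration:

```shell
#!/bin/bash
# Hypothetical names throughout; requires Bash 4.3+ for namerefs.

callee() {
    declare -n ref=$1          # ref stores the *name* "data", not a pointer
    echo "before: ${ref}"      # the name resolves to the caller's data
    declare data="local"       # a new local named data now shadows the caller's
    echo "after: ${ref}"       # the same name now resolves to the local
}

caller() {
    declare data="from caller"
    callee data
    echo "caller: ${data}"     # the caller's variable was never touched
}

caller
# before: from caller
# after: local
# caller: from caller
```

The nameref itself never changes; what changes is which variable the stored name resolves to.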

There is also the better-known problem with Bash name references, bash: warning: v: circular name reference:

You can avoid the circularity by using declare only if the names do not clash

$ foo() { if [[ $1 != v ]]; then declare -n v=$1; fi; echo $v; }
$ bar() { if [[ $1 != v ]]; then declare -n v=$1; fi; foo v; }
$ v="xyz"
$ bar v
xyz

The above is fairly impractical and cannot be applied to the case here, where the clash comes from another local variable and, even worse, the reference is lost later within the function.


I could not find any comprehensive explanation of Bash scoping, only examples with warnings about what will break. Is a nameref actually any different from ${!indirection}? And how come the referenced variable is even visible to the other function here, when it was local to the caller?

  • 1
    If you call your populate function with parameter data the nameref _aa refers to the variable named data in the caller's scope (the process function) only between the first and second declare statements. The second declare statement creates a local variable with the same name that hides the other one. So, after the second declare statement _aa refers to this new local variable. What other behavior do you expect? Commented Jun 11 at 6:28
  • 1
    "That makes referencing fairly useless": we tend to avoid opinionated statements here, so I will not comment on this. "So it's not variable of a name "visible by the current function" - and what you mention is even more bizzare - reference may work till some point in the function where it stops": that's not that bizarre. It's a matter of scope. The scope of data is one level up until you declare the local variable data. Starting from that the scope is local. Commented Jun 11 at 7:45
  • 2
    @jhnc If namerefs were pointers that would be a reasonable expectation. But as the name says a nameref is a reference by name. So, if what a name refers to changes, like when you declare a new variable with an existing name, so does the nameref. Commented Jun 11 at 8:03
  • 1
    @AlbertCamu "That makes referencing fairly useless" is opinionated. Others find namerefs useful because they allow to do things that are difficult or impossible to do without them. And as I answered to another comment the name nameref means reference by name, not pointer or whatever. It is a way to refer to an object by its name. If what a name means changes so does the nameref on that name. Maybe if you think of namerefs like this, things become easier to understand. Commented Jun 11 at 8:08
  • 2
    @AlbertCamu Why do you think that you "cannot access another variable local to the caller from callee ordinarily, it's not like subshell environments"? Try the following and see that you can: foo() { echo "a=$a"; }; bar() { declare a="a"; foo; }; bar. This prints a=a and demonstrates that variable a, which is local to function bar, is shared with function foo that bar calls. Commented Jun 11 at 15:08

1 Answer


For anyone struggling with similar issues with Bash name references, a good starting point is Greg's wiki:

Name reference variables are the preferred method for performing variable indirection. Older versions of Bash could also use a ! prefix operator in parameter expansions for variable indirection. Namerefs should be used unless portability to older bash versions is required. No other shell uses ${!variable} for indirection and there are problems relating to use of that syntax for this purpose. It is also less flexible.
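
As a rough illustration of the two mechanisms side by side (a sketch with made-up variable names, assuming Bash 4.3+), both look the target up by name, but only the nameref can be assigned through directly:

```shell
#!/bin/bash
# Hypothetical variable names, for illustration only.

target="hello"
name=target              # plain string holding a variable name

echo "${!name}"          # old-style indirection, read only: hello

declare -n ref=target    # nameref: reads and writes go to target
echo "$ref"              # hello
ref="changed"            # assignment through the nameref
echo "$target"           # changed
echo "${!name}"          # changed, since both mechanisms resolve the same name
```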

With that, it becomes important to understand Bash's dynamic scoping (thanks @jhnc), which is not particularly emphasized, but (quoting the same source):

Indirection can only be achieved by indirectly evaluating variable names. IOW, you can never have a real unambiguous reference to an object in memory; the best you can do is use the name of a variable to try simulating the effect. Therefore, you must control the value of the ref and ensure side-effects such as globbing, user-input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection.

And, importantly, on the main Bash page of Greg's wiki:

Name references are created with declare -n, and they are local variables with local names. Any reference to the variable by its local name triggers a search for a variable with the name of its content. This uses the same dynamic scope rules as normal variables. So, the obvious issues apply: the local name and the referenced name must be different. The referenced name should also not be a local variable of the function in which the nameref is being used.

The workaround for this is to make every local variable in the function (not just the nameref) have a name that the caller is unlikely to use.
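
Applied to the functions from the question, that workaround might look like the sketch below. The _populate_ prefix is only an assumed naming convention, not something Bash enforces, and the payload is passed as a second argument here to keep the example self-contained:

```shell
#!/bin/bash
# Hypothetical convention: prefix every local with the function name.

populate() {
    declare -n _populate_aa=$1
    declare _populate_key _populate_value
    declare _populate_data="scratch value"   # no longer clashes with the caller's data

    while IFS=": " read -r _populate_key _populate_value; do
        _populate_aa["$_populate_key"]="$_populate_value"
    done <<< "$2"
}

process() {
    declare -A data=( ["mango"]="lassi" )
    populate data $'apple: vinegar\norange: juice'
    echo "${data[apple]} ${data[orange]} ${data[mango]}"
}

process
# vinegar juice lassi
```

This only makes a clash unlikely; a caller that happens to pass a name starting with _populate_ would still collide.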

Further, from the Bash manual:

Local variables "shadow" variables with the same name declared at previous scopes. For instance, a local variable declared in a function hides a global variable of the same name: references and assignments refer to the local variable, leaving the global variable unmodified. When the function returns, the global variable is once again visible.

The shell uses dynamic scoping to control a variable’s visibility within functions. With dynamic scoping, visible variables and their values are a result of the sequence of function calls that caused execution to reach the current function. The value of a variable that a function sees depends on its value within its caller, if any, whether that caller is the "global" scope or another shell function. This is also the value that a local variable declaration "shadows", and the value that is restored when the function returns.

For example, if a variable var is declared as local in function func1, and func1 calls another function func2, references to var made from within func2 will resolve to the local variable var from func1, shadowing any global variable named var.
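
The manual's func1/func2 example can be written out as a short sketch (the values are made up for illustration):

```shell
#!/bin/bash

var="global"

func2() {
    # No local var here, so the lookup walks up the call chain.
    echo "func2 sees: $var"
}

func1() {
    local var="local to func1"
    func2
}

func1    # func2 sees: local to func1
func2    # func2 sees: global
```

The same function prints different values depending on who called it, which is exactly what a nameref's name lookup is subject to.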

To top it all off, I had quite a roller coaster ride with declare and its switches.

The issue is that declare appears to simply set or unset variable attributes, but not all switches are equal in that sense:

Using ‘+’ instead of ‘-’ turns off the attribute instead, with the exceptions that ‘+a’ and ‘+A’ may not be used to destroy array variables and ‘+r’ will not remove the readonly attribute. When used in a function, declare makes each name local, as with the local command, unless the -g option is used. If a variable name is followed by =value, the value of the variable is set to value.

One option that stands out is -g, which is NOT an attribute. It therefore has to be repeated every time other attributes are manipulated; otherwise, inside a function, declare operates on a different (locally scoped) variable.
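
A small sketch of that pitfall, with hypothetical function and variable names:

```shell
#!/bin/bash
# Hypothetical function and variable names, for illustration only.

setup_local() {
    declare -g counter=1     # -g: assigns the global counter
    declare -i counter       # no -g: creates a new *local* counter with the integer attribute
    counter="2+3"            # evaluated arithmetically, but only on the local
    echo "inside setup_local: $counter"
}

setup_global() {
    declare -g total=1
    declare -gi total        # -g repeated: the integer attribute lands on the global
    total="2+3"
    echo "inside setup_global: $total"
}

setup_local
echo "global counter: $counter"   # 1, the global was not affected
setup_global
echo "global total: $total"       # 5
```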

Another less logical one is -n, whose documentation I find confusing:

-n Give each name the nameref attribute, making it a name reference to another variable. That other variable is defined by the value of name. All references, assignments, and attribute modifications to name, except for those using or changing the -n attribute itself, are performed on the variable referenced by name’s value. The nameref attribute cannot be applied to array variables.

Clearly, a nameref can point to both indexed and associative arrays; what is not allowed is applying the nameref attribute to a variable that itself holds an array. There are other issues too, such as setting attributes through a local reference.
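
The distinction can be checked with a sketch like this one (variable names are made up; the exact error text may vary between Bash versions, so it is suppressed here):

```shell
#!/bin/bash
# Hypothetical variable names, for illustration only.

fruits=(apple orange)

declare -n ref=fruits        # a nameref *pointing at* an array is fine
ref[2]="mango"               # the write lands in fruits
echo "${fruits[@]}"          # apple orange mango

# Turning a variable that already holds an array into a nameref is rejected.
if ! declare -n fruits 2>/dev/null; then
    echo "declare -n on an array variable was rejected"
fi
```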
