3

I'm writing a shell script, and I have created an array containing several strings:

array=('string1' 'string2' ... 'stringN')

Now, I have a string saved in a variable, say a:

a='stringM'

And this string is part of the array. My question is: how do I find the position of the string in the array, without having to check the terms one by one with a for loop?

Thanks in advance

4 Answers

2

The basic question is: why do you want to avoid a for loop?

  • Syntactical convenience and expressiveness: you want a more elegant way to conduct your search.

  • Performance: you're looking for the fastest way to conduct your search.

tl;dr

For performance reasons, prefer external-utility solutions to pure shell approaches; fortunately, external-utility solutions are often also the more expressive solutions:

  • For large element counts, they will be much faster.
  • While they will be slower for small element counts, the absolute time spent executing will still be low overall.

The following snippet shows you how these two goals intersect (note that both commands return the 1-based index of the item found; assumes that the array elements have no embedded newlines):

# Sample input array - adjust the number to experiment
array=( {1..300} )

# Look for the last item
itmToFind=${array[@]: -1}

# Bash `for` loop
i=1
time for a in "${array[@]}"; do
    [[ $a == "$itmToFind" ]] && { echo "$i"; break; }
    (( ++i ))
done

# Alternative approach: use external utility `grep`
IFS=$'\n' # make sure that "${array[*]}" expands to \n-separated elements
time grep -m1 -Fxn "$itmToFind" <<<"${array[*]}" | cut -d: -f1

grep's -m1 option stops the search after the first match; -Fxn means that the search term should be treated as a literal (-F), must match the full line exactly (-x), and that each match is prefixed with its line number (-n).

With the given array size of 300, the above commands perform about the same on my machine (for loop first, then grep):

300

real    0m0.005s
user    0m0.004s
sys     0m0.000s

300

real    0m0.004s
user    0m0.002s
sys     0m0.002s

The specific threshold will vary, but:

  • Generally speaking, the higher the element count, the faster a solution based on an external utility such as grep will be.

  • For low element counts, the absolute time spent will probably not matter much, even if the external utility solution is comparatively slower.

To illustrate the other extreme, here are the timings for a 1,000,000-element array (for loop first, then grep):

1000000

real    0m13.861s
user    0m13.180s
sys     0m0.357s

1000000

real    0m1.520s
user    0m1.411s
sys     0m0.005s
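As a minimal sketch, the grep approach above could be wrapped in a function so that the IFS change does not leak into the rest of the script. The name index_of is hypothetical and not part of the original answer. Like the snippet above, it assumes that no element contains an embedded newline and that grep supports -m (GNU and BSD grep do).

```shell
#!/usr/bin/env bash
# index_of is a hypothetical helper name, used here for illustration only.
# Prints the 1-based index of the first argument after $1 that equals $1.
# Returns a non-zero status if there is no match.
# Assumes that no element contains an embedded newline.
index_of() {
    local needle=$1; shift
    local line
    # IFS is changed only inside the command substitution's subshell,
    # so the caller's IFS is left untouched.
    line=$(IFS=$'\n'; grep -m1 -Fxn -e "$needle" <<<"$*" | cut -d: -f1)
    [[ -n $line ]] && printf '%s\n' "$line"
}

array=( {1..300} )
index_of 299 "${array[@]}"   # prints 299
```

Passing the search term with -e keeps grep from misreading a term that starts with a dash as an option.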

1

Without any other information about the array, there is no alternative to checking each element. If the data is sorted, a binary search (search by dichotomy) can be used; otherwise, another structure such as a hash can be used.

For example, since Bash 4 you can store the elements as keys of an associative array instead of appending them to an indexed array:

declare -A hash              # associative array (Bash 4+)
i=0
for str in string{A..Z}; do
    hash[$str]=$((i++))      # map each string to its 0-based position
done

echo "${hash['stringI']}"    # prints 8
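For the sorted case mentioned at the start of this answer, a binary search could look roughly like the sketch below. The function name binary_search and the sample array are hypothetical. It assumes Bash 4.3 or later (for namerefs) and an array that is already sorted in the order used by the [[ < ]] string comparison, which depends on the current locale.

```shell
#!/usr/bin/env bash
# binary_search is a hypothetical helper name, used here for illustration only.
# Usage: binary_search NEEDLE ARRAY_NAME
# Prints the 0-based index of NEEDLE in the named array and returns 0,
# or returns 1 if NEEDLE is not present.
# Assumes the array is sorted according to the [[ < ]] string comparison.
binary_search() {
    local needle=$1
    local -n _bs_items=$2    # nameref to the caller's array (Bash 4.3+)
    local lo=0 hi=$(( ${#_bs_items[@]} - 1 )) mid
    while (( lo <= hi )); do
        mid=$(( (lo + hi) / 2 ))
        if [[ ${_bs_items[mid]} == "$needle" ]]; then
            printf '%s\n' "$mid"
            return 0
        elif [[ ${_bs_items[mid]} < "$needle" ]]; then
            lo=$(( mid + 1 ))    # needle can only be in the upper half
        else
            hi=$(( mid - 1 ))    # needle can only be in the lower half
        fi
    done
    return 1
}

sorted=(stringA stringB stringC stringD stringE)
binary_search stringD sorted   # prints 3
```

The array is passed by name rather than by value so that the call does not copy every element, which would itself take linear time.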

1 Comment

Too bad... Thank you for answering so quickly anyway
0

Not sure if this will work for you or if this is the best way to do it avoiding a for loop, but you can try:

$ array=('string1' 'string2' 'string3' 'string4')
$ a='string3'
$ printf "%s\n" "${array[@]}" | grep -m1 -Fxn "$a" | cut -d: -f1
3
$ i=$(( $(printf "%s\n" "${array[@]}" | grep -m1 -Fxn "$a" | cut -d: -f1) - 1 ))
$ echo $i
2

Breaking it down:

printf "%s\n" "${array[@]}"

prints every element of the array on its own line. We then pipe that to grep to get the number of the line matching the $a variable, and use cut to keep only the line number without the matched text:

printf "%s\n" "${array[@]}" | grep -m1 -Fxn "$a" | cut -d: -f1

Finally, subtract 1 from the matching line number using arithmetic expansion and store the result in $i:

i=$(( $(printf "%s\n" "${array[@]}" | grep -m1 -Fxn "$a" | cut -d: -f1) - 1 ))

5 Comments

@KCJV Glad it works for you, I edited my answer to add an explanation as it might be useful.
That's just the same as looping by hand, except that it's sed that does the looping over its input. Possibly slower because of the fork+exec, at least for small arrays.
@ilkkachu Well, the question said "without having to check the terms one by one with a for loop". My answer is not using a for loop. As for speed, I think it is going to be much faster than a for loop for large arrays; for small ones the speed difference might be negligible, but those are just speculations.
++for a promising approach overall, but sed is the wrong tool for the job: even if you anchored the search term with sed -n "/^$a\$/=", you'd still run into problems if $a happened to contain regex metacharacters. You really want literal, full-line matching in this case.
@mklement0 Thanks for the feedback, you are right. Updated my answer to use the grep + cut approach.
0

Others have shown approaches based on the current array. As an alternative, you could turn the array into an associative one, with your strings as the keys pointing to their positions:

declare -A array=(['string1']=1
                  ['string2']=2
                  ...
                  ['stringN']=N )

a='stringM'

echo "${array[$a]}"
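If the indexed array already exists, the associative lookup table could be built from it once, as in the sketch below. The names index_by_value and key are hypothetical. It assumes Bash 4 or later and that no element is the empty string, since an associative array cannot use an empty key. Building the table still takes one loop, but every later lookup avoids it.

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for associative arrays.
# index_by_value and key are illustrative names, not from the original answer.
array=('string1' 'string2' 'string3' 'string4')

declare -A index_by_value=()
for i in "${!array[@]}"; do
    key=${array[i]}
    # Only store the index if the key is not set yet,
    # so the first occurrence of a duplicated value wins.
    [[ -n ${index_by_value[$key]+set} ]] || index_by_value[$key]=$i
done

a='string3'
echo "${index_by_value[$a]}"   # prints 2 (0-based position)
```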
