49

If I have the text in a shell variable, say $a:

a="The cat sat on the mat"

How can I search for "cat" and return 4 using a Linux shell script, or -1 if not found?

2
  • possible duplicate of String contains in bash Commented Feb 17, 2011 at 16:39
  • 7
    @Daniel This question asks for an index of substring too Commented Feb 17, 2011 at 16:41

8 Answers 8

84

With bash

a="The cat sat on the mat"
b=cat
strindex() { 
  x="${1%%"$2"*}"
  [[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}
strindex "$a" "$b"   # prints 4
strindex "$a" foo    # prints -1
strindex "$a" "ca*"  # prints -1

6 Comments

+1 It also works in Dash, ash, ksh, pdksh, zsh. Dash and ash want [ "$x" = "$1" ] and pdksh wants x=$2; x="${1%%$x*}", however.
@Zubair, bash 2.0 is 10 years old and 2 major releases behind (ftp.gnu.org/gnu/bash). Can you update it?
This is just brilliant. The first parameter substitution expression says "delete from the search expression on to the end", and the ${#x} is the length of what remains - which is the position of the search expression!
Great answer, I've been using it for a while. I just found out that a * in your search string will be interpreted as a wildcard unless it is manually escaped. I added a copy of your answer with automatic escaping, but all credit to you. stackoverflow.com/a/69960043/912236
@Orwellophile, to avoid pathname expansion of $var in "${param%$var}", the easiest is to quote it: "${param%"$var"}". See here.
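
The expansion described in the comments above can be traced step by step. This is only an illustrative sketch; the variable names x and y are arbitrary, and it assumes bash (for [[ ]]).

```bash
a="The cat sat on the mat"
b="cat"

# Remove the longest suffix that starts with the (quoted, literal) needle.
x="${a%%"$b"*}"
printf '[%s]\n' "$x"    # what is left is everything before the match
echo "${#x}"            # its length is the 0-based index of the match

# When the needle does not occur, nothing is removed and the string is unchanged.
y="${a%%"foo"*}"
[[ "$y" = "$a" ]] && echo "unchanged, so not found"
```

The quotes around "$b" inside the expansion are what keep characters such as * or ? in the needle from acting as pattern wildcards.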
40

You can use grep to get the byte-offset of the matching part of a string:

echo "$str" | grep -b -o str

As per your example:

[user@host ~]$ echo "The cat sat on the mat" | grep -b -o cat
4:cat

you can pipe that to awk if you just want the first part

echo "$str" | grep -b -o str | awk 'BEGIN {FS=":"}{print $1}'

7 Comments

cut -d: -f1 is a bit more lightweight than piping through awk
@Zubair: define "doesn't work"- the output is correct on my machine.
outputs '0:cat' on my mac
Gives '0:cat' on Mac and Ubuntu.
colrm 2 could also replace the awk portion
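
Combining the grep approach with the cut suggestion from the comments, one possible wrapper that also returns -1 when nothing matches might look like the sketch below. The function name grepindex is hypothetical, and the sketch assumes a GNU grep where -b together with -o reports the offset of each match (the comments above suggest some systems print 0 instead). Note that the offset is counted in bytes, not characters.

```bash
# Hypothetical helper: print the 0-based byte offset of the first match, or -1.
grepindex() {
  local haystack=$1 needle=$2 offset
  # -F: treat the needle as a fixed string, -b: byte offset, -o: only the match
  offset=$(printf '%s\n' "$haystack" | grep -F -b -o -- "$needle" | head -n 1 | cut -d: -f1)
  if [[ -n "$offset" ]]; then
    echo "$offset"
  else
    echo -1
    return 1
  fi
}

grepindex "The cat sat on the mat" cat
grepindex "The cat sat on the mat" dog || echo "not found"
```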
10

I used awk for this

a="The cat sat on the mat"
test="cat"
awk -v a="$a" -v b="$test" 'BEGIN{print index(a,b)}'

1 Comment

awk's index() is 1-based, so this prints one more than what the original poster requested; otherwise a great answer. However, there is a way to correct it: awk -v a="$a" -v b="$test" 'BEGIN{print index(a,b)}' | xargs expr -1 +
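
As an alternative to piping through xargs expr, the subtraction can be done inside awk itself. This is a minimal sketch; the function name awkindex is hypothetical. Since awk's index() returns 0 when there is no match, subtracting 1 conveniently yields -1 for that case. One caveat: values passed with -v have backslash escape sequences interpreted by awk, so strings containing backslashes may need a different way of passing them in.

```bash
# Hypothetical helper: 0-based index of the first occurrence, or -1 if absent.
awkindex() {
  awk -v a="$1" -v b="$2" 'BEGIN { print index(a, b) - 1 }'
}

a="The cat sat on the mat"
awkindex "$a" "cat"
awkindex "$a" "dog"
```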
5
echo "$a" | grep -bo cat | sed 's/:.*$//'

1 Comment

@Zubair - your command displays "4" on my Ubuntu 10.04 box. That's what I expect.
2

This is just a version of glenn jackman's answer with escaping, plus the complementary reverse function strrpos and Python-style startswith and endswith functions based on the same principle.

Edit: updating escaping per @bruno's excellent suggestion.

strpos() { 
  haystack=$1
  needle=$2
  x="${haystack%%"$needle"*}"
  [[ "$x" = "$haystack" ]] && { echo -1; return 1; } || echo "${#x}"
}

strrpos() { 
  haystack=$1
  needle=$2
  x="${haystack%"$needle"*}"
  [[ "$x" = "$haystack" ]] && { echo -1; return 1 ;} || echo "${#x}"
}

startswith() { 
  haystack=$1
  needle=$2
  x="${haystack#"$needle"}"
  [[ "$x" = "$haystack" ]] && return 1 || return 0
}

endswith() { 
  haystack=$1
  needle=$2
  x="${haystack%"$needle"}"
  [[ "$x" = "$haystack" ]] && return 1 || return 0
}

4 Comments

pathname expansion is much more than * (? and [..]). The best way to prevent pathname expansion is to quote $2 in x=${haystack%%"$2"*}
@Bruno "Today I Learned...."
@Orwellophile For strpos(), if the value of $needle is the empty string '' (null) or some other un-matchable pattern, then the result stored in $x would be the value of $haystack itself. There can be no match, and so therefore nothing is deleted from an expanded $haystack. The variable expands normally by bash rules, and the value is stored in $x. Prevent empty arguments upon execution with the someVariable="${1:?}" format. It makes your functions MUCH more type safe, so to speak. gnu.org/software/bash/manual/bash.html#Shell-Expansions
While having default values is normally a fantastic thing, I can't see how that would help in this case. If $needle is empty, then strpos will echo position 0. If $needle is not found, it will echo -1. Errorlevels will be set to 0 and 1 respectively. That is entirely correct for strpos, conforms to JavaScript's behaviour, and seems logical to me. It is also an essential component of gist.github.com/sfinktah/a432630706393d7bbe51f01508805cc6 (where I used these functions). strrpos should return the length of the string if $needle is '', but defaults won't help there.
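
The exit-status and empty-needle behaviour discussed in these comments can be checked with a short usage sketch. The function below restates the same technique as strpos from the answer (using local variables, which is a small change and not the answer's exact code), so that the block runs on its own; it assumes bash.

```bash
strpos() {
  local haystack=$1 needle=$2
  local x="${haystack%%"$needle"*}"
  if [[ "$x" = "$haystack" ]]; then
    echo -1
    return 1
  fi
  echo "${#x}"
}

a="The cat sat on the mat"

# The exit status lets the caller branch without comparing against -1.
if pos=$(strpos "$a" "cat"); then
  echo "found at $pos"
else
  echo "not found"
fi

strpos "$a" ""                               # empty needle on a non-empty string
strpos "$a" "dog" || echo "exit status: $?"  # no match
```

With an empty needle the pattern reduces to a bare *, which matches the whole string, so nothing remains and the reported position is 0.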
1

This can be accomplished using ripgrep (aka rg).

❯ a="The cat sat on the mat"
❯ echo $a | rg --no-config --column 'cat'
1:5:The cat sat on the mat
❯ echo $a | rg --no-config --column 'cat' | cut -d: -f2
5

If you wanted to make it a function you can do:

function strindex() {
    local str=$1
    local substr=$2
    # Quote the variables so spaces and glob characters are passed through intact.
    echo -n "$str" | rg --no-config --column "$substr" | cut -d: -f2
}

...and use it as such: strindex <STRING> <SUBSTRING>

strindex "The cat sat on the mat" "cat"
5

You can install ripgrep on MacOS with: brew install --formula ripgrep.


1

A variation (bash) on @Orwellophile's answer, done out the long way. However, it also does the comparison with string lengths instead of comparing strings. You never know how long a string might be! :-) Hopefully, while clearly longer, this answer will be clearer. Note that this version takes the needle as the first argument and the haystack as the second.

function strpos ()
{
    local -r needle="${1:?}"    ## Prevents empty strings
    local -r haystack="${2:?}"  ## Prevents empty strings

    ## From a copy, attempts to remove characters from the end of a string, greedily.
    local -r remainingHaystack="${haystack%%"$needle"*}"
    local -ir remainingHaystackLength="${#remainingHaystack}"

    ## When the needle is not found in haystack, these values will be equal.
    if (( $remainingHaystackLength == ${#haystack} )); then
        echo -n -1
        return 1
    fi
    
    echo -n $remainingHaystackLength
}

If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the ‘%’ case) or the longest matching pattern (the ‘%%’ case) deleted.

Example:

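If parameter = "/usr/bin/foo/bin", and word = "/bin*"

 ${parameter%word}    ## /usr/bin/foo/bin  --> /usr/bin/foo (shortest match, non-greedy)
 ${parameter%%word}   ## /usr/bin/foo/bin  --> /usr         (longest match, greedy)

With the literal word "/bin" (no wildcard), only the exact trailing "/bin" can match, so both forms give /usr/bin/foo.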

If parameter is ‘@’ or ‘*’, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with ‘@’ or ‘*’, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.

Bash Reference Manual: Shell Expansion
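
A quick way to see the difference between % and %% is to run both forms. The sketch below uses the same sample path as the example above; the two forms only differ when the pattern contains a wildcard, because a literal pattern has exactly one possible match at the end of the string.

```bash
parameter="/usr/bin/foo/bin"

# Pattern with a wildcard: shortest vs. longest matching suffix.
echo "${parameter%/bin*}"     # removes only the trailing "/bin"
echo "${parameter%%/bin*}"    # removes from the first "/bin" to the end

# Literal pattern: both forms can only remove the exact trailing "/bin".
echo "${parameter%/bin}"
echo "${parameter%%/bin}"
```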


-1

The simplest is: expr index "The cat sat on the mat" cat

It will print 5.

1 Comment

This solution does not meet the requirement of returning -1 when the text value is not found. expr returns one (1) and prints zero (0) when a CHAR is not found, and so may cause ambiguity, depending on usage. Also, if string='Cat in the Hat Strikes Back', then expr index "$string" 'Hat' will print 2, because the form of the command is index STRING CHARS and not index STRING STRING. In this case it prints the position of the character a, because it is the second character in $string, which also shows that expr uses one (1) based indexing, not zero (0) based indexing.
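
The behaviour described in this comment can be reproduced with a short sketch, assuming the GNU coreutils expr (the index operator is not available in every expr implementation). It shows that the second argument is treated as a set of characters rather than as a substring.

```bash
string='Cat in the Hat Strikes Back'

# Matches the first occurrence of ANY of the characters H, a, t (1-based).
expr index "$string" 'Hat'

# Happens to give a plausible answer only because "c" comes first in the text.
expr index "The cat sat on the mat" cat

# No character found: prints 0 and exits with status 1.
expr index "$string" 'xyz' || echo "exit status: $?"
```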
