I am writing a CSH script and attempting to extract text from a source string given a key.

#!/bin/csh -f
set source = "Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n"
set toFind = "Star Trek"
set regex = "$toFind[\s]*?(.*?)[\s]*?"
set match = `expr $source : $regex`
echo $match

The above code does not work, so I am missing something. I also tried placing "Star Trek" directly in the regex rather than using a variable. I should see Renegades as the answer. Had I put "Star Wars" instead of "Star Trek", I should have seen The Empire Strikes Back.

Google search showed a possible solution using grep, such as

match = `grep -Po '<something>' <<< $source`

I did not know what to put for <something>, nor am I an expert in grep.

In the real code, I am reading text from a file. I just simplified things here.

Thoughts?

  • grep is for matching, sed is able to edit the stream, this is a good introduction: grymoire.com/Unix/Sed.html - also has examples on how to combine with shell scripts including csh. Commented Nov 13, 2023 at 21:28
  • @mandy8055 Your bash script returns "Star Trek" and not "Renegades", so directly as written, no. That being said, I am open to a bash solution, though I would still leave my original question up, as I am curious whether a solution is possible in csh. Commented Nov 13, 2023 at 22:38
  • "Thoughts?" ... Your regex looks very much like a Perl regex (but I have no experience with that). So, if that is a Perl regex, you can be sure that unless you have a version of expr that supports Perl regexes, it will never work. But now I am reading your initial problem description, "attempting to extract text from a source string given a key." Key/values? Why are you using such an unhelpful solution? Why not key[str]="value" or even just myKey=Renegades? Ah, "I am reading text from a file." It might have helped to have that near the top of your question. Commented Nov 14, 2023 at 17:51
  • Following on, as you say "I just simplified things here." I would rather spend my csh time on converting 2 lines of input into variable assignments, but it seems you have to deal with spaces in your variable names, so nix to Star Trek="Renegade" )-; . Doing quick research, I don't see that csh can do arr[key]="value" arrays, only set arr = (one two three), which are then referenced as echo $arr[1] $arr[3] etc. If you're processing a file with an external utility, sed is good, but awk will give you much more understandable code. Busy now, so that's all I can come up with. Commented Nov 14, 2023 at 18:08
  • Back to the Perl regex thing: there is a small set of Perl regex special syntax that can be rewritten in longhand basic regex. I have to believe that the expr utility only uses basic regexes, but it's not documented in the GNU coreutils 8.30 version of man expr (maybe in info '(coreutils) expr invocation'?). You do know that using csh is shell scripting with one hand tied behind your back? OK as a learning challenge, but for jobs/work, you'll do much better getting good at bash or zsh or something even newer (fish?). (man grep, search for ERE, is the best I can find.) Good luck. Commented Nov 14, 2023 at 18:16
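
As a rough illustration of the basic-regex point raised in the comments: POSIX specifies that the `:` operator of expr takes a basic regular expression that is implicitly anchored at the start of the string, so lazy quantifiers such as `*?` are not part of that syntax, and capture groups are written `\( \)`. The following is only a minimal sketch on a single-line input; the `line` and `match` variable names are made up for the example, and it is written for a POSIX sh rather than csh.

```shell
# Hypothetical single-line input; the question's real data spans several lines.
line='Star Trek Renegades'

# Basic regex: the match starts at the beginning of the string,
# and \( \) marks the part that expr prints.
match=$(expr "$line" : 'Star Trek[[:space:]]*\([[:alnum:]]*\)')
echo "$match"
```

The same `expr` call can be placed inside backticks in csh, but multi-line input is probably easier to handle with grep, sed, or awk.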

4 Answers

0

The following is not a literal answer to my question, as I asked the question for csh, however I wrote a solution using bash.

Match Regex Capture Groups

Match Whitespace: How can I match spaces with a regexp in Bash?

I used Tutorial Point to debug.

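mystring1='  asdf1@wxyz2  @@a!s#d@f@@  asdf2@wxyz2 b!t#e@g '

tofind='asdf1@wxyz2'
# Key, optional whitespace, then capture the next run of allowed characters.
regex="${tofind}[[:space:]]*([.!@\#a-zA-Z0-9]+)"

# On a successful match, the first capture group is in BASH_REMATCH[1].
[[ $mystring1 =~ $regex ]]

echo $'\n'
echo $'\n'
echo '***********************'
echo "${BASH_REMATCH[1]}"
echo '***********************'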
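
The same `=~` approach can be tried on the newline- and tab-separated text from the question. This is only a sketch and it requires bash; the names `source_text`, `nl`, and `match` are assumptions introduced here, and the key is matched as an unanchored substring, so a shorter key such as "Star" would also match.

```shell
#!/usr/bin/env bash
# Requires bash: uses $'...' quoting, [[ =~ ]] and BASH_REMATCH.
source_text=$'Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n'
tofind='Star Wars'

# Key, then whitespace (the newline and tab), then capture up to the next newline.
nl=$'\n'
regex="${tofind}[[:space:]]+([^${nl}]+)"

if [[ $source_text =~ $regex ]]; then
    match=${BASH_REMATCH[1]}
    echo "$match"
fi
```

Setting `tofind='Star Trek'` instead should capture the line that follows that key.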

1 Comment

mystring1=' asdf1@wxyz2 @@a!s#d@f@@ asdf2@wxyz2 b!t#e@g ' is not in the same newlines+tabs separated format as the text in your question, set source = "Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n". This is a possible answer to a different question than the one you asked.
0

The real solution uses a file for the source, so is:

set valueCapture=`cat /mypath/filename | grep -A1 "${tofind}" | grep -v "${tofind}" | xargs`

The code to find a capture value from a string should be (did not test it):

set valueCapture=`cat $source | grep -A1 "${tofind}" | grep -v "${tofind}" | xargs`

In both cases, what I wish to find is:

set tofind='asdf1@wxyz2'

The xargs part trims off whitespace.
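
For a string rather than a file, one way to feed the pipeline is printf instead of cat. The sketch below is written for a POSIX sh rather than csh, uses the sample text from the question, and introduces the names `source_text` and `value` purely for illustration; `grep -A1` is an extension found in GNU and BSD grep.

```shell
# Sample text from the question, stored in a shell variable (hypothetical name).
source_text=$(printf 'Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n')
tofind='Star Trek'

# Print the matching line plus one line after it, drop the matching line,
# then let xargs trim the surrounding whitespace.
value=$(printf '%s\n' "$source_text" | grep -A1 "$tofind" | grep -v "$tofind" | xargs)
echo "$value"
```

Because both grep calls match substrings, this sketch would misbehave if the key also appeared in the line that follows it.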

2 Comments

That's doing partial regexp matching across whole lines when you almost certainly should be doing whole-line or whole-field string matching, and it'd fail if the same target string appeared in both lines.
Also, UUoC (useless use of cat). Drop the cat file and just grep -A1 "${tofind}" file. From a string you might use echo "$source", but not cat.
0

Since you said your real input is in a file, here's the file your printf outputs:

$ cat file
Smurfs
        Papa
Star Trek
        Renegades
        Star Wars
        The Empire Strikes Back

and here's how to match and print the strings you want from it:

$ awk -v tgt='Star Trek' '{gsub(/^[[:space:]]+|[[:space:]]+$/,"")} $0==tgt{n=NR+1} NR==n' file
Renegades

$ awk -v tgt='Star Wars' '{gsub(/^[[:space:]]+|[[:space:]]+$/,"")} $0==tgt{n=NR+1} NR==n' file
The Empire Strikes Back

See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
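
To get the result into a shell variable, as the question attempts with `set match`, the awk command can be wrapped in command substitution. This is a sketch for a POSIX sh; the helper name `next_line`, the temporary file, and the variables `trek` and `wars` are assumptions made for the example.

```shell
# Recreate the sample file in a temporary location.
file=$(mktemp)
printf 'Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n' > "$file"

# next_line KEY FILE: print the trimmed line that follows the line equal to KEY.
next_line() {
    awk -v tgt="$1" '{gsub(/^[[:space:]]+|[[:space:]]+$/,"")} $0==tgt{n=NR+1} NR==n' "$2"
}

trek=$(next_line 'Star Trek' "$file")
wars=$(next_line 'Star Wars' "$file")
echo "$trek"
echo "$wars"

rm -f "$file"
```

In csh the equivalent capture would use backticks around the awk command in a `set` statement.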

0

A pipeline can do it, though it isn't as good as Ed's single-process awk.

$: toFind="Star Wars"; echo "$source" |  grep -EA1 "$toFind" | tail -1
        The Empire Strikes Back

$: toFind="Star Trek"; echo "$source" |  grep -EA1 "$toFind" | tail -1
        Renegades

$: echo "$source">file; toFind="Star Trek"; grep -EA1 "$toFind" file | tail -1
        Renegades

sed would also work.

$: toFind="Star Trek"; sed -n "/$toFind/{n
                                         p}" file # should work with any version
        Renegades

$: toFind="Star Wars"; sed -n "/$toFind/{n;p}" file # semicolon is GNU
        The Empire Strikes Back

For all of these, it is probably worth refining your regex.

$: toFind="Star"; sed -n "/$toFind/{n;p}" file
        Renegades
        The Empire Strikes Back

$: toFind="Star"; sed -n "/^$toFind$/{n;p}" file

$: toFind="Star Trek"; sed -n "/^$toFind$/{n;p}" file
        Renegades

$: toFind="Star Wars"; sed -n "/^$toFind$/{n;p}" file # fails because of the leading tab

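That last failure might mean you have to fall back to the looser, unanchored pattern from the first example, or at least allow leading whitespace before the key.
Test your logic.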
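
One possible middle ground between the anchored and unanchored patterns above is to keep the anchors but tolerate whitespace around the key. The following is a minimal sketch, assuming the key contains no regex metacharacters or slashes; the temporary file and the variables `match` and `partial` are introduced only for this example.

```shell
# Recreate the sample file in a temporary location.
file=$(mktemp)
printf 'Smurfs\n\tPapa\nStar Trek\n\tRenegades\n\tStar Wars\n\tThe Empire Strikes Back\n' > "$file"

toFind='Star Wars'
# Whole-line match that allows leading/trailing whitespace around the key;
# after moving to the next line, strip its leading whitespace before printing.
match=$(sed -n "/^[[:space:]]*${toFind}[[:space:]]*\$/{n;s/^[[:space:]]*//;p;}" "$file")
echo "$match"

# A partial key still matches nothing, because the anchors remain in place.
partial=$(sed -n "/^[[:space:]]*Star[[:space:]]*\$/{n;p;}" "$file")

rm -f "$file"
```

The trailing `;` before `}` is included because some sed implementations expect it.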
