Find string between char in unix

Question

I have a basic question. I have a string like the one below:

on one off abcd on two off

I want to find all the strings between 'on' and 'off'. The result I expect here is 'one' and 'two'.

I believe this is possible with sed.

I tried sed 's/on\(.*\)off/\1/g', but this returns one off abcd on two

Wintermute · Accepted Answer · 2015-03-20 10:52:44Z

2

With sed, I think the easiest way is to use two sed processes:

$ echo 'on one off abcd on two off' | sed 's/\<on\>[[:space:]]*/\non\n/g; s/[[:space:]]*\<off\>/\noff\n/g' | sed -n '/^on$/,/^off$/ { //!p; }'
one
two

This falls into two parts:

sed 's/\<on\>[[:space:]]*/\non\n/g; s/[[:space:]]*\<off\>/\noff\n/g'

puts each on and off on a line of its own, where they are easy to recognize, and

sed -n '/^on$/,/^off$/ { //!p; }'

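prints just the lines between them. The //!p prints every line of the range except the on and off lines themselves, because an empty regex // reuses the last regex that sed tried.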
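
To see what the first stage hands to the second, the two sed processes can be wrapped in small shell functions and the intermediate stream printed on its own. This is a sketch that assumes GNU sed: \<, \> and \n in the replacement are GNU extensions, and other seds (such as the BSD sed on Mac OS X) may not support them. The names split_delims, print_between and between_on_off are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\<, \> and \n in the replacement are GNU extensions).

# Stage 1: put every whole-word "on" and "off" on a line of its own.
split_delims() {
  sed 's/\<on\>[[:space:]]*/\non\n/g; s/[[:space:]]*\<off\>/\noff\n/g'
}

# Stage 2: print the lines inside each on...off range, but not the
# delimiter lines themselves (// reuses the last regex that was tried).
print_between() {
  sed -n '/^on$/,/^off$/ { //!p; }'
}

# Hypothetical helper combining both stages.
between_on_off() {
  split_delims | print_between
}

# Intermediate stream produced by stage 1:
echo 'on one off abcd on two off' | split_delims
# Full pipeline:
echo 'on one off abcd on two off' | between_on_off
```

The first call prints an empty line, then on, one, off, " abcd ", on, two, off and another empty line; the range in stage 2 therefore only ever sees one and two between the delimiter lines.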

Alternatively, you could do it with Perl (which supports non-greedy matching and lookarounds):

$ echo 'on one off abcd on two off' | perl -pe 's/.*?\bon\b\s*(.*?)\s*\boff\b.*?((?=\bon\b)|$)/\1\n/g; s/\n$//'
one
two

Where the

s/.*?\bon\b\s*(.*?)\s*\boff\b.*?((?=\bon\b)|$)/\1\n/g

puts the text between each \bon\b and the following \boff\b (where \b matches a word boundary) on a line of its own. The main trick is that .*? matches non-greedily, which is to say it matches the shortest string that still lets the full regex match. The (?=\bon\b) is a zero-length lookahead, so the final .*? stops just before the next on delimiter or at the end of the line; this discards the data between off and the next on.

The

s/\n$//

just removes the last newline that we don't need or want.

edited Mar 20, 2015 at 10:52

answered Mar 20, 2015 at 10:33

Wintermute

4 Comments

Bhabani Shankar Over a year ago

Thanks for the answer, but this is not printing anything at all. Can you please revisit it?

Wintermute Over a year ago

Do you, per chance, use Mac OS X?

Bhabani Shankar Over a year ago

The Perl option seems to work. Now for the actual scenario I will be testing: <replayqueue>abdc</replayqueue>ccc<replayqueue>lmn</replayqueue>dddd<replayqueue>xyz</replayqueue>. I would like to get abcd, ccc, xyz, ...

Wintermute Over a year ago

Honestly, if you wanted to extract data from XML, why didn't you ask a question about extracting data from XML? There are tools that make this much easier and more reliable. For example, with xmlstarlet you could just write xmlstarlet sel -t -v '//replayqueue/node()' -n.

Jotne · Accepted Answer · 2015-03-20 11:15:12Z

0

Here is an awk version (the \< and \> word-boundary operators are GNU awk extensions):

$ awk -v RS=" " '/\<off\>/ {f=0} f; /\<on\>/ {f=1}' file
one
two
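
The same flag idea can be written without gawk's \< and \> operators, which mawk and BSD awk may not accept. This portable sketch (a variant, not the exact code above) walks the whitespace-separated fields of each line and compares them with ==, so only standalone on/off tokens count as delimiters. The function name between_on_off_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: POSIX awk variant of the flag-toggling approach.
# f is 1 while we are between an "on" and the next "off".
between_on_off_awk() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "off") f = 0      # closing delimiter: stop before printing it
      if (f) print $i             # inside a range: print the token
      if ($i == "on") f = 1       # opening delimiter: start printing from the next token
    }
  }' "$@"
}

echo 'on one off abcd on two off' | between_on_off_awk
```

As in the original, the order of the three tests is what keeps the delimiters themselves out of the output. Because fields are split on any run of blanks, repeated spaces and tabs are handled; on the other hand, a delimiter glued to punctuation (such as off,) is not recognized here, whereas /\<off\>/ would match it.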

edited Mar 20, 2015 at 11:15

answered Mar 20, 2015 at 11:08

Jotne

NeronLeVelu · Accepted Answer · 2015-03-20 12:07:19Z

0

sed 's/\(.*\) off.*/ \1³/;s/ off /³/g;s/ on /²/g;s/³[^²]*²/³²/g;s/^[^²]*²/²/;s/²/\
/g;s/.//;s/³//g'

This uses ² and ³ as delimiters instead of on and off, because POSIX sed cannot negate a multi-character group, only a bracket expression (a character class). Any other character that does not occur in the string could be used instead (preferably not a metacharacter such as &).
The remaining commands remove the content outside the delimiters and reformat the result.
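
The one-liner can be laid out as a commented sed script so that each step is visible. This is the same sequence of substitutions, only reformatted; it assumes GNU sed, a single input line, and, like the original, exactly one space around each on/off. The function name between_on_off_marks is hypothetical.

```shell
#!/bin/sh
# Same steps as the one-liner above, one command per line (GNU sed assumed).
# ² marks an "on", ³ marks an "off".
between_on_off_marks() {
  sed '
    # Drop everything after the last " off", prefix a space, end with ³
    s/\(.*\) off.*/ \1³/
    # Turn each " off " and " on " into a one-character marker
    s/ off /³/g
    s/ on /²/g
    # Delete the text between an off and the following on
    s/³[^²]*²/³²/g
    # Delete any text before the first on
    s/^[^²]*²/²/
    # Start a new line at every on marker
    s/²/\
/g
    # Remove the leading newline and the remaining off markers
    s/.//
    s/³//g
  '
}

echo 'on one off abcd on two off' | between_on_off_marks
```

For the sample input the pattern space goes from " on one off abcd on two³" to "²one³abcd²two³", then "²one³²two³", and finally one and two on separate lines.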

answered Mar 20, 2015 at 12:07

NeronLeVelu
