
Example:

This (word1) is a test (word2) file.

What I want:

This is a test file.

The problem is that the brackets occur more than once, so if I use:

sed 's/<.*>//g'

I get "This file.", which is wrong.


How about if I want to replace the string between two same patterns?

Like:

WORD1 %WORD2% WORD3 => WORD1 WORD3
  • so you want to remove all text inside parentheses? Commented Dec 16, 2015 at 12:19
  • Exactly. But the parentheses are just a very simple example, it could also be more than one symbol like #/to be replaced/# or %to be replaced% Commented Dec 16, 2015 at 12:46
  • Please update the question providing more details. Commented Dec 16, 2015 at 12:51
  • @Lobby2: Again, why same patterns? Where are the identical parts? What do you expect as an output for WORD1 %WORD2% WORD3 something WORD1 %WORD2% WORD3? Commented Dec 16, 2015 at 12:56
  • The nominated duplicate specifically answers that case (too). Please review existing questions before posting here. Thanks. Commented Dec 16, 2015 at 12:56

1 Answer

All you need is a negated character class [^<>]* that will match any character but a < or >:

sed 's/<[^<>]*>//g'

Or, if you have round brackets you can use [^()]* (note that in BRE syntax, to match a literal ( or ) escaping \ is not necessary):

sed 's/([^()]*)//g'
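
To see the difference on the question's sample line, here is a small sketch comparing a greedy pattern with the negated character class. The variable names are illustrative only, and the last command, which also consumes one trailing space, is an assumption about the desired spacing rather than part of the answer itself.

```shell
#!/bin/sh
input='This (word1) is a test (word2) file.'

# Greedy: .* runs from the first '(' to the last ')' on the line.
greedy=$(printf '%s\n' "$input" | sed 's/(.*)//g')

# Negated class: each match stops at the nearest ')'.
negated=$(printf '%s\n' "$input" | sed 's/([^()]*)//g')

# Assumed refinement: also drop one space after each removed group.
trimmed=$(printf '%s\n' "$input" | sed 's/([^()]*) //g')

printf '%s\n' "$greedy"    # -> This  file.
printf '%s\n' "$negated"   # -> This  is a test  file.
printf '%s\n' "$trimmed"   # -> This is a test file.
```

Note the double spaces left behind by the first two commands; only the third produces exactly the line the question asks for.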

As for the update, you can replace everything from WORD1 through WORD3 using .*, but only if the line contains a single WORD1 ... WORD3 pair:

echo "WORD1 %WORD2% WORD3" | sed 's/WORD1.*WORD3/WORD1 WORD3/g'
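
A quick check of that limitation, as a sketch: with two WORD1 ... WORD3 pairs on one line, the greedy .* spans from the first WORD1 to the last WORD3, so the text between the pairs is lost. The two-pair input line and the variable names are assumed test data.

```shell
#!/bin/sh
one_pair='WORD1 %WORD2% WORD3'
two_pairs='WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3'

single=$(printf '%s\n' "$one_pair" | sed 's/WORD1.*WORD3/WORD1 WORD3/g')
double=$(printf '%s\n' "$two_pairs" | sed 's/WORD1.*WORD3/WORD1 WORD3/g')

printf '%s\n' "$single"   # -> WORD1 WORD3
printf '%s\n' "$double"   # -> WORD1 WORD3   ("some text" is swallowed)
```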

With sed, it is not possible to use lookarounds (lookaheads here) or lazy quantifiers to restrict the match to the leftmost WORD3 occurrence. If you know for sure that no % symbol occurs between the two delimiting % characters, you can still use the negated character class approach:

echo "WORD1 %WORD2% WORD3" | sed 's/%[^%]*%//g'
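
Unlike the greedy version, this pattern copes with several %...% groups on one line, because [^%]* cannot cross a % character. A minimal sketch, assuming a two-group input line; the second command, which also removes one following space, is an optional tidy-up and not part of the original answer.

```shell
#!/bin/sh
input='WORD1 %WORD2% WORD3 something WORD1 %WORD2% WORD3'

# Each %...% group is matched and removed on its own.
removed=$(printf '%s\n' "$input" | sed 's/%[^%]*%//g')

# Assumed refinement: consume the space after each group as well.
tidy=$(printf '%s\n' "$input" | sed 's/%[^%]*% //g')

printf '%s\n' "$removed"   # -> WORD1  WORD3 something WORD1  WORD3
printf '%s\n' "$tidy"      # -> WORD1 WORD3 something WORD1 WORD3
```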

A generic solution is to do it in several steps:

  • replace the starting and ending delimiters with two unused characters (<UC1> and <UC2>) (I am using Russian letters, but they should really be control characters)
  • match <UC1>[^<UC1><UC2>]*<UC2> with a negated character class and substitute the required replacement string
  • restore the initial delimiters.

Here is an example:

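#!/bin/bash
# The placeholder letters й and ч must not occur anywhere in the input.
echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" |
  sed 's/WORD1/й/g' |          # WORD1 -> й
  sed 's/WORD3/ч/g' |          # WORD3 -> ч
  sed 's/й[^йч]*ч/й ч/g' |     # collapse each й...ч span to "й ч"
  sed 's/й/WORD1/g' |          # restore WORD1
  sed 's/ч/WORD3/g'            # restore WORD3
# => WORD1 WORD3 some text WORD1 WORD3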

I am hardcoding a space, but it can be adjusted whenever necessary.
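
The same steps can be sketched with control characters as placeholders, here for a multi-character delimiter pair such as #/ ... /# mentioned in the comments. This is a minimal sketch under assumptions: the bytes \001 and \002 never occur in the input, the delimited text is removed together with its delimiters, and the variable names are hypothetical.

```shell
#!/bin/sh
# Placeholders: two control characters assumed to be absent from the input.
open=$(printf '\001')
close=$(printf '\002')

input='keep #/drop/# and stray /# end'

result=$(printf '%s\n' "$input" | sed \
  -e "s|#/|${open}|g" \
  -e "s|/#|${close}|g" \
  -e "s|${open}[^${open}${close}]*${close}||g" \
  -e "s|${open}|#/|g" \
  -e "s|${close}|/#|g")

printf '%s\n' "$result"   # -> keep  and stray /# end
```

The two restoring expressions matter only for unmatched delimiters, such as the stray /# in this input, which would otherwise be left as a control character.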

6 Comments

Now I have another problem: how about if I wanna replace the string between two same patterns? Like WORD1 %WORD2% WORD3 => WORD1 WORD3?
These are not the same if you mean you have known WORD1 and WORD3 and you need to remove all between them. Maybe you need this.
This is a very common question. Please refrain from answering if you do not have the time to hunt down a good duplicate.
You have a gold badge in the regex tag. Your answer contains no lookarounds.
If you are referring to the OP's follow-up question; no, I am ignoring that. If the OP has a new question, they should post a new question, or edit the current question.