
I have a file of this type:

16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4

and I want to remove all the strings inside square brackets (brackets included) in order to obtain

16:00 Al-Najma - Al-Rifaa 5.06 3.55 1.57 4

I am trying with sed in this manner:

sed 's/\[.*]//g' file1 > file2

but I get

16:00 1.57 4

and with

sed 's/\[.[1234567890]]//g' file1 > file2

it does not work when the number inside the brackets has more than two digits.

How can I do this?

6 Answers

1

Your pattern allows exactly two characters between the brackets (any one character, then one digit). Putting a star after the digit class widens it to any number of digits:

sed 's/\[[1234567890]*\]//g' file1 > file2

Alternative:

sed 's/\[[^]]*\]//g' file1 > file2

That means: after the opening "[", any character except "]" is accepted, as many times as it occurs (the "*"), up to the closing "]".

For further reading on sed: http://www.grymoire.com/Unix/Sed.html
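A minimal POSIX sh sketch comparing the question's original pattern, a digit-only pattern, and the negated-class pattern described above. The sample line is made up for illustration: its brackets hold one, two and three digits, plus one non-numeric entry. The function names are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers; each reads stdin and writes the edited line to stdout.

# The question's pattern: "[", any one character, one digit, "]".
strip_question_pattern() {
  sed 's/\[.[1234567890]]//g'
}

# Digit-only pattern: "[", any number of digits, "]".
strip_numeric_brackets() {
  sed 's/\[[0-9]*\]//g'
}

# Negated class: "[", any run of characters other than "]", then "]".
strip_any_brackets() {
  sed 's/\[[^]]*\]//g'
}

sample='[6]a [61]b [123]c [x1]d'   # made-up sample line

printf '%s\n' "$sample" | strip_question_pattern   # [6]a b [123]c d
printf '%s\n' "$sample" | strip_numeric_brackets   # a b c [x1]d
printf '%s\n' "$sample" | strip_any_brackets       # a b c d
```

The digit-only version is enough for the data in the question; the negated class also removes brackets holding non-numeric text.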


2 Comments

This may work for this input, but it will not scale to arbitrary strings between the two brackets. What about the alternative?
[1234567890] can be shortened to [0-9]

1

Your first regex does not work because the quantifier * is greedy, meaning it matches as many characters as possible. Since . also matches brackets, it continues to match until the last closing bracket ] it can find.
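To see the greediness directly, grep -o can print only the text a pattern matched, one match per line. This sketch assumes GNU or BSD grep (-o is a common extension, not POSIX) and uses the sample line from the question.

```shell
#!/bin/sh
# Print only the text each pattern matches (grep -o puts every match on its own line).
line='16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4'

printf '%s\n' '--- greedy .* between brackets:'
printf '%s\n' "$line" | grep -o '\[.*\]'
# one match: from the first "[" to the last "]"

printf '%s\n' '--- negated class [^]]* between brackets:'
printf '%s\n' "$line" | grep -o '\[[^]]*\]'
# four matches: [61], [62], [63], [64]
```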

So you basically have two options: use a non-greedy quantifier, or restrict the characters the pattern may match. You tried the second approach with a digit class; I would use a negated character class instead:

sed 's/\[[^]]*\]//g'

sed has no non-greedy quantifiers (POSIX regular expressions do not provide them), but Perl does:

perl -lpwe 's/\[.*?\]//g'


0

Does escaping the closing ] help?

sed 's/\[.*\]//g' file1 > file2

1 Comment

\[.*\] is greedy and will swallow all characters between the first [ and the last ], including any other ] and [ in between.

0

You already got the sed answer, so I will add another one using awk:

awk '
  BEGIN {
    # a field separator is any "[...]" span
    FS = "\\[[^]]*\\]"
  }
  {
    # print the fields back to back, then end this line
    for (i = 1; i <= NF; i++)
      printf "%s", $i
    printf "\n"
  }
' <<<"16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4"

Output:

16:00 Al-Najma - Al-Rifaa 5.06 3.55 1.57 4
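If the goal is only to delete the bracketed spans, awk's gsub() can do it without changing the field separator at all. This is a different technique from the FS approach above, shown as a sketch; the function name is hypothetical, and the second input line is made up to show that each input line stays on its own output line.

```shell
#!/bin/sh
# Hypothetical helper: delete every "[...]" span in each line, then print the line.
strip_brackets_awk() {
  awk '{ gsub(/\[[^]]*\]/, ""); print }'
}

printf '%s\n' '16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4' \
              '[1]first [22]second [333]third' |
  strip_brackets_awk
# 16:00 Al-Najma - Al-Rifaa 5.06 3.55 1.57 4
# first second third
```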


0

Using awk:

$ echo '16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4' | awk -F '\[[0-9]*\]' '$1=$1'
16:00  Al-Najma - Al-Rifaa  5.06  3.55  1.57 4
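The double spaces in that output come from $1=$1 rebuilding the record with the default OFS (a single space) between fields, on top of the spaces already in the line. A sketch that sets OFS to the empty string avoids this. It also writes the separator with [[] and []] (bracket expressions matching a literal "[" and "]") instead of backslashes, and puts $1 = $1 in the action with an explicit print, because as a bare pattern it would drop lines whose first field is empty or 0. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split on "[digits]" and rejoin the fields with an empty OFS,
# so only the spaces already present in the line remain.
strip_indices_ofs() {
  awk -F '[[][0-9]*[]]' -v OFS='' '{ $1 = $1; print }'
}

printf '%s\n' '16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4' |
  strip_indices_ofs
# 16:00 Al-Najma - Al-Rifaa 5.06 3.55 1.57 4
```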

1 Comment

same as my solution (didn't post) ;) btw, the two "\" could be saved.

0

This might work for you:

echo "16:00 [61]Al-Najma - Al-Rifaa [62]5.06 [63]3.55 [64]1.57 4" |
sed 's/\[[^]]*\]//g'
16:00 Al-Najma - Al-Rifaa 5.06 3.55 1.57 4

