3

I'm looking to run a script on the server that reads a file at a given path and replaces a word, but only where it appears inside a matching div.

Specifically, I need to replace _toself with viewers in every entry where author equals a certain email, [email protected]

URL=/var/www/sever/temp/fhyw1 FILE=user.txt

<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>_toself</div>
  <div class='num'>4</div>
</div>
<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>_viewers</div>
  <div class='num'>4</div>
</div>

3 Answers

1

This sed solution might work for you:

 sed -e '/^<div class=.entry.>/,\_^</div>_{//!{H;d};\_^</div>_!{h;d};x;/author.>[email protected]/s/_toself/SUBSTITUTE TEXT/;p;x}' text_file

N.B. You will need to replace SUBSTITUTE TEXT with viewers, _viewers or whatever value you need.

The sed command lets all lines other than those between <div class=.entry.> and </div> pass through unchanged (the . allows for either single ' or double " quotes). If the line begins with <div class=.entry.>, it is copied to a register called the hold space (HS) and then the pattern space (PS) is deleted. All other lines in the range are appended to the HS and then deleted, except for the closing </div> line. When that line appears, the HS is swapped with the PS, and if the resulting multiline block contains author.>[email protected], then SUBSTITUTE TEXT is substituted for _toself. The block is printed regardless, then the two spaces are swapped back so that the closing </div> is in the PS again and is printed in turn.
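For readers who find the hold-space juggling hard to follow, the same buffer-then-decide idea can be sketched in awk. This is only an illustration under stated assumptions, not the answer's own method: the function name replace_to, the address user@example.com and the replacement _viewers are hypothetical placeholders, and the author div is assumed to use single quotes exactly as in the sample.

```shell
#!/bin/sh
# Hypothetical values: adjust the author address and replacement to your data.
author='user@example.com'
new_to='_viewers'

# replace_to FILE: print FILE, replacing _toself only inside entries
# whose author div matches $author. The input file is not modified.
replace_to() {
    awk -v author="$author" -v new_to="$new_to" '
        # Start of an entry: begin buffering.
        /^<div class=.entry.>/ { buf = $0; inblock = 1; next }
        # End of an entry: decide whether to substitute, then flush the buffer.
        inblock && /^<\/div>/ {
            # \047 is a single quote; index() does a literal (non-regex) search.
            if (index(buf, "<div class=\047author\047>" author "</div>"))
                sub(/_toself/, new_to, buf)
            print buf; print; inblock = 0; next
        }
        # Inside an entry: keep buffering.
        inblock { buf = buf "\n" $0; next }
        # Outside any entry: pass through unchanged.
        { print }
    ' "$1"
}
```

Calling replace_to user.txt > user.txt.new writes the result to a new file, which can be inspected before it replaces the original.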


2 Comments

Thanks for the sed command. It seems to work OK, but I'm getting blank lines inserted at every div entry and /div?
I must have mucked up whilst copying and pasting. Solution corrected: a regex had gone astray, \_^</div>_
1

We have some text:

$> cat ./text 
<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>_toself</div>
  <div class='num'>4</div>
</div>
<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>_viewers</div>
  <div class='num'>4</div>
</div>

And we need to replace the _toself 'to' value with viewers, but only in divs where 'author' equals [email protected]

I think sed can help you, but you need some experience with it to express all the conditions in sed syntax.

So, we can read the file in a while loop, cut it into div blocks, and replace one value with the other only if the block's 'author' value equals the given email.

#!/bin/bash
# Run this with bash (./div.sh or bash div.sh), not sh: [[ ... ]] is a bash construct.

mail="[email protected]"
to_value_old=_toself
to_value_new=viewers

entry_block=""
while IFS= read -r line; do
    if [[ "$line" != "</div>" ]]; then
        # Not the end of an entry yet: keep collecting lines.
        entry_block+="${line}"$'\n'
    else
        # Closing </div> of an entry: the block is complete.
        entry_block+="</div>"
        # Replace the 'to' value only when the block's author matches (literal comparison).
        if [[ "$entry_block" == *"<div class='author'>${mail}</div>"* ]]; then
            old="<div class='to'>${to_value_old}</div>"
            new="<div class='to'>${to_value_new}</div>"
            entry_block=${entry_block/"$old"/$new}
        fi
        printf '%s\n' "$entry_block"
        entry_block=""
    fi
done < ./text

And we get

$> ./div.sh 
<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>viewers</div>
  <div class='num'>4</div>
</div>
<div class='entry'>
  <div class='pageurl'>temp/fhyw1</div>
  <div class='context'>text</div>
  <div class='subject'>testing</div>
  <div class='notetext'></div>
  <div class='signed'>USER</div>
  <div class='author'>[email protected]</div>
  <div class='color'>0</div>
  <div class='visibility'>shared</div>
  <div class='to'>_viewers</div>
  <div class='num'>4</div>
</div>

Done.

1 Comment

Thanks for this. I'm currently getting "replace_anno.sh: 19: [[: not found" (twice), followed by "-e </div>", when trying to run it with the text file.
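The "[[: not found" message in the comment above is what a shell without the [[ keyword prints, which suggests the script was started with sh rather than bash. Running it as bash replace_anno.sh should avoid the error. If bash is not available, the same block-by-block loop can be sketched with POSIX sh features only. The function name rewrite_entries and the address user@example.com are hypothetical placeholders, and the sketch assumes single-quoted attributes as in the sample.

```shell
#!/bin/sh
# Hypothetical values: substitute the real author address and replacement.
mail='user@example.com'
to_value_old='_toself'
to_value_new='viewers'

# rewrite_entries FILE: print FILE with the 'to' value rewritten in
# entries whose author matches $mail, using only POSIX sh constructs.
rewrite_entries() {
    entry_block=''
    while IFS= read -r line; do
        entry_block="${entry_block}${line}
"
        # Keep collecting until the closing </div> of an entry.
        [ "$line" = '</div>' ] || continue
        case $entry_block in
            *"<div class='author'>${mail}</div>"*)
                # Author matches: rewrite the 'to' value in this block only.
                entry_block=$(printf '%s' "$entry_block" |
                    sed "s|<div class='to'>${to_value_old}</div>|<div class='to'>${to_value_new}</div>|")
                # Command substitution strips the trailing newline; restore it.
                entry_block="${entry_block}
"
                ;;
        esac
        printf '%s' "$entry_block"
        entry_block=''
    done < "$1"
}
```

Using case for the author test keeps the comparison literal, so characters in the address that are special in regular expressions need no escaping.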
0

If you only want to replace all occurrences of _toself with something else, then sed will do the job perfectly. The g flag is needed to replace every occurrence on a line rather than just the first:

sed 's/_toself/replacement_string/g'

If you want to do this only within the div with a specified author, then it's a bit more tricky.

2 Comments

Yep, I'm going to need to do it for a specified author.
Not too much. Grepping and find haven't helped much. I'm looking at putting it through a while read loop.
