I need to insert a block of code (an AdSense ad script) between two specific tags in an HTML file. The tags are:

</style>
<table border="1" class="dataframe">

I need to insert it at the THIRD occurrence of this pair of tags. A typical AdSense block looks like this:

<script async 
  src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"> 
</script>
<script>
 lorem ipsum...                          
</script>

In the end I need to have something like this:

</style>

 <script async 
  src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"> 
  </script>
  <script>
   lorem ipsum...                            
  </script>

<table border="1" class="dataframe">

I have inserted blocks of code before using sed with a line number, but that is not possible here because the line number can change. Thank you very much for any help.

  • awk '/pattern to match/ && ++match_count == 3 {print "your extraStuff"} 1' htmlfile > new.htmlfile will give you something to experiment with and search further on. It prints the extra text once, just before the third matching line. The 1 after the closing } means "print all input". Remove that to experiment. You can add explicit instructions on when to print input, using more {if{...}else{}} logic. Spend a few hours with the awk tutorial and you'll be on your way to awk guruhood ;-) Good luck. Commented May 27, 2019 at 16:54
  • But this all assumes you have control over the creation of your HTML output. HTML should really only be parsed with an HTML-aware parser. Once an element breaks across lines, awk will cry foul, as it is a line-based parser, not a <tag> ... </tag> parser. Good luck. Commented May 27, 2019 at 16:59
  • Thank you @shellter Commented May 28, 2019 at 15:14

1 Answer

Try this, if you can use bash. This solution is not very fast, but it should work.

insert_rubbish.sh

#!/bin/bash

StartPattern='</style>'
StopPattern='<table border="1" class="dataframe">'

# quoted 'EOT' keeps any $ or backticks in the snippet literal
content=$(cat << 'EOT'
<script async
  src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js">
</script>
<script>
 lorem ipsum...
</script>
EOT
)

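# loop over all lines; IFS= keeps leading whitespace, and the
# || test also handles a last line without a trailing newline
count=0
while IFS= read -r line || [[ -n $line ]]; do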

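    # if the stop pattern follows a line with the start pattern
    # (literal substring match, so no regex escaping is needed)
    if [[ $line == *"$StopPattern"* && $lastline == *"$StartPattern"* ]]; then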

       # is it the third match? --> then print content
       if [[ $count == 2 ]]; then
          printf "%s\n" "$content"
       fi

       # count pattern match
       (( count++ ))
    fi

    # write line
    printf "%s\n" "$line"

    # save line for next pattern match
    lastline="$line"
done < "$1" 

Usage

bash insert_rubbish.sh "/path/to/your/file.html" > output.html
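
For comparison, the awk idea from the comments can be combined with the "previous line" check used in the script. What follows is only a sketch under a few assumptions: the function name insert_between_nth and its parameters are hypothetical, the two tags are assumed to sit on consecutive lines, and the snippet is passed through the environment so that awk does not interpret backslashes in it. Like the script, it is line-based, so it breaks as soon as a tag spans several lines.

```shell
#!/bin/bash
# Sketch: print FILE, inserting SNIPPET between the NTH adjacent pair of
# lines (a line containing the start tag directly followed by a line
# containing the stop tag). insert_between_nth is a hypothetical name.
insert_between_nth() {
    local file=$1 snippet=$2 nth=${3:-3}

    # The snippet travels via the environment (ENVIRON) rather than -v,
    # because -v would process backslash escapes in the value.
    SNIPPET="$snippet" awk \
        -v start='</style>' \
        -v stop='<table border="1" class="dataframe">' \
        -v nth="$nth" '
        # index() does a literal substring test, so no regex escaping is needed
        index($0, stop) && index(prev, start) {
            if (++count == nth) print ENVIRON["SNIPPET"]
        }
        # print every input line and remember it for the next comparison
        { print; prev = $0 }
    ' "$file"
}
```

With the function defined (and $content set as in the script above), a call such as insert_between_nth "/path/to/your/file.html" "$content" 3 > output.html should produce the same result as the bash loop, usually faster on large files because awk does the line handling.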

1 Comment

Hello @UtLox: Your script is awesome, it worked perfectly. It is also very robust, because you only have to change the patterns to make other inserts! I will keep the name you gave it and credit you as the author! Thank you!