I have an HTML file that has multiple nested tables:

    <table>
      <tr>
        <td>
            <table>
               <tr>
                 <td>
                      <table>
                       ...
                       </table>
                 </td>
               </tr>
           </table>
        </td>
      </tr>
    </table>

I would like to add a class to each of the tables, like so:

      <table class="table1">
      <tr>
        <td>
            <table class="table2">
               <tr>
                 <td>
                      <table class="table3">
                       ...
                       </table>
                 </td>
               </tr>
           </table>

Based on extensive searches, I cobbled together the following bash script, but it's not working at all:

#!/bin/bash
strng="<table"
index=1
for entry in `grep -n $strng $1`
do
line=`echo $entry | awk -F":" '{print$1}'`
sed -e "$line s/$strng/$strng class=\"table$index\"/" -i $1
index=$(($index + 1))
done

Any recommendations will be appreciated.

3 Answers

If you want a general solution, you should use an HTML-aware tool. If you know that your HTML is limited to the format that you show, then try:

awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html

Example

$ awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html
    <table class="table1">
      <tr>
        <td>
            <table class="table2">
               <tr>
                 <td>
                      <table class="table3">
                       ...
                       </table>
                 </td>
               </tr>
           </table>
        </td>
      </tr>
    </table>

How it works

  1. /<\/table>/{i--}

    For any line that contains </table>, we decrement variable i, so i tracks the current nesting depth rather than a running count of tables.

  2. /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")}

    For any line that contains <table>, we increment variable i and replace <table> with a tag that carries the class value.

  3. 1

    This is awk's cryptic shorthand for print-the-line.
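
To illustrate the nesting-depth behaviour described above, here is a minimal sketch that runs the same one-liner on a made-up sample with two sibling tables inside an outer table. The function name add_depth_classes and the sample input are assumptions for illustration only. With this input, both inner tables should come out as table2, because i counts depth and not the total number of tables seen.

```shell
#!/bin/bash
# Hypothetical wrapper around the one-liner from this answer.
# With no file argument, awk reads standard input.
add_depth_classes() {
    awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' "$@"
}

# Made-up sample: one outer table containing two sibling tables.
sample='<table>
<table>
</table>
<table>
</table>
</table>'

# The outer table gets table1; each sibling gets table2.
printf '%s\n' "$sample" | add_depth_classes
```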

Changing the file in-place

If you want to change the file in-place and you have GNU awk (gawk), then use:

awk -i inplace '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html

For other awk:

awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html >tmp && mv tmp file.html

As a bash script

#!/bin/bash
# Usage: script.sh infile outfile
awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' "$1" >"$2"

Note that the file names, $1 and $2, are inside double-quotes. This prevents surprises in case the names contain whitespace or other shell-active characters.

As a matter of style, not substance, some people prefer spreading out awk code over multiple lines. This can make it easier to understand the code or to modify the code when one wants to add new features. Thus, if one likes, the above script can also be written as:

#!/bin/bash
# Usage: script.sh infile outfile
awk '
    /<\/table>/{ i-- }

    /<table>/{ sub(/<table>/, "<table class=\"table"++i"\">") }

    1
    ' "$1" >"$2"
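
If the opening tags carry attributes, or if every table should get its own number regardless of nesting, a variant along the following lines may be closer to what is needed. This is only a sketch under assumptions: the function name add_sequential_classes and the sample input are made up, the decrement rule is dropped so that n is a running count, and only the first opening tag on each line is handled.

```shell
#!/bin/bash
# Hypothetical helper: number opening table tags in document order,
# whether or not they already have attributes.
# /<table[ >]/ does not match </table>, because there the "<" is
# followed by "/" and not by "table".
add_sequential_classes() {
    awk '/<table[ >]/{ sub(/<table/, "<table class=\"table" (++n) "\"") } 1' "$@"
}

# Made-up sample: an outer table with attributes and two sibling tables.
printf '%s\n' \
    "<table width='600.00' cellpadding=0>" \
    '<table>' '</table>' \
    '<table>' '</table>' \
    '</table>' | add_sequential_classes
```

Here the three opening tags should receive table1, table2 and table3 in that order, and the existing attributes stay in place after the new class attribute.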

6 Comments

Thank you, this worked perfectly. I used your suggestion thus: #! /bin/bash awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' $1 > $2 exit
By the way, in case you are wondering why I'm trying this: GnuCash has extremely limited formatting capabilities when printing reports. However, it allows one to export to HTML, where I can apply CSS styles. But this is a story for another time.
@AlanW Very good! Glad it worked. I added your script to the answer with some minor changes. I haven't used Gnucash but that sounds interesting.
I should have provided more details. Awk works for the table tag as I presented it, which is not a real-world example. My code actually had this: <table width='600.00' cellpadding=0 cellspacing=0> so I modified the awk line: awk '/<\/table>/{i--} /<table/{sub(/<table/, "<table class=\"table"++i"\"")} 1'
One more thing @John1024: somehow the code failed to enumerate past 'class=table2', so I removed '/<\/table>/{i--}'.

To address why your script is not working: the for loop splits the output of the command substitution on any whitespace (spaces, tabs, and newlines), not only on newlines as you're expecting.

Set $IFS to newline and it should work as intended.

#!/bin/bash
# Split the grep output on newlines only, so each match is one entry.
IFS='
'
strng="<table"
index=1
for entry in `grep -n "$strng" "$1"`
do
  line=`echo "$entry" | awk -F":" '{print $1}'`
  sed -e "$line s/$strng/$strng class=\"table$index\"/" -i "$1"
  index=$(($index + 1))
done

Without this change, your grep command returns:

1:    <table> 
4:            <table>
7:                      <table>

and the loop processes that output in 6 iterations:

1:
<table>
4:
<table>
7:
<table>

You can see this by adding set -x to the top of your script to get a trace of the commands.
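
The difference in splitting can also be seen in isolation. The sketch below uses a made-up two-line sample that mimics grep -n output; the variable names are assumptions for illustration. It counts the iterations of an unquoted for loop with the default IFS, and of a while read loop, which is one common way to process output line by line without changing IFS for the rest of the script.

```shell
#!/bin/bash
# Made-up sample that mimics `grep -n` output with indented matches.
matches='1:    <table>
4:            <table>'

# Default IFS: the unquoted expansion is split on spaces and newlines
# alike, so each line yields two words ("1:" and "<table>").
count_default=0
for entry in $matches; do
    count_default=$((count_default + 1))
done

# Reading line by line keeps each match together as one entry.
count_lines=0
while IFS= read -r entry; do
    count_lines=$((count_lines + 1))
done <<EOF
$matches
EOF

echo "for loop: $count_default iterations, while read: $count_lines iterations"
```

With this sample the for loop should run four times and the while read loop twice.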

3 Comments

Sorry, but adding IFS wiped the file and substituted all text with 419 lines of: ellpadding="0"> s/<table class="table1"/<table class="table31"/ ... ellpadding="0"> s/<table class="table419"/<table class="table31"/
Then there must be some other condition not covered in the test you posted. I get the desired output exactly. Either way, the awk solution posted by @john1024 is certainly the better approach.
Thank you, @mjb2kmn, perhaps it's an environment issue dealing with gnu, posix, or some other cryptic OS variable. I'm using Linux Mint 17.

Just to give a quick solution:

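#!/bin/bash
temp_file="$( mktemp )"
# Tag every opening table tag with a placeholder class.
sed 's/\(<table\)/\1 class="table$_field_$"/g' "$1" > "$temp_file"
index=1
# Replace one placeholder per pass. The 0,/re/ address (GNU sed) limits
# the substitution to the first matching line instead of every line.
while grep -e '[$]_field_[$]' "$temp_file" >/dev/null
 do
    sed -i "0,/[$]_field_[$]/s//$index/" "$temp_file"
    ((++index))
 done
cp "$temp_file" "$1"
rm -f "$temp_file"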

But it has to be mentioned that manipulation of XML attributes should not be done using tools like sed or awk. Use a purpose-built tool as suggested in this answer.

1 Comment

Thank you for suggesting xmlstarlet from the other post. I'll try that one next. Of course, it is always best to use the right tool for the right job.
