48

Is it possible to use HTML Tidy to just indent HTML code?

Sample Code

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Desired Result

<form action="?" method="get" accept-charset="utf-8">
    <ul>
        <li>
        <label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"/>
        </li>
        <li><input class="submit" type="submit" value="Search"/></li>
    </ul>
</form>

If I run it with the standard command, tidy -f errs.txt -m index.html, then I get this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 15.3.6), see www.w3.org">
<title></title>
</head>
<body>
<form action="?" method="get" accept-charset="utf-8">
<ul>
<li><label class="screenReader" for=
"q">Keywords</label><input type="text" name="q" value="" id=
"q"></li>
<li><input class="submit" type="submit" value="Search"></li>
</ul>
</form>
</body>
</html>

How can I omit all the extra stuff and actually get it to indent the code?

Forgive me if that's not a feature it's supposed to support. If so, what library or tool am I looking for?

    Other folks have suggested that prettydiff.com/?m=beautify&html might be a better option if you just want to indent. Commented Mar 3, 2014 at 15:55

7 Answers

32

Use the indent, tidy-mark, and quiet options:

tidy \
  -indent \
  --indent-spaces 2 \
  -quiet \
  --tidy-mark no \
  index.html

Or, using a config file rather than command-line options:

indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no

Name it tidy_config.txt and save it in the same directory as the .html file. Run it like this:

tidy -config tidy_config.txt index.html

For more customization, use the tidy man page to find other relevant options such as markup: no or force-output: yes.
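As a minimal sketch of the config-file route, the script below writes the four settings from this answer to tidy_config.txt and runs tidy against index.html. The sample markup it creates is a placeholder for illustration, and the guard around the tidy call is an addition so the script still runs on machines where tidy is not installed.

```shell
#!/bin/sh
# Sketch: create the config file from the answer and run tidy with it.
# File names (tidy_config.txt, index.html) follow the answer; the sample
# markup is only a placeholder.

cat > tidy_config.txt <<'EOF'
indent: auto
indent-spaces: 2
quiet: yes
tidy-mark: no
EOF

# Create a small sample page only if none exists yet.
if [ ! -f index.html ]; then
  printf '%s\n' '<ul>' '<li>one</li>' '<li>two</li>' '</ul>' > index.html
fi

if command -v tidy >/dev/null 2>&1; then
  # tidy exits with 1 on warnings and 2 on errors, so a non-zero status
  # is not treated as fatal here.
  tidy -config tidy_config.txt index.html || true
else
  echo "tidy is not installed; skipping" >&2
fi
```

The result goes to standard output, so index.html itself is left unchanged.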


9 Comments

This does not answer the question. It is still adding a meta generator tag. Is there a way to turn off all changes except indentation?
Use the tidy man page to reference and test the flags. Try turning off defaults by adding markup: no or input-xml: yes and force-output: yes to the config file.
If you are a fan of oneliners without intermediate files, you can write the same as tidy -xml --indent auto --indent-spaces 2 --quiet yes index.html.
tidy-mark: no should turn off the meta generator tag.
Leave out input-xml: yes (-xml) since it will flag <input> as an error.
25

I didn't find a way to only re-indent without any other changes. The following config file repairs as little as possible and (mostly) only re-indents the HTML. Tidy still corrects some erroneous conditions, such as duplicated (repeated) attributes.

#based on http://tidy.sourceforge.net/docs/quickref.html
#HTML, XHTML, XML Options Reference
anchor-as-name: no  #?
doctype: omit
drop-empty-paras: no
fix-backslash: no
fix-bad-comments: no
fix-uri: no
hide-endtags: yes   #?
#input-xml: yes     #?
join-styles: no
literal-attributes: yes
lower-literals: no
merge-divs: no
merge-spans: no
output-html: yes
preserve-entities: yes
quote-ampersand: no
quote-nbsp: no
show-body-only: auto

#Diagnostics Options Reference
show-errors: 0
show-warnings: 0

#Pretty Print Options Reference
break-before-br: yes
indent: yes
indent-attributes: no   #default
indent-spaces: 4
tab-size: 4
wrap: 132
wrap-asp: no
wrap-jste: no
wrap-php: no
wrap-sections: no

#Character Encoding Options Reference
char-encoding: utf8

#Miscellaneous Options Reference
force-output: yes
quiet: yes
tidy-mark: no

For example, the following HTML fragment

<div>
<div>
<p>
not closed para
<h1>
h1 head
</h1>
<ul>
<li>not closed li
<li>closed li</li>
</ul>
some text
</div>
</div>

will be changed to

<div>
    <div>
        <p>
            not closed para
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            <li>closed li
            </ul>some text
    </div>
</div>

As you can see, hide-endtags: yes drops the closing </li> from the second bullet in the input. Setting hide-endtags: no gives the following:

<div>
    <div>
        <p>
            not closed para
        </p>
        <h1>
            h1 head
        </h1>
        <ul>
            <li>not closed li
            </li>
            <li>closed li
            </li>
        </ul>some text
    </div>
</div>

So tidy adds a closing </p> and a closing </li> to the first bullet.

I didn't find a way to preserve everything in the input and only re-indent the file.
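One rough way to check whether a given config really only re-indented a file is to compare input and output with all whitespace stripped. The helper below is a sketch of that idea. The name same_ignoring_whitespace and the file names are hypothetical, and the check is crude: it also ignores whitespace inside text and attribute values.

```shell
#!/bin/sh
# same_ignoring_whitespace A B: succeed if files A and B are identical once
# every whitespace character is removed. (Hypothetical helper name.)
same_ignoring_whitespace() {
  # Each variable holds one file's content with all whitespace deleted.
  a=$(tr -d '[:space:]' < "$1") || return 2
  b=$(tr -d '[:space:]' < "$2") || return 2
  [ "$a" = "$b" ]
}
```

Possible usage, assuming the config above was saved as tidy_config.txt: run tidy -config tidy_config.txt input.html > output.html, then same_ignoring_whitespace input.html output.html && echo "only whitespace changed". Any added </p> or </li>, or a rewritten />, makes the check fail.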

3 Comments

Thanks for this. Used this in my PHP script and the html output looks great. Took me a while to copy the attributes so here's a pastebin for anyone looking to config tidy like this: pastebin.com/JP8ucTzc
hide-endtags: NO
Thx for this explanation. But how to turn "</ul>some text" into "</ul>\nsometext"? And how to keep the empty lines in the source code?
18

You need the following option:

tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m file.html

http://tidy.sourceforge.net/docs/quickref.html#show-body-only

-i --indent-spaces 4 - indents with 4 spaces (-i takes no argument, so the width is set with --indent-spaces)
or
--indent-with-tabs yes - indents with tabs instead (--tab-size may affect wrapping)

-w 80 - wrap at column 80 (default on my system: 68, very narrow)

-m - modify the file in place

(you may want to leave out the last option, and examine the output first)

Showing only the body naturally leaves out the tidy-mark (generator meta).

Another useful option is --quiet yes, which suppresses the W3C advertisements and other unnecessary output (errors are still reported).
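The advice above to leave out -m and examine the output first could be scripted roughly like this. The function name preview_tidy is hypothetical, and the options are the ones from this answer written with their long names, with the indent width given as --indent-spaces 4.

```shell
#!/bin/sh
# preview_tidy FILE: print a unified diff between FILE and tidy's output,
# leaving FILE itself untouched. (Hypothetical helper name.)
preview_tidy() {
  file=$1
  if ! command -v tidy >/dev/null 2>&1; then
    echo "tidy is not installed; skipping" >&2
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Same settings as the answer, minus -m, so nothing is modified in place.
  tidy --show-body-only yes --indent auto --indent-spaces 4 --wrap 80 \
       --quiet yes "$file" > "$tmp" 2>/dev/null
  # diff exits with 1 when the files differ; that is expected here.
  diff -u "$file" "$tmp"
  rm -f "$tmp"
  return 0
}
```

Call it as preview_tidy file.html, read the diff, and only then rerun tidy with -m if the changes look acceptable.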

2 Comments

The "show-body-only: yes" is the correct answer, even if only a partial one: tidy cannot be stopped from fixing broken tags.
Tidy does use tabs, just add --indent-with-tabs yes to command line arguments.
7

To answer the poster's original question, using Tidy to just indent HTML code, here's what I use:

tidy --indent auto --quiet yes --show-body-only auto --show-errors 0 --wrap 0 input.html

input.html

<form action="?" method="get" accept-charset="utf-8">

<ul>
<li>
<label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q" />
</li>
<li><input class="submit" type="submit" value="Search" /></li>
</ul>


</form>

Output:

<form action="?" method="get" accept-charset="utf-8">
  <ul>
    <li><label class="screenReader" for="q">Keywords</label><input type="text" name="q" value="" id="q"></li>
    <li><input class="submit" type="submit" value="Search"></li>
  </ul>
</form>

No extra HTML code added. Errors are suppressed. To find out what each option does, it's best to refer to the official reference.

2 Comments

In vim: %!tidy --show-errors 0 --show-body-only auto -qi -w 0
Unfortunately these options are not enough. tidy inserts a form tag around this code: <input id="a" type="checkbox"><label for="a">a</label> But this one is fine: <label><input id="a" type="checkbox">a</label> tidy --version outputs HTML Tidy for Linux released on 25 March 2009
2

I am very late to the party :)

But in your tidy config file set

tidy-mark: no

by default this is set to yes.

Once done, tidy will not add meta generator tag to your html.

2 Comments

My version of tidy (and probably any other) will accept configuration options also as command line options (which is sometimes more desirable than dragging around a config file), like: tidy --tidy-mark no -utf8 -w 80 -i file.html.
This does not prevent the generation of DOCTYPE, html, and head tags.
2

If you'd like to simply format whatever HTML you receive, ignore errors, and indent the code nicely, this is a good one-liner using tidy:

tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null

You can use it with curl too:

curl -s someUrl | tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null

2 Comments

Where does the input file go in the first command?
The second command starts with curl -s someUrl | and that | is a pipe that feeds curl's output into the rest of the command. So you could curl some website and pipe it in, or you could run cat index.html | tidy --show-body-only yes -i --indent-spaces 4 -w 80 -m -quiet --force-output y -wrap 0 2>/dev/null, for instance.
0

None of the HTML Tidy based solutions worked for me, since all of them modified the content to some extent, so I created a CLI tool and Go package, https://github.com/a-h/htmlformat, based on https://github.com/ericchiang/pup

It uses the Go net/html package to parse the HTML, and a custom writer to write out the content with indentation.
