html5lib · ambv · May 7, 2013 · May 8, 2013 · May 8, 2013 · May 8, 2013
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -6,9 +6,56 @@ Change Log
 
 Released on XXX, 2013
 
+* Implementation updated to follow the `HTML specification
+  <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
+  2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
+
+* Python 3.2+ supported in a single codebase using the ``six`` library.
+
+* Removed support for Python 2.5 and older.
+
+* Removed the deprecated Beautiful Soup 3 treebuilder.
+  ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
+  since it doesn't support namespaces, foreign content like SVG and
+  MathML is parsed incorrectly.
+
 * Removed ``simpletree`` from the package. The default tree builder is
-  now ``etree`` (using the ``xml.etree.ElementTree/cElementTree``
-  implementation).
+  now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
+  available, and ``xml.etree.ElementTree`` otherwise).
+
+* Removed the ``XHTMLSerializer``, as it never guaranteed that its
+  output was well-formed XML and was therefore of little use.
+
+* Optional heuristic character encoding detection now based on
+  ``charade`` for Python 2.6 - 3.3 compatibility.
+
+* Optional ``Genshi`` treewalker support fixed.
+
+* Many bugfixes, including:
+
+  * #33: null in attribute value breaks XML AttValue;
+
+  * #4: nested, indirect descendant, <button> causes infinite loop;
+
+  * `Google Code 215
+    <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
+    detect seekable streams;
+
+  * `Google Code 206
+    <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
+    support for <video preload=...>, <audio preload=...>;
+
+  * `Google Code 205
+    <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
+    support for <video poster=...>;
+
+  * `Google Code 202
+    <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
+    file breaks InputStream.
+
+* Source code is now mostly PEP 8 compliant.
+
+* Test harness has been improved and now depends on ``nose``.
 
 
 0.95
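
Note on the `etree` entry in the CHANGES.rst hunk above: it describes preferring `xml.etree.cElementTree` when available and falling back to `xml.etree.ElementTree` otherwise. A minimal standard-library sketch of that kind of fallback import might look like the following. It is an illustration only, not code taken from html5lib, and the alias `ElementTree` is chosen just for this example.

```python
# Illustrative fallback import; not taken from html5lib's source.
try:
    # Accelerated C implementation (Python 2.x; removed in Python 3.9).
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Pure-Python module; on Python 3.3+ it uses the C accelerator itself.
    import xml.etree.ElementTree as ElementTree

# Either module exposes the same API, so the calling code does not change.
root = ElementTree.fromstring("<p>Hello World!</p>")
print(root.tag, root.text)
```

Because both modules share one API, the rest of the code can use the alias without knowing which implementation was imported.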

diff --git a/README.rst b/README.rst
@@ -1,63 +1,98 @@
 html5lib
 ========
 
+.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
+  :target: https://travis-ci.org/html5lib/html5lib-python
+
 html5lib is a pure-python library for parsing HTML. It is designed to
 conform to the WHATWG HTML specification, as is implemented by all major
 web browsers.
 
 
-Requirements
-------------
+Usage
+-----
 
-Python 2.6 and above as well as Python 3.0 and above are
-supported. Implementations known to work are CPython (as the reference
-implementation) and PyPy. Jython is known *not* to work due to various
-bugs in its implementation of the language. Others such as IronPython
-may or may not work; if you wish to try, you are strongly encouraged
-to run the testsuite and report back!
+Simple usage follows this pattern:
 
-The only required library dependency is ``six``, this can be found
-packaged in PyPI.
+.. code-block:: python
 
-Optionally:
+  import html5lib
+  with open("mydocument.html", "rb") as f:
+      document = html5lib.parse(f)
 
-- ``datrie`` can be used to improve parsing performance (though in
-  almost all cases the improvement is marginal);
+or:
 
-- ``lxml`` is supported as a tree format (for both building and
-  walking) under CPython (but *not* PyPy where it is known to cause
-  segfaults);
+.. code-block:: python
 
-- ``genshi`` has a treewalker (but not builder); and
+  import html5lib
+  document = html5lib.parse("<p>Hello World!")
 
-- ``charade`` can be used as a fallback when character encoding cannot
-  be determined; ``chardet``, from which it was forked, can also be used
-  on Python 2.
+By default, the ``document`` will be an ``xml.etree`` element instance.
+Whenever possible, html5lib chooses the accelerated ``ElementTree``
+implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
+
+Two other tree types are supported: ``xml.dom.minidom`` and
+``lxml.etree``. To use an alternative format, specify the name of
+a treebuilder:
+
+.. code-block:: python
+
+  import html5lib
+  with open("mydocument.html", "rb") as f:
+      lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
+
+To have more control over the parser, create a parser object explicitly.
+For instance, to make the parser raise exceptions on parse errors, use:
+
+.. code-block:: python
+
+  import html5lib
+  with open("mydocument.html", "rb") as f:
+      parser = html5lib.HTMLParser(strict=True)
+      document = parser.parse(f)
+
+When you're instantiating parser objects explicitly, pass a treebuilder
+class as the ``tree`` keyword argument to use an alternative document
+format:
+
+.. code-block:: python
+
+  import html5lib
+  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
+  minidom_document = parser.parse("<p>Hello World!")
+
+More documentation is available at http://html5lib.readthedocs.org/.
 
 
 Installation
 ------------
 
-html5lib is packaged with distutils. To install it use::
+html5lib works on CPython 2.6+, CPython 3.2+ and PyPy.  To install it,
+use:
 
-  $ python setup.py install
+.. code-block:: bash
 
+    $ pip install html5lib
 
-Usage
------
 
-Simple usage follows this pattern::
+Optional Dependencies
+---------------------
 
-  import html5lib
-  with open("mydocument.html", "r") as fp:
-      document = html5lib.parse(f)
+The following third-party libraries may be used for additional
+functionality:
 
-or::
+- ``datrie`` can be used to improve parsing performance (though in
+  almost all cases the improvement is marginal);
 
-  import html5lib
-  document = html5lib.parse("<p>Hello World!")
+- ``lxml`` is supported as a tree format (for both building and
+  walking) under CPython (but *not* PyPy where it is known to cause
+  segfaults);
 
-More documentation is available in the docstrings.
+- ``genshi`` has a treewalker (but not builder); and
+
+- ``charade`` can be used as a fallback when character encoding cannot
+  be determined; ``chardet``, from which it was forked, can also be used
+  on Python 2.
 
 
 Bugs
@@ -70,28 +105,21 @@ Please report any bugs on the `issue tracker
 Tests
 -----
 
-These are contained in the html5lib-tests repository and included as a
-submodule, thus for git checkouts they must be initialized (for
-release tarballs this is unneeded)::
+Unit tests require the ``nose`` library and can be run using the
+``nosetests`` command in the root directory. All should pass.
+
+Test data are contained in a separate `html5lib-tests
+<https://github.com/html5lib/html5lib-tests>`_ repository and included
+as a submodule, thus for git checkouts they must be initialized::
 
   $ git submodule init
   $ git submodule update
 
-And then they can be run, with ``nose`` installed, using the
-``nosetests`` command in the root directory. All should pass.
+This is unneeded for release tarballs.
 
 If you have all compatible Python implementations available on your
-system, you can run tests on all of them by using tox::
-
-  $ pip install tox
-  $ tox
-  ...
-  _______________________ summary ______________________
-    py26: commands succeeded
-    py27: commands succeeded
-    py32: commands succeeded
-    py33: commands succeeded
-    congratulations :)
+system, you can run tests on all of them using the ``tox`` utility,
+which can be found on PyPI.
 
 
 Contributing
@@ -121,5 +149,5 @@ Questions?
 
 There's a mailing list available for support on Google Groups,
 `html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
-though you may have more success (and get a far quicker response)
-asking on IRC in #whatwg on irc.freenode.net.
+though you may get a quicker response asking on IRC in #whatwg on
+irc.freenode.net.
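
Note on the "Optional Dependencies" list added to README.rst above: one way to see which of the named libraries are present in an environment is to ask the import system whether it can locate them. The sketch below assumes Python 3.4+ (for `importlib.util.find_spec`). The helper name `available_optional_libraries` is hypothetical and is not part of html5lib, which may detect these libraries differently.

```python
import importlib.util

# Optional libraries named in the README's "Optional Dependencies" list.
OPTIONAL_LIBRARIES = ("datrie", "lxml", "genshi", "charade", "chardet")


def available_optional_libraries(names=OPTIONAL_LIBRARIES):
    """Map each top-level module name to True if the import system finds it.

    Hypothetical helper for illustration; it only locates the modules and
    does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, present in sorted(available_optional_libraries().items()):
    print(name, "available" if present else "missing")
```

A library reported as missing only means the corresponding optional feature is unavailable; the README lists `six` as the sole required dependency.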