39

I have two chunks of text that I would like to compare and see which words/lines have been added/removed/modified in Python (similar to a Wiki's Diff Output).

I have tried difflib.HtmlDiff but its output is less than pretty.

Is there a way in Python (or external library) that would generate clean looking HTML of the diff of two sets of text chunks? (not just line level, but also word/character modifications within a line)

7 Answers

36

There's diff_prettyHtml() in the diff-match-patch library from Google.
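
For a feel of what inline, word-level output looks like without installing anything, here is a rough standard-library sketch. It is not diff-match-patch's algorithm or API: it uses difflib.SequenceMatcher on whitespace-split words, and the function name inline_word_diff is made up for illustration.

```python
import difflib
import html


def inline_word_diff(old: str, new: str) -> str:
    """Return HTML with word-level deletions in <del> and insertions in <ins>."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so that diffing HTML-looking input stays safe to render.
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(removed)
        if tag in ("delete", "replace"):
            parts.append(f"<del>{removed}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{added}</ins>")
    return " ".join(parts)


result = inline_word_diff("the quick brown fox", "the slow brown fox jumps")
print(result)
```

Splitting on whitespace loses the original spacing and line breaks, so this only suits short snippets; a dedicated library handles those details properly.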

3 Comments

The .zip download link now gives a 404 :(
It's hard to tell if there's a way to generate a good side-by-side diff of multiple-line files with diff-match-patch. It seems mostly focused on character-level comparison, and the documentation on line-level is not very good (and the example is only in JavaScript).
Also I think its new home is here: github.com/google/diff-match-patch
26

Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.

For instance, if you generate the HTML like this:

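import difflib
import sys

fromfile = "xxx"
tofile = "zzz"

# Read both files as lists of lines (the old 'U' mode was removed in Python 3.11).
with open(fromfile) as f:
    fromlines = f.readlines()
with open(tofile) as f:
    tolines = f.readlines()

# Build a complete HTML page containing a side-by-side diff table.
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)

# make_file() returns a single string, so write() is enough.
sys.stdout.write(diff)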

then you get green backgrounds on added lines, yellow on changed lines, and red on deleted lines. If I were doing this, I would take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and either move it to the top or wrap it in a div so that CSS can position it.
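
A minimal sketch of that idea, assuming you only want the diff table and not the legend: HtmlDiff.make_table() returns just the table, which can be wrapped in a page with your own stylesheet. The CSS values and the function name styled_diff_page are illustrative; the class names (diff, diff_header, diff_add, diff_chg, diff_sub) are the ones difflib emits.

```python
import difflib

# Illustrative stylesheet; only the class names come from difflib itself.
CUSTOM_CSS = """
table.diff { font-family: monospace; border-collapse: collapse; }
.diff_header { background: #f0f0f0; color: #666; padding: 0 4px; }
.diff_add { background: #d4f8d4; }
.diff_chg { background: #fff3b0; }
.diff_sub { background: #f8d4d4; }
"""


def styled_diff_page(fromlines, tolines, fromdesc="", todesc=""):
    """Wrap difflib's diff table in a page that uses our own CSS."""
    table = difflib.HtmlDiff().make_table(fromlines, tolines, fromdesc, todesc)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<style>{CUSTOM_CSS}</style>\n</head>\n<body>\n{table}\n</body>\n</html>\n"
    )


page = styled_diff_page(
    ["first line\n", "second line\n"],
    ["first line\n", "second line, edited\n"],
    "old",
    "new",
)
```

Writing page to a file and opening it in a browser shows the table with the custom colors and no legend.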

Actually, I would give serious consideration to just fixing up the difflib module (which is written in python) to generate better HTML and contribute it back to the project. If you have a CSS expert to help you or are one yourself, please consider doing this.

4 Comments

Someone implemented your proposal (as I often find is the case with Python). HtmlDiff has a make_table() method which creates just the HTML table, so the user can add their own CSS to prettify it. Unlike the accepted answer, this is included in the standard library (since Python 2.4).
Unfortunately the HTML generated by difflib.HtmlDiff is a pretty archaic table format that isn't well suited to customization with CSS. But it still works pretty well, if you don't need a lot of customization. You can probably change colors and fonts, but that's about it. The big secret that I almost missed is the wrapcolumn argument to the constructor, which lets you prevent the table from being arbitrarily wide.
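
A small sketch of the wrapcolumn argument mentioned above; the sample lines and the width of 60 are arbitrary choices for illustration.

```python
import difflib

# Two very long single-line inputs (about 180 characters each).
long_old = ["alpha " * 30 + "\n"]
long_new = ["alpha " * 29 + "beta\n"]

# Without wrapcolumn the long line stays in one very wide table row.
unwrapped = difflib.HtmlDiff().make_table(long_old, long_new, "old", "new")

# With wrapcolumn the line is split across several rows of at most 60 columns.
wrapped = difflib.HtmlDiff(wrapcolumn=60).make_table(long_old, long_new, "old", "new")

print(unwrapped.count("<tr>"), wrapped.count("<tr>"))
```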
This process shows the ENTIRE file side by side even if only ONE LINE has changed. This is a problem if the file is large. Not sure if there's a way to fix this.
Yeah, the default shows the whole file. See the context and numlines arguments to difflib.HtmlDiff.make_file() for a way to just see the lines with differences, possibly surrounded by some amount of additional lines for context.
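
A quick sketch of those two arguments, using made-up 100-line inputs where only line 50 differs; make_table() accepts the same context and numlines arguments as make_file().

```python
import difflib

old = [f"line {n}\n" for n in range(1, 101)]
new = list(old)
new[49] = "line 50 (edited)\n"

differ = difflib.HtmlDiff()

# Default: every line of both files appears in the table.
full = differ.make_table(old, new, "old", "new")

# context=True keeps only changed lines plus numlines lines around each change.
trimmed = differ.make_table(old, new, "old", "new", context=True, numlines=2)

print(full.count("<tr>"), trimmed.count("<tr>"))
```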
6

I recently posted a python script that does just this: diff2HtmlCompare (follow the link for a screenshot). Under the hood it wraps difflib and uses pygments for syntax highlighting.

1

not just line level, but also word/character modifications within a line

xmldiff seems to be a nice package for this purpose especially when you have XML/HTML to compare. Read more in their documentation.

1

Since the .. library from Google no longer seems to be under active development, I suggest using diff_py.

From the github page:

The simple diff tool which is written by Python. The diff result can be printed in console or to html file.

0

First clean up both HTML documents with lxml.html, then check the differences with difflib.

-3

A copy of my own answer from here.


What about DaisyDiff (Java and PHP versions available)?

Following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.

1 Comment

This answer isn't really Python-related.
