A big set of XML files has the wrong encoding declared. The files are declared as UTF-8, but the content has Latin-1 characters all over the place. What's the best way to parse this content? Each file starts with this declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Edit: this is happening with Adobe InDesign IDML files. It seems the "Content" text is Latin-1, but the rest could be UTF-8. I'm leaning toward parsing normally as UTF-8, then re-encoding the Unicode text chunks in Content to UTF-8 and re-parsing them as Latin-1. What a mess. ಠ_ಠ
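
For illustration, here is a rough sketch of a different approach from the one described above: decode the raw bytes as UTF-8 before parsing, and fall back to Latin-1 only for the bytes that are not valid UTF-8. Python is an assumption here (the question names no language), and `latin1_fallback` and `parse_mixed` are made-up names, not part of any library. It is a heuristic, not a guaranteed fix: a run of Latin-1 bytes that happens to form a valid UTF-8 sequence will still be read as UTF-8.

```python
import codecs
import xml.etree.ElementTree as ET


def latin1_fallback(exc):
    """Error handler: decode bytes that are not valid UTF-8 as Latin-1."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume decoding.
    return bad.decode("latin-1"), exc.end


# Register the handler under a hypothetical name so decode() can refer to it.
codecs.register_error("latin1_fallback", latin1_fallback)


def parse_mixed(raw: bytes) -> ET.Element:
    """Parse XML that is declared UTF-8 but contains stray Latin-1 bytes."""
    text = raw.decode("utf-8", errors="latin1_fallback")
    # Re-encode so the bytes match the UTF-8 declaration in the file.
    return ET.fromstring(text.encode("utf-8"))


if __name__ == "__main__":
    # Sample data (invented): one Latin-1 e-acute (0xE9) and one UTF-8 e-acute.
    sample = (
        b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        b"<Story><Content>caf\xe9 \xc3\xa9</Content></Story>"
    )
    root = parse_mixed(sample)
    print(root.find("Content").text)
```

Because the repair happens at the byte level, it does not depend on knowing which elements hold the Latin-1 text, which may or may not matter for IDML.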