6

The problem

When you open a very large XML file on your local machine, it will almost certainly take an age to load, and the application will often lock up and be reported as not responding.

This is an issue if you serve users XML backups of rather complex databases or systems they use - the likelihood of them being able to open large backups, let alone use them, is slim.

Is pagination possible?

I use XSLT to present readable backups to users. In the same way, would it be possible to pull in only one page of data at a time, so that the entire file is not read in one go and the issues above are avoided?

I imagine the answer is simply a no - but I would like to know if anyone else has seen the same issues and resolved them.

Note: This is on a local machine only; it must not require an internet connection. JavaScript can be used if it makes things easier.

4
  • Your first two paragraphs are wrong and largely unrelated to your question. Commented Jan 6, 2010 at 15:37
  • +1, I've had the same problem and have been struggling to find an editor capable of viewing/browsing very large (1GB+) XML files. Commented Jan 6, 2010 at 15:38
  • @bmargulies - if you say so, I'd say the votes and great answers below negate that, but each to their own. @Eric - You'll likely not find them, as I believe all editors have to read the entire file before loading it - having said that I had some success with Notepad++ sometimes. Commented Jan 7, 2010 at 11:02
  • The problem is not with opening files. That takes milliseconds, perhaps microseconds on SSDs. Reading them entirely into memory, transforming them into a viewable document - yes, that takes time. But it critically depends on the XML schema. For instance, Microsoft's .docx files (OOXML) open fairly quickly. Commented Jan 26, 2010 at 13:57

5 Answers

3

Pagination with XSLT is possible, but will probably not lead to the desired results: for XSLT to work, the whole XML document must first be parsed into a DOM tree, so the entire file is still read before the first page can be shown.

What you could do is experiment with streaming transformations: http://stx.sourceforge.net/

Or you could preprocess the large XML file to cut it up into smaller bits before processing with XSLT. For this I'd use a command line tool like XMLStarlet.
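
To make the "cut it up into smaller bits" idea concrete, here is a minimal sketch in Python rather than XMLStarlet (a swapped-in tool, not what the answer proposes). It assumes the backup is a flat list of repeating record elements without namespaces; the names `split_xml`, `record_tag`, `per_file` and the `backup` root tag are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def split_xml(source_path, record_tag, per_file, out_dir, root_tag="backup"):
    """Copy records into numbered part files, at most `per_file` per file."""
    paths = []
    out = None
    written = 0  # records written to the currently open part
    # iterparse reads the source incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(source_path, events=("end",)):
        if elem.tag != record_tag:
            continue
        if out is None:
            # Start a new part file with its own (hypothetical) root element.
            path = os.path.join(out_dir, "part-%04d.xml" % (len(paths) + 1))
            out = open(path, "w", encoding="utf-8")
            out.write("<%s>\n" % root_tag)
            paths.append(path)
        elem.tail = None  # keep the source file's inter-record whitespace out
        out.write(ET.tostring(elem, encoding="unicode") + "\n")
        elem.clear()  # release the record's contents before parsing the next one
        written += 1
        if written == per_file:
            out.write("</%s>\n" % root_tag)
            out.close()
            out, written = None, 0
    if out is not None:  # close a final, partly filled part
        out.write("</%s>\n" % root_tag)
        out.close()
    return paths
```

Each part is a small, well-formed document, so it can be run through the existing XSLT stylesheet on its own. Calling something like `split_xml("backup.xml", "row", 1000, "parts")` would be the typical use, with the file and tag names adjusted to the real backup format.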

1 Comment

I'm thinking it might be easier to simply cut the file up before presenting it for download (as a zip) to the user, which is kind of annoying.
2

Right on, very good question!

The XSLT implementations I know require a DOM, so they are bound to access the entire document (although it could perhaps be done in a lazy fashion).

Anyway, you should take a look at VTD-XML: http://vtd-xml.sourceforge.net/

The latest SAXON XSLT processor also has rudimentary support for what is called "Streaming XSLT". Read about that here: http://www.saxonica.com/documentation/index/intro.html

That said, database backups are probably not the right use case for XML. If you have to deal with XML database backups, I would try to get away from those as fast as possible. Same for logs - a linear process should work by simply appending things. I mean, it would be even better if XML allowed a forest as the top-level structure, but I think that is never going to happen.

1 Comment

Hey Roland, this looks promising. I was wondering if this would require an end-user to have anything installed other than a browser? This needs to be viewable by both geeks and non-techs alike.
1

The XMLMax virtual XML editor will read, parse and display a 1 gigabyte XML file in a treeview in about 30 seconds on a fast PC. Windows only. It will work with XML of any size or structure.

1 Comment

It is paid software with a trial version.
0

Hi, I don't know what programming language you are using, but in C# you can use XmlReader to read the file tag by tag rather than loading the whole file. This way you can read only the first page and then stop reading. Best regards, Iordan
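
The same read-a-page-then-stop idea can be sketched outside C#. The example below uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for `XmlReader`; it is not the C# API. It assumes the data is a flat list of record elements whose values live in attributes, and `iter_records`, `read_page` and the `row` tag are illustrative names only.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice


def iter_records(source, record_tag):
    """Yield each <record_tag> element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield elem
            elem.clear()  # drop the record's contents once the caller is done


def read_page(source, record_tag, page, page_size):
    """Return page number `page` (0-based) as a list of attribute dicts."""
    start = page * page_size
    records = iter_records(source, record_tag)
    # islice skips earlier records and stops pulling after the page is full,
    # so the rest of the file is never requested from the parser.
    return [dict(elem.attrib) for elem in islice(records, start, start + page_size)]


# Small in-memory sample standing in for a large backup file.
sample = io.BytesIO(
    b"<backup>" + b"".join(b'<row id="%d"/>' % i for i in range(10)) + b"</backup>"
)
print(read_page(sample, "row", page=1, page_size=3))
# prints [{'id': '3'}, {'id': '4'}, {'id': '5'}]
```

Later pages still have to scan past the earlier records, so the cost grows with the page number, but memory use stays roughly flat because each record is cleared after use.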

0

One way to alleviate this problem would be to split the large XML files into a number of smaller XML documents. Depending on the type of data, you may split or partition the file any number of ways (e.g. by day, transaction, entity, etc.).

This will introduce a number of other challenges of course. For instance you will have to come up with a specialized parser if you need to view the data as a whole or across partitions.
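
As a rough illustration of what reading "across partitions" could look like, the sketch below walks a list of part files in order and yields their records as one stream. It is written in Python and assumes every part uses the same record tag; `iter_all_records` and the file names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_all_records(part_paths, record_tag):
    """Yield records from each part file in turn, as if they were one document."""
    for path in part_paths:
        # Each part is parsed incrementally, so only one record is held at a time.
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == record_tag:
                yield elem
                elem.clear()  # release the record before moving on
```

A whole-backup search could then be a plain loop such as `for row in iter_all_records(["part-0001.xml", "part-0002.xml"], "row"): ...`, without ever holding more than one partition's current record in memory.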
