I have 200,000 XML files I want to parse and store in a database.
Here is an example of one: https://gist.github.com/902292
This is about as complex as the files get. The job will run on a small VPS (Linode), so memory is tight.
What I am wondering is:
1) Should I use a DOM or a SAX parser? DOM seems easier and faster, since each XML file is small.
2) Where can I find a simple tutorial for whichever parser you recommend (DOM or SAX)?
Thanks
EDIT
I tried the DOM route even though everyone suggested SAX, mainly because I found an "easier" tutorial for DOM and I assumed that, with an average file size of about 3-4 KB, a single document would easily fit in memory.
However, the recursive routine I wrote to process all 200k files gets about 40% of the way through and then Java runs out of memory.
Here is part of the project. https://gist.github.com/905550#file_xm_lparser.java
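One thing I could try while staying with DOM is walking the directory tree iteratively and handing each extracted value off immediately, so that no parsed Document stays referenced after its file is done. Below is a minimal sketch of that idea, not the code from the gist. The class name `DomBatch`, the `sink` callback, and extracting only the root element name are placeholders for illustration, and I have not confirmed that retained documents are what causes my out-of-memory error.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomBatch {

    // Walks `root` without recursion and parses every *.xml file with DOM.
    // Each extracted value goes straight to `sink` (hypothetical hook, e.g. a
    // database insert), so the Document becomes garbage once the loop moves on.
    public static void processAll(File root, Consumer<String> sink) throws Exception {
        // One builder reused for all files instead of a new one per file.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);

        while (!pending.isEmpty()) {
            File current = pending.pop();
            File[] children = current.listFiles(); // null when `current` is not a directory
            if (children != null) {
                for (File child : children) {
                    pending.push(child);
                }
            } else if (current.getName().endsWith(".xml")) {
                Document doc = builder.parse(current);
                // Placeholder extraction: only the root element name is kept.
                sink.accept(doc.getDocumentElement().getNodeName());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DomBatch /path/to/xml/dir
        processAll(new File(args[0]), System.out::println);
    }
}
```

The point of the callback is that nothing accumulates across files: only plain strings leave the loop, never DOM nodes.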
Should I ditch DOM now and just use SAX? It seems like DOM should be able to handle files this small.
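For comparison, this is roughly what I understand the SAX equivalent to look like: a handler that receives events as the file streams past and keeps only the text it cares about. It is a minimal sketch using the JDK's built-in `javax.xml.parsers.SAXParser`; the class name `SaxTextCollector` and the `title` element in the sample are made up for illustration and do not come from my actual files.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTextCollector {

    // Returns the trimmed text of every element named `tag`, in document order.
    public static List<String> collect(InputStream in, String tag) throws Exception {
        List<String> values = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        parser.parse(in, new DefaultHandler() {
            // Non-null only while the parser is inside a matching element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && qName.equals(tag)) {
                    values.add(text.toString().trim());
                    text = null;
                }
            }
        });
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, not one of the real files.
        String xml = "<doc><title>First</title><body>ignored</body><title>Second</title></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(collect(in, "title")); // prints [First, Second]
    }
}
```

Unlike DOM, no tree is ever built here, so memory use per file should stay flat regardless of document size.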
Also, the speed is "fast enough": it takes about 19 seconds to parse 2,000 XML files (before the Mongo insert).
Thanks