I am trying to parse an XML document. After searching, I found that SAX is the best choice, but the document is very large (1.5 GB). I waited about 7 minutes and it is still running. Is that normal, or can I do better?

public static void main(String argv[]) {

    try {

        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {

            int c = 0;
            boolean id = false;
            boolean value = false;
            boolean orgin = false;
            boolean note = false;

            @Override
            public void startElement(String uri, String localName, String eName,
                    Attributes attributes) throws SAXException {

                if (eName.equalsIgnoreCase("ID")) {
                    id = true;
                }

                if (eName.equalsIgnoreCase("VALUE")) {
                    value = true;
                }

                if (eName.equalsIgnoreCase("ORGIN")) {
                    orgin = true;
                }

                if (eName.equalsIgnoreCase("NOTE")) {
                    note = true;
                }

            }

            @Override
            public void endElement(String uri, String localName,
                    String eName) throws SAXException {

            }

            @Override
            public void characters(char ch[], int start, int length) throws SAXException {

                if (id) {
                    System.out.println(new String(ch, start, length));
                    id = false;
                    System.out.println("record num : "+c++);
                }

                if (value) {
                    System.out.println(new String(ch, start, length));
                    value = false;
                }

                if (orgin) {
                    System.out.println(new String(ch, start, length));
                    orgin = false;
                }

                if (note) {
                    System.out.println(new String(ch, start, length));
                    note = false;
                }

            }

        };

        saxParser.parse("./transactions.xml", handler);

    } catch (Exception e) {
        e.printStackTrace();
    }

}
  • What do you mean by 1.5? 1.5 MB? Commented Sep 1, 2015 at 12:42
  • Take a look at this: stackoverflow.com/questions/3411773/… Commented Sep 1, 2015 at 12:46
  • @NathanHughes I am using this for the first time. Any suggestions? Commented Sep 1, 2015 at 13:10
  • My suggestions: stop getting bad code from mkyong, read the Oracle documentation, and follow sharonbn's advice. Commented Sep 1, 2015 at 13:12
  • @NathanHughes Can you tell me why it is so bad? Commented Sep 1, 2015 at 13:15

2 Answers

  1. You can save some time by changing equalsIgnoreCase to equals (unless you really do encounter "ValuE", "valUE", "VaLuE", ...).
  2. The printing is probably taking most of the time. I/O operations are usually the bottleneck.
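If some output has to stay while testing, one way to reduce the cost of point 2 is to buffer it instead of writing line by line. This is only a minimal sketch and not part of the original answer: the class name BufferedOutputSketch, the helper buffered() and the 64 KB buffer size are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: all names here are illustrative, not from the question.
class BufferedOutputSketch {

    // Wraps a stream in a large buffer; autoflush is off, so println() does not
    // push every single line down to the underlying stream.
    static PrintWriter buffered(OutputStream target) {
        return new PrintWriter(
                new BufferedWriter(
                        new OutputStreamWriter(target, StandardCharsets.UTF_8),
                        1 << 16), // 64 KB buffer, an arbitrary choice
                false);
    }

    public static void main(String[] args) {
        PrintWriter out = buffered(System.out);
        for (int i = 0; i < 5; i++) {
            out.println("record num : " + i);
        }
        out.flush(); // nothing is guaranteed to be written until this point
    }
}
```

In the handler from the question, out.println(...) would take the place of System.out.println(...), with a single flush() after saxParser.parse(...) returns. How much this helps depends on where the output goes, so it is worth measuring.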

4 Comments

I was printing to test. I am going to save the data to a database. Do you have any suggestions for the database? Time is important.
Most database engines support batch insert/update (inserting multiple rows in one statement). Use it.
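For reference, a rough sketch of what batching could look like with plain JDBC (addBatch/executeBatch on a PreparedStatement). The table name, the column names and the class BatchInsertSketch are hypothetical, the four-strings-per-row layout is an assumption, and whether a batch really goes to the server in one round trip depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: table, columns and class name are assumptions.
class BatchInsertSketch {

    static final String INSERT_SQL =
            "INSERT INTO transactions (id, value, origin, note) VALUES (?, ?, ?, ?)";

    // Inserts the rows in groups of batchSize and returns how many batches were sent.
    // Each row is assumed to hold exactly four strings, in column order.
    static int insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit at the end instead of one per row
        int batchesSent = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the last, partially filled batch
                ps.executeBatch();
                batchesSent++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return batchesSent;
    }
}
```

With a 1.5 GB input the rows would not all be collected in a list first; the same addBatch/executeBatch pattern can be driven directly from the parser callbacks, flushing every few thousand records.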
The last tip I can give you from personal experience: be aware that the parser may call the characters() method multiple times for the same XML element, so you need to concatenate the results to get the full text. You can then do the processing (DB call etc.) in endElement(). See details here: stackoverflow.com/questions/4567636/…
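A minimal sketch of that pattern, assuming the ID element name from the question. The wrapper elements in the sample XML and the names AccumulatingHandlerSketch, IdHandler and parseIds are made up for illustration. The text is appended to a StringBuilder in characters() and only used in endElement():

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: names are illustrative, not from the question.
class AccumulatingHandlerSketch {

    // Collects the complete text of every <ID> element.
    static class IdHandler extends DefaultHandler {
        final List<String> ids = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inId = false;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            if (qName.equals("ID")) {
                inId = true;
                text.setLength(0); // start with an empty buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inId) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("ID")) {
                ids.add(text.toString()); // the full text is only known here
                inId = false;
            }
        }
    }

    static List<String> parseIds(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            IdHandler handler = new IdHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.ids;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<transactions><transaction><ID>A&amp;B</ID></transaction></transactions>";
        System.out.println(parseIds(xml));
    }
}
```

Entity references such as &amp;amp; are a common place where a parser splits the text into several characters() calls, which is why the sample uses one. The same buffer approach would apply to VALUE, ORGIN and NOTE.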
If you mean an instance variable, then yes.

If you parse such a huge file, you should use StAX instead of SAX. With StAX you can skip parts of the file, which can make it faster.

StAX is a "pull" type of API. As discussed, there are Cursor and Event Iterator APIs. There are both reading and writing sides of the API. It is more developer friendly than SAX. StAX, like SAX, does not require an entire document to be held in memory. However, unlike SAX, an entire document need not be read. Portions can be skipped. This may result in even improved performance over SAX.

(DOM vs SAX XML parsing for large files)
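To make the "pull" style concrete, here is a small sketch using the cursor API (XMLStreamReader) from the JDK. The ID element name comes from the question; the class StaxCursorSketch, the method readIds and the maxIds limit are assumptions added for illustration. The loop asks for the next event itself and can simply stop once it has what it needs:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: names are illustrative, not from the question.
class StaxCursorSketch {

    // Returns the text of the first maxIds <ID> elements and then stops reading.
    static List<String> readIds(String xml, int maxIds) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (ids.size() < maxIds && reader.hasNext()) {
                    // The application pulls the next event; all other elements are ignored.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals("ID")) {
                        // getElementText() returns the whole text of a text-only element.
                        ids.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<transactions><transaction><ID>1</ID><VALUE>10</VALUE></transaction>"
                + "<transaction><ID>2</ID></transaction></transactions>";
        System.out.println(readIds(xml, 1));
    }
}
```

For the real file, createXMLStreamReader would be given an InputStream on ./transactions.xml instead of a StringReader. Note that ignored elements still have to be scanned by the parser; the saving comes from doing no work for them in the application and from being able to stop early.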
