I created a website where you can import an XML file and then read it out. It works perfectly fine for most files, but when I tried a 730MB XML file, it stopped working. I don't get any errors in the console, but if I use this code, for example,

numberOfReports = xmlDoc.getElementsByTagName("DailyReport").length;

I always get 0, even though it should be far more than that, since the XML file definitely contains multiple <DailyReport> elements. My function to import and parse the files looks like this:

// Function to import and serialize the XML file
function import_XML() {
    var input = document.createElement('input');
    input.type = 'file';

    input.onchange = e => {

        // getting a hold of the file reference
        file = e.target.files[0];

        // setting up the reader
        var reader = new FileReader();
        reader.readAsText(file, 'UTF-8');

        // Tell the reader what to do when it's done reading
        reader.onload = readerEvent => {
            content = readerEvent.target.result;
            const parser = new DOMParser();
            xmlDoc = parser.parseFromString(content, "application/xml");
            console.log(xmlDoc.documentElement.nodeName == "parsererror" ? "Error while parsing XML File" : xmlDoc.documentElement.nodeName);
            console.log("content: " + content);

            // Number of reports in the XML file
            numberOfReports = xmlDoc.getElementsByTagName("DailyReport").length;
            console.log("number of daily reports: " + numberOfReports);
            updateTable();

        }
    }
    input.click();
}

The content I get from content = readerEvent.target.result; in the console is also just empty:

screenshot of console

I'm not sure if it's because the file is too large, but the XML file should not have any malformations. Can anyone help me with this problem? Would really appreciate any help!

  • When asking for help, please take the time to indent and format your code readably. It's really hard to read only partially-indented code. (I've run the code through a basic formatter for you.) Commented Aug 16, 2021 at 11:24
  • From your console screenshot, it looks like content is an empty string. That's consistent with the nodeName being shown as "html" (I get that when I do (new DOMParser().parseFromString("", "application/xml")).documentElement.nodeName). So the question is: Why is the string empty? (Also: where do you declare content?) Commented Aug 16, 2021 at 11:32
  • I declared content at the top of the file as var content;, so that I could access it in other functions. As for why the string is empty, I don't really know. For working XML files, the content returns the whole XML as a string. Commented Aug 16, 2021 at 11:36

1 Answer

I suspect you're exceeding the maximum string length of your browser's JavaScript engine. Different engines have different limits. MDN says Firefox's limit is about 1GB (although I just tried an experiment and it was more like 800MB). A quick experiment in Brave (Chrome-like) suggests a maximum of about 512MB:

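// Grow a string in 4KB steps until the engine refuses to make it any longer
let size = 0;
const chunk = " ".repeat(4096);
const max = 800 * 1024 * 1024; // stop at 800MB if no limit is hit first
try {
    let str = "";
    while (str.length < max) {
        size = str.length; // last length known to have worked
        str += chunk;
    }
    console.log(`worked! size = ${str.length / 1024 / 1024}`);
} catch {
    // V8 throws a RangeError ("Invalid string length") at its limit
    console.log(`ERROR, size = ${size / 1024 / 1024}`);
}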

The same experiment in Node.js (which uses the same JavaScript engine as Chromium-based browsers, V8) yields the same result, suggesting it's the limit in V8.

Unfortunately, DOMParser only accepts strings, not (say) blobs. I think you're probably not going to be able to handle files that large on V8-based browsers.
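In the meantime, it may help to make the failure visible instead of silently getting 0 reports. The sketch below is only an illustration: ASSUMED_MAX_STRING_LENGTH is a guess based on the experiment above (not a documented constant), and both function names are made up for this example. A UTF-8 file never decodes to more UTF-16 code units than it has bytes, so a file smaller than the limit should fit, while a larger one might not.

```javascript
// Assumption: roughly the V8 limit seen in the experiment above.
// Other engines (and other V8 versions) may differ.
const ASSUMED_MAX_STRING_LENGTH = 512 * 1024 * 1024;

// Hypothetical helper: call before reader.readAsText(file).
// file.size is in bytes; for UTF-8 that is an upper bound on the
// length of the decoded string.
function mightExceedStringLimit(file) {
    return file.size >= ASSUMED_MAX_STRING_LENGTH;
}

// Hypothetical helper: call inside reader.onload.
// A non-empty file that decodes to "" matches the symptom in the question.
function readCameBackEmpty(file, content) {
    return file.size > 0 && content === "";
}
```

In import_XML, you could warn the user when mightExceedStringLimit(file) is true, and skip parsing when readCameBackEmpty(file, content) is true rather than handing an empty string to DOMParser.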

I suspect DOMParser will get a method that allows it to read streams someday, but that doesn't help you now. The only solution I can think of is to find an XML parser written in JavaScript that either supports streams or that you could adapt to use a stream. There are several XML parsers available as npm packages. There might be one that can read from a blob or a ReadableStream, or one that supports Node.js streams and that you could adapt to work with ReadableStream (and to produce the browser's version of XML documents rather than whatever it uses on Node.js).
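If all you need is something simple like the number of <DailyReport> elements, you may not need a full parser. Here is a minimal sketch that reads the file in slices and never builds one giant string. It is a plain text scan, not XML parsing: it would also count a matching tag inside a comment or CDATA section. The names createTagCounter and countTagsInFile and the 8MB slice size are made up for this example.

```javascript
// Hypothetical helper: counts opening tags in text that arrives in pieces.
function createTagCounter(tagName) {
    const needle = "<" + tagName;
    let carry = ""; // tail of the previous text, in case a tag straddles two chunks
    let count = 0;
    return {
        push(chunk) {
            const text = carry + chunk;
            let i = text.indexOf(needle);
            while (i !== -1) {
                // Only count when the name really ends here, so that
                // "<DailyReports" is not counted as "<DailyReport".
                const next = text[i + needle.length];
                if (next !== undefined && /[\s\/>]/.test(next)) count++;
                i = text.indexOf(needle, i + needle.length);
            }
            // Keep the last needle.length characters: an undecided match can
            // only start in there, and a counted match always starts before it.
            carry = text.slice(-needle.length);
        },
        get count() {
            return count;
        }
    };
}

// Reads a File/Blob slice by slice and feeds the decoded text to the counter.
async function countTagsInFile(file, tagName, chunkSize = 8 * 1024 * 1024) {
    const counter = createTagCounter(tagName);
    const decoder = new TextDecoder("utf-8");
    for (let offset = 0; offset < file.size; offset += chunkSize) {
        const bytes = await file.slice(offset, offset + chunkSize).arrayBuffer();
        // { stream: true } keeps a multi-byte character that was cut by a
        // slice boundary intact until the next call.
        counter.push(decoder.decode(bytes, { stream: true }));
    }
    return counter.count;
}
```

In the question's onchange handler, this would be used as countTagsInFile(file, "DailyReport").then(n => console.log("number of daily reports: " + n)), in place of the FileReader and DOMParser calls. Anything that needs the actual document structure would still need a real streaming parser.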

2 Comments

Thank you for your answer and the code snippet. Do you think there is any way out of this? Maybe parsing the XML file differently or any way where you don't have to use a string?
@Mimi - I don't know of a specific way I'm afraid. I've added to the end of the answer.
