
My Node app fetches an HTML page with axios, parses it with htmlparser2, and then sends the relevant information to a frontend JS app as JSON.

The HTML page has some JavaScript in it that creates an array, and I need to work with that array in my code. htmlparser2 gives me the content of the script as a string. As far as I know, I have two options for handling it:

  1. Write a parser that goes through the string and extracts the required info (doable, but complicated).
  2. Run the string as JavaScript code and read the values from the result.

Assume I want to go with option 2. According to this StackOverflow question, using Node's VM module is possible, but the official documentation says "The node:vm module is not a security mechanism. Do not use it to run untrusted code."
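For comparison, option 2 with node:vm would look roughly like the sketch below. It only makes the option concrete: as the quoted documentation says, vm is not a security boundary. The `timeout` option stops endless loops, but it does nothing against known ways of escaping the context. The helper name `readGlobalWithVm` and the two-line sample script are made up for illustration.

```javascript
const vm = require("node:vm");

// Hypothetical helper: runs `script` in a fresh V8 context and returns the
// value the script left in the global variable `globalName`.
// WARNING: node:vm is not a security mechanism. This illustrates option 2
// only and must not be used on untrusted code.
function readGlobalWithVm(script, globalName) {
  const sandbox = {}; // becomes the global object of the new context
  // `timeout` aborts scripts that run longer than 1000 ms (e.g. endless loops)
  vm.runInNewContext(script, sandbox, { timeout: 1000 });
  return sandbox[globalName];
}

// Two lines in the style of the page's script; "\\n" keeps the escape as
// text so the inner string literal decodes it when the script runs.
const script =
  "hatizsakCucc = new Array();\n" +
  'hatizsakCucc[0]="Név: ezüst\\nSúly: 0.0001 kg.\\n";';

const items = readGlobalWithVm(script, "hatizsakCucc");
console.log(items[0]);
```

Because the script assigns to an undeclared variable, the array ends up as a property of the `sandbox` object, which is how the result is read back.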

I consider the code in my use case untrusted. What would be a safe solution for this?

EDIT: A snippet from the string:

hatizsakCucc = new Array();
hazbanCucc = new Array();
function adatokMessage(targyIndexStr,tomb) {
    var targyIndex = parseInt(targyIndexStr);
    if (tomb.length<1) alert("Nincs semmi!");
    else alert(tomb[targyIndex]);
}

hatizsakCucc[0]="Név: ezüst\nSúly: 0.0001 kg.\nMennyiség: 453\nÖsszsúly: 0.0453 kg.\n";
hatizsakCucc[1]="Név: kaja\nSúly: 0.4 kg.\nÁr: 2 ezüst\nMennyiség: 68\nÖsszár: 136 ezüst\nÖsszsúly: 27.2 kg.\n";
hatizsakCucc[2]="Típus: fegyver\nNév: bot\nSúly: 2 kg.\nÁr: 6 ezüst\nMin. szint: 1\nMaximum sebzés: 6\nSebzés szórás: 5\nFajta: ütő/zúzó\n";
hatizsakCucc[3]="Típus: fegyver\nNév: parittya\nSúly: 0.3 kg.\nÁr: 14 ezüst\nMin. szint: 1\nMaximum sebzés: 7\nSebzés szórás: 4\nFajta: távolsági\n";
hatizsakCucc[4]="Név: csodatarisznya\nSúly: 4 kg.\nÁr: 1000 ezüst\nExtra: templomi árú\n";
hatizsakCucc[5]="Név: imamalom\nSúly: 5 kg.\nÁr: 150 ezüst\nExtra: templomi árú\n";

The whole string is about 100 lines like this, so it's not much data.

What I need is the contents of the hatizsakCucc array. Actually, I'm realizing now that getting an array of those elements is not too difficult with a regex:

hatizsakSzkript.match(/hatizsakCucc(.*)\\n/g);

This gives me an array of the hatizsakCucc assignment lines, so I guess my problem is solved.
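The regex above returns each whole assignment line, including the `hatizsakCucc[N]="` prefix, with the `\n` escapes still present as backslash-n text. A minimal sketch of a variant that captures the index and decodes the string literal with JSON.parse, so nothing in the script is ever executed. It assumes each element is a single double-quoted literal on one line, as in the snippet; `extractItems` and `sample` are hypothetical names.

```javascript
// Hypothetical helper: pulls hatizsakCucc[N]="..." assignments out of the
// script text without executing it. Assumes each element is one
// double-quoted literal on a single line, as in the snippet above.
function extractItems(scriptText) {
  const items = [];
  // Group 1: the index; group 2: the complete string literal, quotes included.
  const pattern = /\bhatizsakCucc\[(\d+)\]\s*=\s*("(?:[^"\\]|\\.)*")/g;
  for (const [, index, literal] of scriptText.matchAll(pattern)) {
    // JSON.parse decodes escapes such as \n. It only accepts JSON escapes,
    // so it throws on JavaScript-only ones like \x41 or \'.
    items[Number(index)] = JSON.parse(literal);
  }
  return items;
}

const sample = [
  "hatizsakCucc = new Array();",
  'hatizsakCucc[0]="Név: ezüst\\nSúly: 0.0001 kg.\\nMennyiség: 453\\nÖsszsúly: 0.0453 kg.\\n";',
  'hatizsakCucc[1]="Név: imamalom\\nSúly: 5 kg.\\nÁr: 150 ezüst\\nExtra: templomi árú\\n";',
].join("\n");

console.log(extractItems(sample));
```

Treating the literals as data rather than code sidesteps the sandboxing question entirely, at the cost of breaking if the page starts building the strings with expressions.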

That said, I'm still curious about the possibility of running "untrusted" code safely.

Further context: I plan to parse each array element into an object whose properties are the substrings separated by the \n characters.

So the expected result for the first array element will be:

hatizsakCucc[0] = {
    nev: "ezüst",    // Név (name)
    suly: 0.0001,    // Súly (weight in kg)
    mennyiseg: ...   // Mennyiség (quantity)
}

I'll write a function that splits the string into substrings at each \n and then parses each substring with match().
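A sketch of such a function, under two assumptions that are not stated above: keys are made by lowercasing the label and stripping its accents (so Név becomes nev and Min. szint becomes min_szint), and a value that starts with a number is stored as a Number with its unit dropped (Súly: 0.0001 kg. becomes 0.0001). It expects the decoded string with real newline characters, not the raw backslash-n text. `toKey` and `parseItem` are hypothetical names.

```javascript
// Hypothetical helper: turns labels like "Összsúly" / "Min. szint"
// into keys like "osszsuly" / "min_szint".
function toKey(label) {
  return label
    .normalize("NFD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")      // spaces, dots, etc. become "_"
    .replace(/^_+|_+$/g, "");         // trim leading/trailing underscores
}

// Hypothetical helper: parses one element string into an object. A value
// starting with a number becomes a Number (units such as "kg." or "ezüst"
// are dropped); anything else is kept as text.
function parseItem(text) {
  const item = {};
  for (const line of text.split("\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;       // skips the empty string after the final \n
    const key = toKey(line.slice(0, colon));
    const raw = line.slice(colon + 1).trim();
    const num = raw.match(/^-?\d+(?:\.\d+)?(?=\s|$)/);
    item[key] = num ? Number(num[0]) : raw;
  }
  return item;
}

console.log(parseItem("Név: ezüst\nSúly: 0.0001 kg.\nMennyiség: 453\nÖsszsúly: 0.0453 kg.\n"));
```

Splitting on the first colon only keeps labels such as "Min. szint" intact; if a value could itself contain a colon, that value would still be preserved because only the first one is used as the separator.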

  • "some JavaScript in it that creates an array" - can we see that code? Parsing JavaScript is not necessarily complicated; there are quite a few tools that can handle it. Commented Nov 8, 2022 at 22:29
  • You could use puppeteer from Node.js and let the Chromium engine (the same engine used in the Chrome browser) load, parse, and run the page. Then use the puppeteer API to get info from the page. This uses the protections built into Chromium for safely executing and isolating the JavaScript, and it also allows the JavaScript to run in the context of the page itself, which, depending on the JavaScript, may also be needed. Commented Nov 8, 2022 at 22:35
  • @jfriend00 that sounds interesting! I would assume it's not the most efficient solution for my simple use case, but it sounds useful! Commented Nov 8, 2022 at 22:50
