
I am very new to regex and I'm not sure how to pluck a piece of text from a very large string using regex.

Suppose the string is the following. (FYI: this string is generated dynamically by pulling different elements from the database and the DOM, so I don't have much control over how it gets created.)

Lorem ipsum dolor sit amet, consectetur adipisicing elit. Voluptas architecto dicta amet cumque, atque, labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis. <span itemprop="itemNum">56789</span> labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis.

I need to get the text inside the span whose itemprop is itemNum.

I tried this but it did not work for me:

/\b(itemprop=\"sku\"")\b/g

Ultimately, I would like to have only 56789 in a variable.

Thank you all in advance.
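For comparison, here is a minimal sketch of what a targeted pattern could look like. The attempt above searches for `sku` rather than `itemNum`, has an extra `"`, and has no capturing group around the span's content, so even a successful match would not yield the number. The helper name `getItemNum` is hypothetical, and the sketch assumes the attribute is written exactly as `itemprop="itemNum"` (double quotes) and that the span contains no nested tags.

```javascript
// Hypothetical helper: returns the text of the first <span itemprop="itemNum">, or null.
// Assumes a double-quoted attribute value and no nested tags inside the span.
function getItemNum(str) {
  // Capture group 1 holds everything between the opening tag and </span>.
  const match = /<span[^>]*\bitemprop="itemNum"[^>]*>([^<]*)<\/span>/.exec(str);
  return match ? match[1] : null;
}

const sample = 'Lorem ipsum <span itemprop="itemNum">56789</span> dolor sit amet.';
console.log(getItemNum(sample)); // 56789
console.log(getItemNum('no span here')); // null
```

Returning `null` instead of indexing straight into the result avoids a `TypeError` when the span is missing from a dynamically built string.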

  • Parsing HTML with a regexp is not the best thing to do. Commented Jan 26, 2019 at 17:22
  • Why not just get the innerHTML of the span element? Commented Jan 26, 2019 at 17:22
  • Because the string is not HTML per se; it is inside an object that gets created dynamically. Commented Jan 26, 2019 at 17:23
  • @epascarello Any ideas how to avoid that? As I mentioned above, this complete string, which contains HTML, gets created dynamically. Commented Jan 26, 2019 at 17:24
  • Can you get the HTML from that object? And is it just one span? Commented Jan 26, 2019 at 17:27

4 Answers

4

If you don't necessarily have to use regex, one approach is to use DOMParser to parse the string first and then get the element using, e.g., querySelector:

const str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit. Voluptas architecto dicta amet cumque, atque, labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis. <span itemprop="itemNum">56789</span> labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis.';

// Parse the string into a new, in-memory document (the live page DOM is not touched)
const parser = new DOMParser();
const doc = parser.parseFromString(str, "text/html");
console.log(doc.querySelector('span[itemprop="itemNum"]').innerHTML);


3 Comments

@Mathias Would this affect the real DOM in any way? I am not familiar with DOMParser.
@Sergio It does not affect the "real DOM" in any way; it essentially creates a new DOM (document) in memory that you can use however you want.
Sweet! I will give this a go.
1

Based on https://stackoverflow.com/a/14210948/3999647; I just updated the regex and the input:

function getMatches(string, regex, index) {
  index || (index = 1); // default to the first capturing group
  var matches = [];
  var match;
  while (match = regex.exec(string)) {
    matches.push(match[index]);
  }
  return matches;
}


// Example :
var myString = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit. Voluptas architecto dicta amet cumque, atque, labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis. <span itemprop="itemNum">56789</span> labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis.';
var myRegEx = /(<span itemprop="\w+">)(\d+)(<\/span>)/g;

// Get an array containing the second capturing group (the digits) for every match
var matches = getMatches(myString, myRegEx, 2);

// Log results
document.write(matches.length + ' matches found: ' + JSON.stringify(matches))
console.log(matches);
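One caveat about `getMatches`: it only terminates when the regex has the `g` flag. Without it, `exec` always searches from index 0, returns the same match every time, and the `while` loop never ends. A global regex also keeps its `lastIndex` between calls. A defensive sketch of the same exec loop, under the hypothetical name `getMatchesSafe`, could check the flag and reset `lastIndex` first:

```javascript
// Hypothetical defensive variant of getMatches (same exec loop, plus guards).
function getMatchesSafe(string, regex, index = 1) {
  if (!regex.global) {
    // A non-global regex would make the exec loop below spin forever.
    throw new TypeError('getMatchesSafe requires a regex with the g flag');
  }
  regex.lastIndex = 0; // start from the beginning even if the regex was used before
  const matches = [];
  let match;
  while ((match = regex.exec(string)) !== null) {
    matches.push(match[index]);
    // A zero-length match does not advance lastIndex, so step past it manually.
    if (match[0] === '') regex.lastIndex++;
  }
  return matches;
}

const html = 'x <span itemprop="itemNum">56789</span> y';
console.log(getMatchesSafe(html, /(<span itemprop="\w+">)(\d+)(<\/span>)/g, 2)); // [ '56789' ]
```

Throwing early turns a silent hang into an immediate, readable error.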


0

One possible solution:

let str = `Lorem ipsum dolor sit amet, consectetur adipisicing elit. Voluptas architecto dicta amet cumque, atque, labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis. <span itemprop="itemNum">56789</span> labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis.`

// Match every <tag>text</tag> pair, then strip the tags to keep only the text
let op = (str.match(/<[^>]+>([^<]+)<\/[^>]+>/g) || []).map(e => e.replace(/.*?>(.*)<.*/, "$1"));

console.log(op)
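The answer above works in two passes: the first regex matches each `<tag ...>text</tag>` pair that has no nested tags, and the `replace` then strips the tags. Because the first pattern already has a capturing group around the text, a single-pass sketch can read group 1 directly with `String.prototype.matchAll` (ES2020). Like the original, this collects the text of every simple element, not only the `itemNum` span.

```javascript
const str = 'Lorem <span itemprop="itemNum">56789</span> ipsum <b>bold</b> dolor.';

// Group 1 is the text between an opening tag and the next closing tag.
const texts = Array.from(
  str.matchAll(/<[^>]+>([^<]+)<\/[^>]+>/g),
  (m) => m[1]
);

console.log(texts); // [ '56789', 'bold' ]
```

If no tags are present, `matchAll` yields nothing and `texts` is simply an empty array, so no `null` check is needed.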


-1

Use a regex lookbehind for itemprop="itemNum"> and a lookahead for </, then match whatever is in between. Since lookarounds are not part of the match, the match itself is the value.

const data = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit. Voluptas architecto dicta amet cumque, atque, labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis. <span itemprop="itemNum">56789</span> labore eos nobis earum fuga tempore officiis excepturi rerum placeat. Perferendis, earum officiis veniam dicta eius aliquid, similique porro quam necessitatibus nobis velit debitis.'

const res = data
  // Lazy .+? stops at the first "</" instead of running on to the last one
  .match(/(?<=itemprop="itemNum">).+?(?=<\/)/)
  //returns an array... get first value
  .shift();

console.log(res);
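A sketch of a null-safe helper around the same lookbehind idea, checked against a string with a second tag after the span. Using `[^<]+` for the value (rather than a greedy `.+`) keeps the match from running on to a later `</`, and returning `null` avoids the error `.shift()` would throw when `match` finds nothing. The name `itemNumViaLookbehind` is hypothetical; lookbehind assumes an ES2018+ engine.

```javascript
// Hypothetical null-safe helper around the lookbehind/lookahead idea.
// Requires lookbehind support (ES2018+).
function itemNumViaLookbehind(data) {
  const match = data.match(/(?<=itemprop="itemNum">)[^<]+(?=<\/)/);
  return match ? match[0] : null;
}

const data = 'a <span itemprop="itemNum">56789</span> b <em>x</em> c';
console.log(itemNumViaLookbehind(data)); // 56789
console.log(itemNumViaLookbehind('no item here')); // null

// For contrast, a greedy .+ stretches to the last "</" in the string:
console.log(data.match(/(?<=itemprop="itemNum">).+(?=<\/)/)[0]); // 56789</span> b <em>x
```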

