I have a small Node script that scrapes a web page. From that page I extract an array of strings.
I am trying to clean up those strings (currently with regex and string.replace).
One example string looks like this:
2 Glücklich sind die,die seine Erinnerungen beachten,+die mit ganzem Herzen nach ihm suchen.+\n
My cleaning code looks like this.
string.replace(/\+/g, '').replace(/\*/g, '').replace('\n', '').replace(/(^\d+)/g, '').trim()
The first call removes every "+", the second removes every *, the third removes the newline, and the last one removes the leading number.
Most of it works fine, but I have some edge cases. This is my result:
2 Glücklich sind die,die seine Erinnerungen beachten,die mit ganzem Herzen nach ihm suchen.
Problems:
- The leading number was not removed. When the number has two or more digits it is always removed; I have no idea why a single digit stays.
- The first * was removed, but because no whitespace followed it, the words now run together. The second * was followed by whitespace, so no problem there.
- Same issue with the "+": no whitespace follows it, so the words stick together.
My goal is to parse every string correctly. I have thousands of strings with different combinations, but the only special characters are "+", *, "\n" and the leading number.
The string should look like this:
Glücklich sind die, die seine Erinnerungen beachten, die mit ganzem Herzen nach ihm suchen.
Hopefully someone has an idea how to accomplish that.
The ^\d+ pattern should replace a single digit... is it possible there is a leading space? Maybe try doing the .trim() first? Also, if you know a + or * should always have a space after being replaced, you could do this: .replace(/\s*(\+|\*)\s*/g, ' '). That way any existing spaces are removed along with the * or + and the whole match is replaced with a single space.
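Putting those two suggestions together, a minimal sketch could look like the following. The function name cleanVerse is hypothetical, and two steps are assumptions that go beyond the suggestion above: adding a space after a comma that is glued to the next word (the sample as posted has "die,die" with no marker in between), and collapsing repeated whitespace at the end. The comma rule would also split things like "1,5", so drop it if the text can contain such numbers.

```javascript
// Hypothetical helper (name not from the original post): cleans one scraped string.
function cleanVerse(raw) {
  return raw
    .trim()                       // remove leading/trailing whitespace (incl. "\n") first, so ^ really sits on the number
    .replace(/^\d+\s*/, '')       // drop the leading number and any whitespace right after it
    .replace(/\s*[+*]\s*/g, ' ')  // each "+" or "*" plus surrounding whitespace becomes one space
    .replace(/,(?=\S)/g, ', ')    // assumption: a comma glued to the next word should get a space
    .replace(/\s+/g, ' ')         // collapse any runs of whitespace
    .trim();                      // remove the space left behind by a trailing "+" or "*"
}

const sample =
  '2 Glücklich sind die,die seine Erinnerungen beachten,+die mit ganzem Herzen nach ihm suchen.+\n';

console.log(cleanVerse(sample));
// Glücklich sind die, die seine Erinnerungen beachten, die mit ganzem Herzen nach ihm suchen.
```

Trimming first matters because ^\d+ only matches when the digit is the very first character; if the scraped string starts with a space or another whitespace character, the pattern never matches, which would be one possible explanation for a number that refuses to go away.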