I have a text file that looks like this:

1)  bla bla bla 

 bla bla bl- 

a bla bla 

 2)  bla bla bl- 

a bla bla bl- 

  a bla bla 


3)  bla bla bl- 

a bla bla bl- 

a bla bla 

I want to take every list item and put it inside a

<p class="bla"></p> 

html tag. I also want to fuse the words that are broken up into syllables.

I have only managed to match the beginning of a list item

^[ ]+[0-9]+\)

and words that end with the minus sign

[a-zA-ZäöüßÄÖÜ]+\-

I would like to do this in JavaScript, but if it can also be done in Notepad++, better still.

Thanks

  • what do you mean by a list item Commented Apr 1, 2013 at 16:24
  • If you aren't concerned with a complete solution you can do a replace on your numbers with </p><p class="bla"> and then fix the first and last ones. You can also combine the syllabled lines in notepad++ by doing a replace for -\x0d\x0a (dash and then \r\n effectively) using multiline replace Commented Apr 1, 2013 at 16:26
  • for example a list item would be all the characters that begin with 1) (included) and end at the beginning of 2) (not included) and so on... Commented Apr 1, 2013 at 16:40

1 Answer

I want to take every list item and put it inside a html tag

I would have recommended a <li> tag :-) What you want is the string that begins with [0-9]+\), is preceded by line breaks or the start of the file, and is followed either by line breaks and the next item or by the end of the file. This regex should do that:

(^|\s*\n\s*)\d+\)([\s\S]+?)(?=\s*$|\s*\n\s*\d+\))

Now you can replace it with $1<p class="bla">$2</p>. You might want to move some of the whitespace outside the capturing groups so that it is dropped from the result.
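A minimal JavaScript sketch of that replacement, assuming the whole file has already been read into a single string. The function names `wrapListItems` and `wrapListItemsTrimmed` are made up for illustration. The second pattern adds a `\s*` after the marker, which is one possible way to keep the leading whitespace out of the paragraph; it is not part of the regex above.

```javascript
// Pattern from the answer: group 1 is the start of the string or the line
// breaks before the marker, group 2 is the item text. The lookahead stops
// the lazy match at the next marker or at the end of the string.
const ITEM = /(^|\s*\n\s*)\d+\)([\s\S]+?)(?=\s*$|\s*\n\s*\d+\))/g;

// Assumed variant: "\s*" after the marker is consumed outside group 2,
// so the paragraph does not start with whitespace.
const ITEM_TRIMMED = /(^|\s*\n\s*)\d+\)\s*([\s\S]+?)(?=\s*$|\s*\n\s*\d+\))/g;

function wrapListItems(text) {
  // Single quotes around the replacement, so the double quotes in the
  // class attribute need no escaping.
  return text.replace(ITEM, '$1<p class="bla">$2</p>');
}

function wrapListItemsTrimmed(text) {
  return text.replace(ITEM_TRIMMED, '$1<p class="bla">$2</p>');
}

const sample = "1) foo\nbar\n2) baz\nqux";
console.log(wrapListItems(sample));
console.log(wrapListItemsTrimmed(sample));
```

Because the pattern has no `m` flag, `^` only matches at the very start of the string. If the first marker is preceded by spaces, the first item is not matched; calling `text.trim()` before the replacement is one way around that.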


I want to fuse the words that are broken up into syllables.

For that, we can match a word end, followed by the minus sign and linebreaks:

\b-\s*\n\s*

Then replace that with the empty string.
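In JavaScript this could look like the sketch below. The function names are hypothetical. The second function is an assumed alternative that reuses the character class from the question: `\b` in JavaScript is based on ASCII word characters, so it does not see a boundary between a letter such as `ü` and the minus sign.

```javascript
// Pattern from the answer: a minus sign directly after a word character,
// followed by optional whitespace, a line break, and more whitespace.
const BROKEN_WORD = /\b-\s*\n\s*/g;

function fuseHyphenated(text) {
  return text.replace(BROKEN_WORD, "");
}

// Assumed alternative for German text: match the last letter explicitly
// with the character class from the question and put it back via $1.
const BROKEN_WORD_DE = /([a-zA-ZäöüßÄÖÜ])-\s*\n\s*/g;

function fuseHyphenatedGerman(text) {
  return text.replace(BROKEN_WORD_DE, "$1");
}

console.log(fuseHyphenated("bla bla bl- \n\na bla"));
console.log(fuseHyphenatedGerman("Bü-\ncher"));
```

Since `\s` also matches `\r`, both patterns cope with Windows line endings. A dash that is preceded by a space is left alone, because there is no word character directly in front of it.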


3 Comments

I can't get it working. How am I supposed to use $1<p class="bla">$2</p> ?
In JS: result = string.replace(/(^|\s*\n\s*)\d+\)([\s\S]+?)(?=\s*$|\s*\n\s*\d+\))/g, '$1<p class="bla">$2</p>');. I think $1 and co do reference the groups in Notepad++ as well.
I was using string.replace(/(^|\s*\n\s*)\d+)([\s\S]+?)(?=\s*$|\s*\n\s*\d+))/g, "$1<p class="bla">$2</p>"); and it gave SyntaxError: Unexpected identifier. That was the problem. Thanks Bergi
