Extract portion of text with regex

Question

I would like to use a regex to extract the lines between '*Node\n' and '*Element, type=S4R\n' in the following text.

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

I have tried re.findall(r"\*Node\s([\s\S]+)\*\w", text) and re.findall(r"(?<=\*Node\s)([\s\S]+)(?=\*)", text), but neither of them cuts off the end portion of the text. With the second pattern I get this output:

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n*Element, type=S4R\n 1,   1,  21, 357,  46\n 2,  21,  22, 358, 357\n*Nset, nset=_PickedSet24, internal, generate\n    1,  1416,     1\n*']
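A minimal sketch of what goes wrong: both patterns use the greedy quantifier [\s\S]+, which matches as much as possible. The capture therefore ends at the last position where the rest of the pattern can still match, not the first. Adding ? (+?) makes the quantifier lazy (non-greedy). The sketch below only compares the two quantifiers on the question's first sample text.

```python
import re

# First sample text from the question
text = """**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

# Greedy [\s\S]+ first runs to the end of the string, then gives characters
# back only until the lookahead (?=\*) succeeds, i.e. before the LAST "*".
greedy = re.findall(r"(?<=\*Node\s)([\s\S]+)(?=\*)", text)

# Lazy [\s\S]+? takes as few characters as possible, so it stops before the
# FIRST "*", which here is the start of "*Element, type=S4R".
lazy = re.findall(r"(?<=\*Node\s)([\s\S]+?)(?=\*)", text)

print("*Element" in greedy[0])  # True
print("*Element" in lazy[0])    # False
print(lazy[0])
```

The same one-character change (+ to +?) also fixes the first pattern, \*Node\s([\s\S]+?)\*\w, which then stops at *Element.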

However, if I try re.findall(r"(?<=name\s)([\s\S]+)(?=\selon)", text1) and re.findall(r"name\s([\s\S]+)\selon", text1) on the following text, I do get ['isn,t'] as desired.

text1 = """my name isn,t\nelon *nestla"""
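The text1 example only appears to work because the string contains a single "elon", so a greedy group and a lazy group end at the same place. A small sketch with a second "elon" appended (text2 is a hypothetical string, for illustration only) shows the difference:

```python
import re

text1 = """my name isn,t\nelon *nestla"""
# Hypothetical variant: the same text with a second "elon" appended
text2 = text1 + "\nelon"

greedy = r"name\s([\s\S]+)\selon"
lazy = r"name\s([\s\S]+?)\selon"

print(re.findall(greedy, text1))  # ['isn,t']  (only one possible end point)
print(re.findall(greedy, text2))  # ['isn,t\nelon *nestla']  (runs to the last "elon")
print(re.findall(lazy, text2))    # ['isn,t']  (stops at the first "elon")
```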

EDIT: The full text is below. There are multiple such blocks to extract, and I can always end each block at *Element.

text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**"""
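Note that in this sample the second block (plate#Part-1) has no *Element line; its node lines run straight into *Nset. A short sketch comparing two lazy patterns on this text: one that ends each block at *Element and one that ends it at the next line starting with * plus a word character. The "**  " lines are shortened to "**" here, which does not affect either pattern.

```python
import re

# Full text from the EDIT above ("**  " lines shortened to "**")
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**
**"""

# Lazy match that ends every block at the "*Element" keyword line
until_element = re.findall(r"\*Node\r?\n([\s\S]+?)\r?\n\*Element", text)

# Lazy match that ends every block at the next "*<word>" keyword line
until_keyword = re.findall(r"\*Node\r?\n([\s\S]+?)\r?\n\*\w", text)

print(len(until_element))  # 1
print(len(until_keyword))  # 2
```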

Why not like this? \*Node\r?\n([\s\S]+)\r?\n\*Element, type=S4R regex101.com/r/ilpWYk/1
– The fourth bird, Commented Oct 18, 2020 at 8:33
You could get multiple matches like this: regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*" and then use re.findall. See ideone.com/DFel7r
– The fourth bird, Commented Oct 18, 2020 at 8:56

The fourth bird · Accepted Answer · 2020-10-18 09:02:17Z

2

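You could be more specific: include the newlines in the pattern, match \*Element, type=S4R right after the captured group, and make the quantifier non-greedy (+?) so that the group stops at the first *Element line instead of the last possible end point.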

\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R

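A minimal sketch applying this pattern with re.search to the question's first sample. Splitting the captured group into lines is only for illustration; note that the line break before *Element sits outside the group, so the capture has no trailing newline.

```python
import re

# First sample text from the question
text = """**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

# Lazy [\s\S]+? stops at the first "\n*Element, type=S4R".
# The line break before *Element is outside group 1, so it is not captured.
pattern = r"\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R"

match = re.search(pattern, text)
if match:
    node_lines = match.group(1).splitlines()
    print(len(node_lines))  # 3
    for line in node_lines:
        print(line)
```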

To avoid unnecessary backtracking, you could also start the match at *Node and then match every following line that does not start with *Element, using a negative lookahead.

^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R


import re

regex = r"^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R"
text = ("**\n"
    "*Part, name=Part-2\n"
    "*Node\n"
    "      1,         0.25,          0.5,         0.75\n"
    "      2,         0.25,           0.,         0.75\n"
    "   1416,  0.200000003,           0., 0.0500000007\n"
    "*Element, type=S4R\n"
    " 1,   1,  21, 357,  46\n"
    " 2,  21,  22, 358, 357\n"
    "*Nset, nset=_PickedSet24, internal, generate\n"
    "    1,  1416,     1\n"
    "**")

matches = re.search(regex, text, re.MULTILINE)  # re.MULTILINE lets ^ match at the start of each line
if matches:
    print(matches.group(1))

Output

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007

If you want to find all the matches, you could also use re.findall and end each match at the next line that starts with * followed by a word character (\*\w), matching the rest of that line with .*

import re
 
regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*"
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**""" 
 
print(re.findall(regex, text, re.MULTILINE))

Output

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n', '      1, -0.449999988, -0.477499992,           0.\n      2, -0.400000006, -0.477499992,           0.\n    121, 0.0500000007, 0.0225000009,           0.\n']
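To make the findall pattern easier to read, here is a sketch of the same regex written with re.VERBOSE, where whitespace and # comments inside the pattern are ignored. It should return exactly the same list as the compact form; the "**  " lines in the text are shortened to "**", which does not change the result.

```python
import re

# Full text from the question's EDIT ("**  " lines shortened to "**")
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**
**"""

# Same pattern as the answer, annotated part by part with re.VERBOSE
pattern = re.compile(r"""
    ^\*Node\r?\n      # a line containing only "*Node"
    (                 # group 1: the data lines that follow
        (?:
            (?!\*\w)  # the line must not start with "*" + word character
            .*\r?\n   # consume the whole line, including its line break
        )*
    )
    \*\w.*            # the next keyword line ("*Element", "*Nset", ...)
""", re.MULTILINE | re.VERBOSE)

compact = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*"

verbose_result = pattern.findall(text)
print(verbose_result == re.findall(compact, text, re.MULTILINE))  # True
print(len(verbose_result))  # 2
```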

edited Oct 18, 2020 at 9:02

answered Oct 18, 2020 at 8:37

The fourth bird


5 Comments

hovedguy Over a year ago

Thanks for your reply. Please see the question; I have updated it with the full text.

The fourth bird Over a year ago

@hovedguy I have updated the answer. Should it always start with *Node and end with *? How many matches do you expect?

hovedguy Over a year ago

Yes, that's right. There are 3 to 4 matches. Thanks again :)

The fourth bird Over a year ago

@hovedguy Like this? ^\*\w.*\r?\n((?:(?!\*\w).*\r?\n)*)(?=\*\w) See regex101.com/r/Q9F05G/1

hovedguy Over a year ago

It works fine. The best part is the interactive regex link you shared; it is very helpful, and now I can try out other things there. Thanks!!

Andrej Kesely · Accepted Answer · 2020-10-18 08:32:53Z

2

import re

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

# re.S lets . match newlines; re.M lets ^ match at the start of every line
print(re.search(r'^\*Node(.*?)^\*Element, type=S4R', text, flags=re.S | re.M).group(1))

Prints:

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
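This pattern ends at the literal *Element, type=S4R, so it cannot find the plate#Part-1 block in the updated full text, which has no *Element line. A hedged adaptation in the same style (re.S | re.M with a lazy group), not taken from the answer itself: replace the literal terminator with a zero-width lookahead for the next keyword line, and move the newline after *Node outside the group so the capture does not start with a blank line.

```python
import re

# Full text from the question's EDIT ("**  " lines shortened to "**")
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**
**"""

# re.S: "." also matches newlines, so (.*?) can span several lines.
# re.M: "^" matches at the start of every line.
# The lazy group stops at the first line start followed by "*" + a word
# character; the lookahead (?=\*\w) does not consume that keyword line.
blocks = re.findall(r"^\*Node\n(.*?)^(?=\*\w)", text, flags=re.S | re.M)

for block in blocks:
    print(block)
```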

answered Oct 18, 2020 at 8:32

Andrej Kesely


1 Comment

hovedguy Over a year ago

Thanks for your reply. Please see the question; I have updated it with the full text.
