I would like to extract lines in between '*Node\n' and '*Element, type=S4R\n' from the following text using regex.

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

I have tried re.findall(r"\*Node\s([\s\S]+)\*\w", text) and re.findall(r"(?<=\*Node\s)([\s\S]+)(?=\*)", text), but I am not able to exclude the end portion of the text. I'm getting this output:

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n*Element, type=S4R\n 1,   1,  21, 357,  46\n 2,  21,  22, 358, 357\n*Nset, nset=_PickedSet24, internal, generate\n    1,  1416,     1\n*']

However, if I try re.findall(r"(?<=name\s)([\s\S]+)(?=\selon)", text1) and re.findall(r"name\s([\s\S]+)\selon", text1) on the following text, I do get ['isn,t'] as desired.

text1 = """my name isn,t\nelon *nestla"""

EDIT: The full text is the following. There are multiple such patches to extract, and I can always end the patches with *Element.

text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**""" 
  • Why not like this? \*Node\r?\n([\s\S]+)\r?\n\*Element, type=S4R regex101.com/r/ilpWYk/1 Commented Oct 18, 2020 at 8:33
  • You could get multiple matches like this: regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*" and then use re.findall. See ideone.com/DFel7r Commented Oct 18, 2020 at 8:56

2 Answers

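[\s\S]+ is greedy, so your pattern runs on to the last * in the text that still lets the rest of the pattern match. You could be more specific: match the newline after *Node, make the quantifier non-greedy, and match \*Element, type=S4R after the captured group.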

\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R
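To see the difference between the greedy and the non-greedy quantifier, here is a minimal sketch on a shortened copy of the sample text. The variable names greedy and lazy are only for illustration.

```python
import re

# Shortened copy of the sample text from the question.
text = (
    "*Node\n"
    "      1,         0.25,          0.5,         0.75\n"
    "      2,         0.25,           0.,         0.75\n"
    "*Element, type=S4R\n"
    " 1,   1,  21, 357,  46\n"
    "*Nset, nset=_PickedSet24, internal, generate\n"
    "    1,  1416,     1\n"
    "**"
)

# Greedy: [\s\S]+ grabs as much as possible, so the match only stops
# at the LAST "*" followed by a word character (here "*Nset").
greedy = re.findall(r"\*Node\s([\s\S]+)\*\w", text)

# Non-greedy: [\s\S]+? stops at the FIRST place where the rest of the
# pattern (newline + "*Element, type=S4R") can match.
lazy = re.findall(r"\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R", text)

print("*Element" in greedy[0])  # the greedy capture overshoots
print(lazy[0])                  # only the node lines
```

The only change that matters is the ? after the +; the explicit \r?\n on both sides just keeps the surrounding newlines out of the captured group.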

To avoid unnecessary backtracking, you could also start the match at *Node and match every following line that does not start with *Element, using a negative lookahead. The ^ anchor needs re.MULTILINE so that it matches at the start of each line.

^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R

import re

regex = r"^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R"
text = ("**\n"
    "*Part, name=Part-2\n"
    "*Node\n"
    "      1,         0.25,          0.5,         0.75\n"
    "      2,         0.25,           0.,         0.75\n"
    "   1416,  0.200000003,           0., 0.0500000007\n"
    "*Element, type=S4R\n"
    " 1,   1,  21, 357,  46\n"
    " 2,  21,  22, 358, 357\n"
    "*Nset, nset=_PickedSet24, internal, generate\n"
    "    1,  1416,     1\n"
    "**")

matches = re.search(regex, text, re.MULTILINE)
if matches:
    print(matches.group(1))

Output

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007

If you want to find all the matches, you could use re.findall and end each match at the next line that starts with * followed by a word character \w, matching the rest of that line with .*

import re
 
regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*"
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**""" 
 
print(re.findall(regex, text, re.MULTILINE))

Output

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n', '      1, -0.449999988, -0.477499992,           0.\n      2, -0.400000006, -0.477499992,           0.\n    121, 0.0500000007, 0.0225000009,           0.\n']

5 Comments

Thanks for your reply, please see, I have updated the full text.
@hovedguy I have updated the answer. Should it always start with *Node and end with *? How many matches do you expect?
Yes, that's right. There are 3 to 4 matches. Thanks again :)
@hovedguy Like this? ^\*\w.*\r?\n((?:(?!\*\w).*\r?\n)*)(?=\*\w) See regex101.com/r/Q9F05G/1
It works fine, the best part is the interactive regex link you shared. It is very helpful and now I can try out other stuff there. Thanks !!
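The more general pattern from the comment above matches every keyword line (a line starting with * and a word character) together with the data lines that follow it. A minimal sketch of how it could be used, assuming you also want the keyword line itself: the extra capture group around the header and the names block_re, blocks and nodes are my additions, not part of the original pattern.

```python
import re

# Variation of the pattern from the comment: group 1 is the keyword line,
# group 2 is every following line that does not start with "*" + word char.
block_re = re.compile(
    r"^(\*\w.*)\r?\n((?:(?!\*\w).*\r?\n)*)(?=\*\w)",
    re.MULTILINE,
)

# Shortened copy of one part from the question.
text = (
    "*Part, name=Part-2\n"
    "*Node\n"
    "      1,         0.25,          0.5,         0.75\n"
    "      2,         0.25,           0.,         0.75\n"
    "*Element, type=S4R\n"
    " 1,   1,  21, 357,  46\n"
    "*Nset, nset=_PickedSet24, internal, generate\n"
    "    1,  1416,     1\n"
    "*End Part\n"
)

# With two groups, findall returns a list of (header, data) tuples.
blocks = block_re.findall(text)

# Keep only the data that belongs to a *Node keyword.
nodes = [data for header, data in blocks if header.strip() == "*Node"]

for header, data in blocks:
    print(repr(header), len(data.splitlines()), "data line(s)")
```

Note that the last keyword line (*End Part here) is not returned, because the lookahead (?=\*\w) requires another keyword line after it. Lines starting with ** are treated as data lines by this pattern, so filter them out if they matter.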
import re

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

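# re.S lets . match newlines, re.M lets ^ match at each line start.
# The \r?\n after *Node keeps the leading newline out of the captured group.
print( re.search(r'^\*Node\r?\n(.*?)^\*Element, type=S4R', text, flags=re.S|re.M).group(1) )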

Prints:

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
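For the full text with several parts, the same re.S | re.M idea can be extended with re.findall. This is only a sketch under the assumption that each node block ends at the next line starting with * and a word character; the lookahead (?=^\*\w) is my addition and the sample text is shortened.

```python
import re

# Shortened copy of the full text from the question (two parts).
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
*Element, type=S4R
 1,   1,  21, 357,  46
*End Part
**
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**"""

# re.S: "." also matches newlines; re.M: "^" matches at every line start.
# The lazy (.*?) stops at the first following line that begins with "*"
# and a word character, whatever that keyword is.
pattern = r"^\*Node\r?\n(.*?)(?=^\*\w)"
blocks = re.findall(pattern, text, flags=re.S | re.M)

for block in blocks:
    print(block)
```

Because the end of the block is a lookahead, the terminating keyword line is not consumed, and it works for the second part as well, where *Node is followed by *Nset instead of *Element.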

1 Comment

Thanks for your reply, please see, I have updated the full text.
