
This is my first post. I always come to this forum looking for an answer when it comes to code.

I have been struggling to understand regular expressions in Python, and I am finding it hard.

I have text that looks like this:

Name:   Clash1
Distance:   -1.341m
Image Location: Test 1_navis_files\cd000001.jpg
HardStatus: New
Clash Point:    3.884m, -2.474m, 2.659m
Date Created:   2016/6/2422:45:09

Item 1
GUID:   6efaec51-b699-4d5a-b947-505a69c31d52
Path:   File ->Colisiones_v2015.dwfx ->Segment ->Pipes (1) ->Pipe Types (1) ->Default (1) ->Pipe Types [2463] ->Shell
Item Name:  Pipe Types [2463]
Item Type:  Shell

Item 2
GUID:   6efaec51-b699-4d5a-b947-505a69c31dea
Path:   File ->Colisiones_v2015.dwfx ->Segment ->Walls (4) ->Basic Wall (4) ->Wall 1 (4) ->Basic Wall [2343] ->Shell
Item Name:  Basic Wall [2343]
Item Type:  Shell

------------------


Name:   Clash2
Distance:   -1.341m
Image Location: Test 1_navis_files\cd000002.jpg
HardStatus: New
Clash Point:    3.884m, 3.533m, 2.659m
Date Created:   2016/6/2422:45:09

Item 1
GUID:   6efaec51-b699-4d5a-b947-505a69c31d52
Path:   File ->Colisiones_v2015.dwfx ->Segment ->Pipes (1) ->Pipe Types (1) ->Default (1) ->Pipe Types [2463] ->Shell
Item Name:  Pipe Types [2463]
Item Type:  Shell

Item 2
GUID:   6efaec51-b699-4d5a-b947-505a69c31de8
Path:   File ->Colisiones_v2015.dwfx ->Segment ->Walls (4) ->Basic Wall (4) ->Wall 1 (4) ->Basic Wall [2341] ->Shell
Item Name:  Basic Wall [2341]
Item Type:  Shell

------------------

What I need to do is build a list that holds, for every chunk of text (the chunks are separated by the dashed lines), the clash name and the clash point as a string.

For example: Clash1 3.884, -2.474, 2.659

I am really new to Python, and really do not have much understanding about regular expressions.

Can anyone give me some clues about using regex to extract these values from the text?

I did something like this:

exp = r'(?<=Clash Point\s)(?<=Point\s)([0-9]*)'
match = re.findall(exp, html)

if match:
    OUT.append(match)
else:
    OUT = 'fail'

but I know I am far from my goal.

2 Answers

1

If you're looking for a regex solution, you could come up with:

^Name:\s*         # look for "Name:" at the beginning of a line,
                  # followed by optional whitespace
(?P<name>.+)      # capture the rest of the line
                  # in a group called "name"
[\s\S]+?          # match anything afterwards, lazily
^Clash\ Point:\s* # same construct as above
(?P<point>.+)     # capture the rest of the line in a group called "point"

See a demo on regex101.com.


Translated into Python code, this would be:

import re
rx = re.compile(r"""
                ^Name:\s*
                (?P<name>.+)
                [\s\S]+?
                ^Clash\ Point:\s*
                (?P<point>.+)""", re.VERBOSE|re.MULTILINE)

for match in rx.finditer(your_string_here):
    print(match.group('name'))
    print(match.group('point'))

This will output:

Clash1
3.884m, -2.474m, 2.659m
Clash2
3.884m, 3.533m, 2.659m

See a working demo on ideone.com.
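
The question asks for one string per clash that combines the name and the point. A minimal sketch of that last step, reusing the pattern above, could look like the following. The `extract_clashes` helper and the shortened `sample` text are hypothetical, and dropping the trailing `m` unit is an assumption based on the example given in the question.

```python
import re

# Same pattern as in the answer above.
rx = re.compile(r"""
    ^Name:\s*
    (?P<name>.+)
    [\s\S]+?
    ^Clash\ Point:\s*
    (?P<point>.+)""", re.VERBOSE | re.MULTILINE)


def extract_clashes(text):
    """Return one 'name x, y, z' string per clash record (hypothetical helper)."""
    results = []
    for match in rx.finditer(text):
        name = match.group('name').strip()
        # Assumption: drop the trailing 'm' unit from each coordinate.
        coords = [c.strip().rstrip('m') for c in match.group('point').split(',')]
        results.append('{} {}'.format(name, ', '.join(coords)))
    return results


# Shortened sample in the same layout as the data in the question.
sample = """Name:   Clash1
Distance:   -1.341m
HardStatus: New
Clash Point:    3.884m, -2.474m, 2.659m

------------------

Name:   Clash2
Distance:   -1.341m
HardStatus: New
Clash Point:    3.884m, 3.533m, 2.659m
"""

print(extract_clashes(sample))
# ['Clash1 3.884, -2.474, 2.659', 'Clash2 3.884, 3.533, 2.659']
```

If the units should stay in the output, the `rstrip('m')` call can simply be removed.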


0
import re

# s is the string that holds the clash report text
lines = s.split('\n')

names = []
points = []

for line in lines:
    # The name is the first word after "Name:"
    result = re.search(r'^Name:\s*(\w+)', line)
    if result:
        names.append(result.group(1))

    # The point is the run of digits, signs, units, commas and spaces after "Clash Point:"
    result = re.search(r'^Clash Point:\s*([-0-9m., ]+)', line)
    if result:
        points.append(result.group(1))

print(names)
print(points)

# For nicer output, pair each name with its point using zip()
for name, point in zip(names, points):
    print(name, point)

You can find useful information about regular expressions at regexr.com. I also use it for quick tests and as a reference.
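
Since the records are separated by dashed lines, another option is to split the text into chunks first and then look up both fields inside each chunk, which keeps every name paired with its own point. This is only a sketch: the `parse_chunks` name and the `sample` text are hypothetical, and it assumes a separator is a line made of five or more dashes.

```python
import re


def parse_chunks(text):
    """Split text on dashed separator lines and return (name, point) pairs."""
    # Assumption: a separator is a line made only of five or more dashes.
    chunks = re.split(r'^-{5,}\s*$', text, flags=re.MULTILINE)
    pairs = []
    for chunk in chunks:
        name = re.search(r'^Name:\s*(.+)', chunk, re.MULTILINE)
        point = re.search(r'^Clash Point:\s*(.+)', chunk, re.MULTILINE)
        # Skip chunks (such as trailing blank text) that lack either field.
        if name and point:
            pairs.append((name.group(1).strip(), point.group(1).strip()))
    return pairs


# Shortened sample in the same layout as the data in the question.
sample = """Name:   Clash1
Distance:   -1.341m
Clash Point:    3.884m, -2.474m, 2.659m

------------------


Name:   Clash2
Distance:   -1.341m
Clash Point:    3.884m, 3.533m, 2.659m

------------------
"""

for name, point in parse_chunks(sample):
    print(name, point)
```

Unlike collecting names and points in two separate lists, this approach does not misalign the pairs when one record is missing a field, because incomplete chunks are skipped as a whole.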
