With Python and regex, I am trying to match repeating/overlapping blocks like this one:

04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd

in this text:

04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd
04/00168-42
U 05062012
A: SAKARK
T_ Par.: fdfs
04/00168-43
U 05062012
A: SAKARK
T_ Par.: fdfs

I have tried:

'(?=(\d+\/.*))'
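As a rough illustration of why this first attempt falls short, the sketch below runs it on the sample text from the question (stored in a variable `s`, a name chosen here for illustration, with no trailing newline assumed). Because the capture sits inside a zero-width lookahead, `re.findall` tries it at every position, and without `re.DOTALL` the `.*` stops at the end of the line:

```python
import re

# Sample text from the question (assumed: no trailing newline).
s = """04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd
04/00168-42
U 05062012
A: SAKARK
T_ Par.: fdfs
04/00168-43
U 05062012
A: SAKARK
T_ Par.: fdfs"""

# Capturing group inside a lookahead: the match itself is empty, so findall
# tries every position; ".*" cannot cross a newline without re.DOTALL.
hits = re.findall(r'(?=(\d+\/.*))', s)
for h in hits:
    print(h)
```

It prints only the first line of each block, and each one twice (`04/00127-48`, then `4/00127-48`, and so on), because the lookahead also succeeds one character later at `4/`.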

This seems to work:

'((\d+\/.*?)=?\d+\/)'

But is there a better approach?
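A sketch of what the second attempt does, under the assumption (not stated in the question) that `=?` was meant to be the lookahead `(?=`. As written, the trailing `\d+\/` is ordinary pattern text, so it consumes the `04/` that starts the next block and that block can no longer be matched; `.*?` also needs `re.DOTALL` to cross line breaks at all:

```python
import re

# Sample text from the question (assumed: no trailing newline).
s = """04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd
04/00168-42
U 05062012
A: SAKARK
T_ Par.: fdfs
04/00168-43
U 05062012
A: SAKARK
T_ Par.: fdfs"""

# The pattern as written: the final \d+\/ is consumed as part of the match.
as_written = r'((\d+\/.*?)=?\d+\/)'
print(re.findall(as_written, s))                  # [] : no line has two slashes
print(len(re.findall(as_written, s, re.DOTALL)))  # 1 : block 2's "04/" was eaten

# Assumption: "=?" was intended as "(?=". The lookahead only checks that the
# next block starts there, without consuming it.
lookahead = r'(\d+\/.*?)(?=\d+\/)'
blocks = re.findall(lookahead, s, re.DOTALL)
print(len(blocks))                                # 2 : the last block is missed
```

Even with the lookahead, the last block is skipped because no block follows it; the pattern needs an end-of-string alternative such as `|$` or `|\Z` to pick it up.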

  • I'm confused: which patterns are you trying to extract? What is your intended result? Commented Jul 7, 2012 at 18:14
  • Sorry about the unclear question, I want to match the text blocks. Commented Jul 7, 2012 at 18:23
  • See Marco de Wit's answer. Notice his use of the re.DOTALL flag. Commented Jul 7, 2012 at 18:31

1 Answer

This answers your question:

re.findall(r'.+?(?=\d\d\/|$)', s, re.DOTALL)

re.DOTALL is needed so that . also matches newline characters; without it, a match could never span more than one line.
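Running the answer's pattern on the sample text (again assumed to be in `s`, without a trailing newline) shows the three blocks it returns. The second pattern in this sketch is an optional tightening, not part of the answer: it anchors each block start to the beginning of a line with `re.MULTILINE`, so a hypothetical `dd/` sequence in the middle of a line (for example a date written as 05/06/2012) would not split a block, and it stops before the newline so no `.strip()` is needed:

```python
import re

# Sample text from the question (assumed: no trailing newline).
s = """04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd
04/00168-42
U 05062012
A: SAKARK
T_ Par.: fdfs
04/00168-43
U 05062012
A: SAKARK
T_ Par.: fdfs"""

# The answer's pattern: lazily take characters (newlines included, thanks to
# re.DOTALL) until the lookahead sees two digits and a slash, or the end.
blocks = re.findall(r'.+?(?=\d\d\/|$)', s, re.DOTALL)
for b in blocks:
    print(repr(b))  # every block but the last keeps its trailing '\n'

# Optional variant (not from the answer): a block must start at the beginning
# of a line (re.MULTILINE makes ^ match after each newline), and it ends just
# before the newline that precedes the next block, or at the end of the string.
anchored = re.findall(r'^\d\d/.*?(?=\n\d\d/|\Z)', s, re.DOTALL | re.MULTILINE)
print(anchored == [b.strip() for b in blocks])  # True for this sample
```

Without `re.MULTILINE`, `$` matches only at the end of the string or just before a final newline, so if `s` did end with a newline, the answer's pattern would return that lone `'\n'` as an extra fourth item.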

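The r prefix makes the pattern a raw string, so backslash escapes are left as they are and handled by the regex engine rather than by Python's string parser. Here it happens not to change the result, because \d and \/ are not recognized string escapes, but recent Python versions warn about such invalid escapes, so it is a good habit for regexes.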
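A small sketch of what the raw-string prefix changes. The strings and the word-boundary example are illustrations chosen here, not taken from the answer; `\b` is used because, unlike `\d`, it is a real string escape (backspace), so the prefix actually changes the pattern:

```python
import re

# In a raw string the backslash is kept literally: r'\d' is the same two
# characters (backslash, d) that '\\d' spells with an escaped backslash.
print(r'\d' == '\\d')          # True
print(len(r'\n'), len('\n'))   # 2 1 : backslash + n versus one newline char

# '\b' in a normal string is a backspace character, while r'\b' reaches the
# regex engine intact and means "word boundary".
print(re.findall('\bcat\b', 'a cat here'))   # []
print(re.findall(r'\bcat\b', 'a cat here'))  # ['cat']
```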

Your question is not very clear. Maybe this matches what you want better:

list(zip(*[iter(s.splitlines())]*4))

It gives a list of tuples: one 4-tuple of lines per block, assuming every block is exactly four lines long.
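The grouping idiom works because `[iter(...)] * 4` is a list containing the same iterator four times, so `zip` draws four consecutive lines for each tuple. A minimal sketch on the sample text (assumed to be in `s`), including the edge case where the line count is not a multiple of four:

```python
# Sample text from the question (assumed: no trailing newline).
s = """04/00127-48
U 05062012
A: SAKARK
T_ Par.: dsfsd
04/00168-42
U 05062012
A: SAKARK
T_ Par.: fdfs
04/00168-43
U 05062012
A: SAKARK
T_ Par.: fdfs"""

# One shared iterator, repeated four times: each zip step advances it 4 times.
it = iter(s.splitlines())
groups = list(zip(*[it] * 4))
for g in groups:
    print(g)

# zip() stops at the shortest input, so a trailing partial group is dropped
# silently: 6 items give one 4-tuple and the last 2 are lost.
print(list(zip(*[iter(['a', 'b', 'c', 'd', 'e', 'f'])] * 4)))  # [('a', 'b', 'c', 'd')]
```

If a partial last block should be kept, `itertools.zip_longest` with a `fillvalue` pads it instead; if blocks can vary in length, the regex approach is the safer choice.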
