Split string with pattern

Question

I want to split a string into separate lists based on a pattern. Let's say I have a string that looks like this:

string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2'

The pattern should treat each 1. as the start of a list and end just before the next 1. like this:

[1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1]
[1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0]
[1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2]

I tried something like this, but it does not work:

lists=[]
reg = '^\\1\\.'
for i in string :
  re.match(reg, i)
    lists.extend[i]

@JohnGordon If I use split, the 1. at the start of the second list goes missing. I want to keep it as well.
– Trojan666, Commented May 4, 2021 at 2:21
How is the second list missing? Everything should still be found...
– 12944qwerty, Commented May 4, 2021 at 2:27
['1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 1-1', 'c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 1-0', 'b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5 1/2-1/2'] This is the result when I run the code. Maybe that is because it splits on 1., so every time it finds a 1. it drops it and keeps only the text that follows.
– Trojan666, Commented May 4, 2021 at 2:33

Tim Biegeleisen · Accepted Answer · 2021-05-04 02:37:14Z

1

I would use re.findall here with the pattern .*?\s+\d+(?:/\d+)?-\d+(?:/\d+)?:

import re

string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2'
# Each match runs up to and including a result such as 1-0 or 1/2-1/2.
parts = re.findall(r'(.*?\s+\d+(?:/\d+)?-\d+(?:/\d+)?)\s*', string)
print(parts)

This prints:

['1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1',
 '1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0',
 '1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2']

Here is a brief explanation of the regex pattern used:

(                 match and capture
    .*?           all content up to the nearest
    \s+           whitespace
    \d+(?:/\d+)?  followed by e.g. 1 or 1/2
    -             dash
    \d+(?:/\d+)?  another 1 or 1/2
)                 stop capture
\s*               match, but do not capture, optional whitespace
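
The breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The following is only a sketch: the shortened sample string and the names games and RESULT_PATTERN are made up for illustration and are not part of the original answer.

```python
import re

# Shortened sample with the same shape as the string in the question (illustrative).
games = '1.e4 d6 2.d4 Nf6   1-1 1.c2 d6 2.d4  1-0 1.b5 d6 12.Nb5  1/2-1/2'

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
RESULT_PATTERN = re.compile(r'''
    (                   # capture one game
        .*?             # as little text as possible, up to
        \s+             # whitespace
        \d+(?:/\d+)?    # a score such as 1 or 1/2
        -               # dash
        \d+(?:/\d+)?    # the other score
    )
    \s*                 # optional trailing whitespace, matched but not captured
''', re.VERBOSE)

parts = RESULT_PATTERN.findall(games)
print(parts)
```

With a single capture group, findall returns only the captured text, so the whitespace between games is left out of each element.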

edited May 4, 2021 at 2:37

answered May 4, 2021 at 2:31

Tim Biegeleisen


1 Comment

Trojan666 Over a year ago

Can you tell me the regex parts please?

edusanketdk · Accepted Answer · 2021-05-04 08:17:02Z

0

string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2'

print([['1.'+i] for i in (" "+string).split(' 1.')][1:])

This works fine too!
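
Another way to keep the leading 1. is to split with a regex lookahead, so that the 1. is checked for but not consumed. This is a minimal sketch rather than part of the answer above. It assumes that "1." directly after whitespace only ever marks the start of a new game, and it returns a flat list of strings instead of nested lists. The shortened sample string is made up for illustration.

```python
import re

# Shortened sample with the same shape as the string in the question (illustrative).
games = '1.e4 d6 2.d4 Nf6   1-1 1.c2 d6 2.d4  1-0 1.b5 d6 12.Nb5  1/2-1/2'

# Split on whitespace that is followed by "1."; the lookahead (?=...) does not
# consume the "1.", so it stays at the start of the next piece.
parts = re.split(r'\s+(?=1\.)', games)
print(parts)
```

Move numbers such as 11. or 12. are not split points, because the character after the first 1 is a digit rather than a dot.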

edited May 4, 2021 at 8:17

answered May 4, 2021 at 2:35

edusanketdk


2 Comments

Trojan666 Over a year ago

I tried it, but the first element of the result is missing. I changed [1:] to [0:], but then the result looks like this, with an extra 1. in the element at index 0:

[['1.1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1'],  ['1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0'],  ['1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2']]

edusanketdk Over a year ago

Oh, thanks for pointing it out. An extra space is supposed to be added to the original string. I have edited the answer, you can check it now!
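
The effect of the extra leading space can be seen on a shortened sample. This is only an illustrative sketch: the sample string and the variable names are made up here, and the result is a flat list rather than the nested lists in the answer.

```python
# Shortened sample with the same shape as the string in the question (illustrative).
games = '1.e4 d6  1-1 1.c2 d6  1-0'

# Without padding, the first piece keeps its own "1." while the others lose it,
# so prefixing every piece with "1." doubles it on the first one.
without_pad = games.split(' 1.')

# With a leading space, the first "1." is also a split point. The first piece
# becomes an empty string, which the [1:] slice then drops.
with_pad = (' ' + games).split(' 1.')

fixed = ['1.' + piece for piece in with_pad[1:]]
print(without_pad)
print(with_pad)
print(fixed)
```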
