I want to split a string into separate lists based on a pattern. Let's say the string looks like this:

string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2' 

Each list should start at a 1. and end just before the next 1., like this:

[1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1]
[1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0]
[1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2]

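I tried something like this:

import re

lists = []
reg = r'^1\.'  # a token that opens a new game
for token in string.split():
    if re.match(reg, token):
        lists.append([token])      # start a new game
    elif lists:
        lists[-1].append(token)    # continue the current game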
  • string.split(' 1.') Commented May 4, 2021 at 2:18
  • What's going wrong? Any errors? Unexpected output? Commented May 4, 2021 at 2:18
  • @JohnGordon If I use split, the 1. at the start of the second list goes missing. I want to keep it. Commented May 4, 2021 at 2:21
  • How is the second list missing? Everything should still be found... Commented May 4, 2021 at 2:27
  • ['1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 1-1', 'c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 1-0', 'b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5 1/2-1/2'] This is the result when I run the code. Maybe it is because the split is on 1., so each time it finds 1. it drops it and keeps only what follows. Commented May 4, 2021 at 2:33

2 Answers

I would use re.findall here with the pattern .*?\s+\d+(?:/\d+)?-\d+(?:/\d+)?:

import re

string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2'
parts = re.findall(r'(.*?\s+\d+(?:/\d+)?-\d+(?:/\d+)?)\s*', string)
print(parts)

This prints:

['1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1',
 '1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0',
 '1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2']

Here is a brief explanation of the regex pattern used:

(                 match and capture
    .*?           all content up to the nearest
    \s+           whitespace
    \d+(?:/\d+)?  followed by e.g. 1 or 1/2
    -             dash
    \d+(?:/\d+)?  another 1 or 1/2
)                 stop capture
\s*               match, but do not capture, optional whitespace
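
The same pattern can also be written with re.VERBOSE, so that the comments live inside the regex itself. This is only a sketch: the name RESULT_TERMINATED_GAME and the shortened sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# RESULT_TERMINATED_GAME is a hypothetical name chosen for this sketch.
RESULT_TERMINATED_GAME = re.compile(r"""
    (                   # capture one game
        .*?             # as little text as possible, up to ...
        \s+             # ... the whitespace before the result
        \d+(?:/\d+)?    # first score, e.g. 1, 0 or 1/2
        -               # dash
        \d+(?:/\d+)?    # second score
    )
    \s*                 # consume the separator before the next game
""", re.VERBOSE)

# Shortened sample in the same format as the question's string (an assumption).
sample = '1.e4 d6 2.d4 Nf6   1-1 1.c2 d6 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 12.Nb5  1/2-1/2'

games = RESULT_TERMINATED_GAME.findall(sample)
print(games)
```

Note that this approach relies on every game ending in a result such as 1-0 or 1/2-1/2; a game without a trailing result would not be captured.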

1 Comment

Can you tell me the regex parts please?
string = '1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8   1-1 1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5  1-0 1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5  1/2-1/2'

print([['1.'+i] for i in (" "+string).split(' 1.')][1:])

This works fine too!
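
If prepending '1.' again after the split feels awkward, a possible alternative is re.split with a lookahead, which splits on the whitespace before each 1. without consuming the 1. itself. This is a minimal sketch, assuming that a standalone 1. after whitespace only ever marks the start of a game; split_games is a hypothetical helper name and the sample is a shortened version of the string above.

```python
import re

def split_games(text):
    """Split text at whitespace that is directly followed by '1.'."""
    # (?=1\.) is a lookahead: it checks for '1.' without consuming it,
    # so every piece keeps its leading '1.'. Move numbers such as '10.'
    # or '11.' do not match, because their second character is not a dot.
    return re.split(r'\s+(?=1\.)', text.strip())

# Shortened sample in the same format as the question's string (an assumption).
sample = '1.e4 d6 10.Qxd8   1-1 1.c2 d6 11.Nxe5 Nxe5  1-0 1.b5 d6 12.Nb5  1/2-1/2'

for game in split_games(sample):
    print(game)
```

Unlike the list comprehension above, this returns plain strings rather than one-element lists, and it does not depend on each game ending with a result.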

2 Comments

I tried it, but the first element of the result was missing. Changing [1:] to [0:] brought it back, but now index 0 has an extra 1.: [['1.1.e4 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 1-1'], ['1.c2 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 1-0'], ['1.b5 d6 2.d4 Nf6 3.Nc3 g6 4.Nf3 Bg7 5.Be2 Nbd7 6.O-O O-O 7.e5 dxe5 8.dxe5 Ng4 9.e6 Nde5 10.Qxd8 Rxd8 11.Nxe5 Nxe5 12.Nb5 1/2-1/2']]
Oh, thanks for pointing it out. An extra space is supposed to be added to the original string. I have edited the answer, you can check it now!
