Let's say I have this string

'1. A4  1... d5  2. c4  2... Yf6  3. NP3  3... dxc4  4. BO3  4... BK4  5. e3  5... Bf3  6. Q3  6... e6  7. Bc4  7... B4  8. O-O  8... B3  9. b3  9... O-O  10. B3  10... Re8  11. Q7  1-0'

I want to remove the numbers that are not attached to letters and, if you scroll to the end, the 1-0 as well, so that I get something like this:

['A4', 'd5', ..., 'O-O', ..., 'Q7']

So I tried this,

re.findall(r'(?:[^\W\d_]+\d|\d+[^\W\d_])[^\W_]*|[^\W\d_]+', text)

but got this,

['A4', 'd5', ..., 'O', 'O', ..., 'Q7']

So it removes the - from 1-0, which I want, but it also removes it from O-O.

  • What do you mean by "it is removing the - for 1-0"? Isn't it removing all of 1-0? Commented May 24, 2021 at 16:25
  • Please don't make more work for other people by vandalizing your posts. By posting on the Stack Exchange network, you've granted a non-revocable right, under the CC BY-SA 4.0 license, for Stack Exchange to distribute that content (i.e. regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one which is distributed. Thus, any vandalism will be reverted. If you want to know more about deleting a post please see: How does deleting work? Commented May 24, 2021 at 16:37
  • I copied the wrong text, my bad; I didn't check it. Commented May 24, 2021 at 17:44

3 Answers

Why bother with regular expressions here? You can just split the string and take the odd-numbered indices:

>>> s = '1. A4  1... d5  2. c4  2... Yf6  3. NP3  3... dxc4  4. BO3  4... BK4  5. e3  5... Bf3  6. Q3  6... e6  7. Bc4  7... B4  8. O-O  8... B3  9. b3  9... O-O  10. B3  10... Re8  11. Q7  1-0'
>>> import itertools
>>> list(itertools.islice(s.split(), 1, None, 2))
['A4',
 'd5',
 'c4',
 'Yf6',
 'NP3',
 'dxc4',
 'BO3',
 'BK4',
 'e3',
 'Bf3',
 'Q3',
 'e6',
 'Bc4',
 'B4',
 'O-O',
 'B3',
 'b3',
 'O-O',
 'B3',
 'Re8',
 'Q7']
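
Because s.split() already returns a list, the same selection can also be written with plain slicing. The following is only a minimal sketch of that idea; the function name moves_by_position and the shortened sample string are made up for illustration, and it assumes the tokens strictly alternate between turn marker and move.

```python
def moves_by_position(text):
    """Return every second whitespace-separated token, starting at index 1."""
    # split() with no argument collapses runs of whitespace, so the double
    # spaces in the input do not produce empty tokens.
    tokens = text.split()
    # Odd indices hold the moves; even indices hold the turn markers
    # ("9.", "9...") and, in this sample, the trailing result "1-0".
    return tokens[1::2]


# Shortened sample taken from the end of the string in the question.
sample = '9. b3  9... O-O  10. B3  10... Re8  11. Q7  1-0'
print(moves_by_position(sample))  # ['b3', 'O-O', 'B3', 'Re8', 'Q7']
```

Nothing here looks at the contents of a token, so the number of digits in a turn marker does not matter. The flip side is that it depends entirely on position: if a marker were missing, every later move would land on the wrong index.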

2 Comments

The OP's requirement is to remove the numbers not attached to letters. This answer works, but would fail on two-digit numbers or numbers of unknown length.
@Jared I don't believe it would fail at all. There's nothing about this solution that distinguishes 1-digit from 2- or more digit numbers, or even cares about the contents of each token. The input is a list of moves in a single game of chess. It seems pretty clear the poster wants to filter out the turn numbers from the moves.

Find all words that contain at least one letter:

re.findall(r'\S*[a-zA-Z]\S*', text)

['A4', 'd5', 'c4', 'Yf6', 'NP3', 'dxc4', 'BO3', 'BK4', 'e3', 'Bf3', 'Q3', 'e6', 'Bc4', 'B4', 'O-O', 'B3', 'b3', 'O-O', 'B3', 'Re8', 'Q7']
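
To spell out why this works: \S* on either side lets the match grow over a whole run of non-whitespace characters, and the [a-zA-Z] in the middle requires at least one ASCII letter somewhere in that run. Tokens such as 1., 1... and 1-0 contain no letter and are skipped, while O-O is kept in one piece. A small sketch, where the names HAS_LETTER and tokens_with_letter and the shortened sample are only illustrative:

```python
import re

# A run of non-whitespace characters containing at least one ASCII letter.
HAS_LETTER = re.compile(r'\S*[a-zA-Z]\S*')


def tokens_with_letter(text):
    """Return the whitespace-delimited tokens that contain a letter."""
    return HAS_LETTER.findall(text)


# Shortened sample built from pieces of the string in the question.
sample = '8. O-O  8... B3  10. B3  10... Re8  11. Q7  1-0'
print(tokens_with_letter(sample))  # ['O-O', 'B3', 'B3', 'Re8', 'Q7']
```

Note that any punctuation glued to a move would be kept as part of the token, since \S matches it too.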

You can use a list comprehension to iterate over the tokens of the string and isdigit to check whether each one consists only of digits once the periods and hyphens are removed.

str1 = '1. A4  1... d5  2. c4  2... Yf6  3. NP3  3... dxc4  4. BO3  4... BK4  5. e3  5... Bf3  6. Q3  6... e6  7. Bc4  7... B4  8. O-O  8... B3  9. b3  9... O-O  10. B3  10... Re8  11. Q7  1-0'
output = [tok for tok in str1.replace(".", "").split() if not tok.replace("-", "").isdigit()]

print(output) gives:

['A4', 'd5', 'c4', 'Yf6', 'NP3', 'dxc4', 'BO3', 'BK4', 'e3', 'Bf3', 'Q3', 'e6', 'Bc4', 'B4', 'O-O', 'B3', 'b3', 'O-O', 'B3', 'Re8', 'Q7']

1 Comment

But you have the '1-0' at the end and I don't want that; anything of the form 'number-number' is no good.
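
The rule in this comment can also be stated directly: drop every token that is either a turn marker (digits followed by dots) or a 'number-number' result, and keep the rest. This is a rough sketch under that reading, not a full parser for chess notation; the names TURN_MARKER, RESULT and drop_markers_and_results are hypothetical.

```python
import re

TURN_MARKER = re.compile(r'\d+\.+')  # "1.", "1...", "10."
RESULT = re.compile(r'\d+-\d+')      # "1-0", "0-1"


def drop_markers_and_results(text):
    """Keep tokens that are neither a turn marker nor a number-number result."""
    return [
        tok for tok in text.split()
        # fullmatch requires the whole token to fit the pattern, so "O-O"
        # (letters, not digits) is not mistaken for a result.
        if not TURN_MARKER.fullmatch(tok) and not RESULT.fullmatch(tok)
    ]


# Shortened sample taken from the end of the string in the question.
sample = '9. b3  9... O-O  10. B3  10... Re8  11. Q7  1-0'
print(drop_markers_and_results(sample))  # ['b3', 'O-O', 'B3', 'Re8', 'Q7']
```

A draw written as 1/2-1/2 would not match RESULT as written and would slip through, so the pattern would need extending if such games can occur in the input.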
