I am trying to match a set of strings that follow a certain pattern using Python's re module. However, my approach fails for some of them.

Here are the strings that fail:

string1= "\".A.B.C.D.E.F.G.H.I.J\""
string2= "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\""
string3= "\"Testing using quotes func \"quot\".\"": 
string4= "A.b.e.f. testing test": 

Here is my approach:

"".join(re.findall("\.(\w+)", string1))

Here are my expectations:

"ABCDEFGHIJ"
"KYCANYWHIAW.1B.117"
"Testing using quotes func \"quot\"."
"A.b.e.f. testing test"

It only works for the first string.
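
To make the failure concrete, here is a minimal sketch of what the approach above returns for each of the four strings (defined here without the trailing colons; the names `samples` and `results` are only for illustration). Since `re.findall` returns only the captured `\w+` groups, everything else is dropped, including dots that are supposed to stay:

```python
import re

# The four sample strings from the question, without the trailing colons.
samples = {
    "string1": "\".A.B.C.D.E.F.G.H.I.J\"",
    "string2": "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\"",
    "string3": "\"Testing using quotes func \"quot\".\"",
    "string4": "A.b.e.f. testing test",
}

# findall keeps only the text captured by (\w+); joining the captures
# discards every character that is not part of such a capture.
results = {
    name: "".join(re.findall(r"\.(\w+)", text))
    for name, text in samples.items()
}

for name, joined in results.items():
    print(name, repr(joined))
# string1 'ABCDEFGHIJ'
# string2 'KYCANYWHIAW1B117'
# string3 ''
# string4 'bef'
```
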

  • Maybe re.sub(r'(\.)*\.([A-Z0-9])', r'\1\2', string1)? Commented Jan 17, 2021 at 16:21
  • Are you only trying to match, or do you want to substitute? Are string1 and string2 missing dots in your expectation? What is your pattern: a . followed by anything not a whitespace-character (matching string3, like re.sub(r'\.([^.])', r'\g<1>', string1)), or followed by something like [a-zA-Z0-9], as @WiktorStribiżew wrote? Commented Jan 17, 2021 at 16:22
  • What are your rules for substitution? From your examples it seems to me that the rules include: (1) If the sequence of chars is preceded and followed by \ and if the . is followed by a capital letter, remove the dot, else ignore the dot. Is this correct? Commented Jan 17, 2021 at 16:27
  • Also, your string3 and string4 definitions have trailing : characters that should be removed. Commented Jan 17, 2021 at 16:33

1 Answer

For the given examples, one option is to remove each dot for which a lookahead asserts that what is directly to the right is an optional dot followed by either a char A-Z or a digit 0-9.

Note that \w would also match a-z.

\.(?=\.?[A-Z0-9])

Explanation

  • \. Match a dot
  • (?= Positive lookahead, assert what is directly to the right is
    • \.?[A-Z0-9] Match an optional dot, followed by a char A-Z or a digit 0-9
  • ) Close lookahead
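
As a small illustration of the lookahead, the sketch below lists which dots the pattern matches in a run such as W...1, and contrasts it with a `\w` version of the same pattern. The helper name `matched_dot_positions` and the `\w` variant are assumptions made only for this comparison:

```python
import re

def matched_dot_positions(pattern, text):
    """Return the index of every dot in text that the pattern matches."""
    return [m.start() for m in re.finditer(pattern, text)]

upper_or_digit = r"\.(?=\.?[A-Z0-9])"  # the pattern from the answer
word_char = r"\.(?=\.?\w)"             # hypothetical variant using \w

# In "W...1" the dots sit at indexes 1, 2 and 3. The first one is followed
# by two more dots, so the lookahead fails there and that dot is kept.
positions = matched_dot_positions(upper_or_digit, "W...1")
collapsed = re.sub(upper_or_digit, "", "W...1")
print(positions, collapsed)  # [2, 3] W.1

# With \w the dots before lowercase letters would be removed as well.
sample = "A.b.e.f. testing test"
kept = re.sub(upper_or_digit, "", sample)
stripped = re.sub(word_char, "", sample)
print(kept)      # A.b.e.f. testing test
print(stripped)  # Abef. testing test
```
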

Example code

import re

strings = [
    "\".A.B.C.D.E.F.G.H.I.J\"",
    "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\"",
    "\"Testing using quotes func \"quot\".\"",
    "A.b.e.f. testing test"
]

for s in strings:
    print(re.sub(r"\.(?=\.?[A-Z0-9])", '', s))

Output

"ABCDEFGHIJ"
"KYCANYWHIAW.1B.117"
"Testing using quotes func "quot"."
A.b.e.f. testing test

Another option could be to specify the different rules in a pattern that uses an alternation. For example, matching one or more dots at a time and leaving a single dot in W.1 and in B.1:

(?<!\d)\.+(?=[A-Z.])|(?<=\d)\.+(?=[A-Z\d])
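
A minimal sketch of how this alternation could be applied with `re.sub`, reusing the four example strings; the variable names `alternation` and `cleaned` are only for illustration. For these inputs it should give the same result as the lookahead pattern:

```python
import re

strings = [
    "\".A.B.C.D.E.F.G.H.I.J\"",
    "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\"",
    "\"Testing using quotes func \"quot\".\"",
    "A.b.e.f. testing test",
]

# First branch: dots not preceded by a digit, followed by A-Z or another dot.
# Second branch: dots preceded by a digit, followed by A-Z or a digit.
alternation = r"(?<!\d)\.+(?=[A-Z.])|(?<=\d)\.+(?=[A-Z\d])"

cleaned = [re.sub(alternation, "", s) for s in strings]
for c in cleaned:
    print(c)
```

In the run W...1 the first branch consumes two of the three dots (the lookahead needs a dot or A-Z to follow), and neither branch matches the remaining dot, so a single dot is left in front of the 1.
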
