Regex to extract the string

Question

I need help with a regex to get the following out of a string:

dal001.caxxxxx.test.com. ---> caxxxxx.test.com
caxxxx.test.com -----> caxxxx.test.com

So basically, in the first example, I don't want dal001 (or any other prefix made of 3 letters and 3 digits); I want the rest of the string, but only if it starts with ca.

In the second example, I want the whole string, which starts with ca.

So far I have tried (^[a-z]{3}[\d]+\.)?(ca.*) but it doesn't work when the string is dal001.mycaxxxx.test.com.

Any help would be appreciated.

Turn the first group into a non-capturing one, ^(?:[a-z]{3}\d{3}\.)?(ca.*), and the value will be in Group 1. See regex101.com/r/mL8mkG/1 and ideone.com/hS6lz5
– Wiktor Stribiżew, Commented Jan 22, 2021 at 14:13

Wiktor Stribiżew · Accepted Answer · 2021-01-22 14:20:05Z

2

You can use

^(?:[a-z]{3}\d{3}\.)?(ca.*)

See the regex demo. To make it case insensitive, pass the re.I flag, e.g. re.search(rx, s, re.I).

Details:

^ - start of string
(?:[a-z]{3}\d{3}\.)? - an optional sequence of 3 letters and then 3 digits and a .
(ca.*) - Group 1: ca and the rest of the string.

See the Python demo:

import re
rx = r"^(?:[a-z]{3}\d{3}\.)?(ca.*)"
strs = ["dal001.caxxxxx.test.com", "caxxxx.test.com"]
for s in strs:
    m = re.search(rx, s)
    if m:  # only access the group when there is a match
        print(m.group(1))
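As a quick check of the anchoring and of the re.I flag, here is a small sketch that runs the same pattern against a few extra inputs. The helper name extract_ca and the last two hostnames are made up for illustration; they are not part of the original demo.

```python
import re

# Same pattern as in the answer, once case-sensitive and once with re.I.
RX = re.compile(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)")
RX_I = re.compile(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)", re.I)


def extract_ca(s, rx=RX):
    """Hypothetical helper: return Group 1 on a match, otherwise None."""
    m = rx.search(s)
    return m.group(1) if m else None


print(extract_ca("dal001.caxxxxx.test.com"))       # caxxxxx.test.com
print(extract_ca("caxxxx.test.com"))               # caxxxx.test.com
print(extract_ca("dal001.mycaxxxx.test.com"))      # None: "my" precedes "ca"
print(extract_ca("DAL001.CAxxxx.test.com"))        # None: pattern is case-sensitive
print(extract_ca("DAL001.CAxxxx.test.com", RX_I))  # CAxxxx.test.com
```

Because ^ sits outside the optional group, the pattern can only match at the start of the string, which is why dal001.mycaxxxx.test.com is rejected.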

answered Jan 22, 2021 at 14:20

Wiktor Stribiżew


9 Comments

developthou Over a year ago

This is working really well. Thank you so much.

developthou Over a year ago

The only problem is that it fails when there is a non-matching string:
```
In [13]: re.search(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)",'10.9.65.35').group(1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-0e801dcaecbc> in <module>()
----> 1 re.search(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)",'10.9.65.35').group(1)

AttributeError: 'NoneType' object has no attribute 'group'
```
Is there a way to skip strings that don't match without using try?

Wiktor Stribiżew Over a year ago

@developthou Always check if there is a match before accessing .group()s. I have shown how in the answer, you are not using my code.

developthou Over a year ago

I am using the list comprehension return [re.search(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)",host).group(1) for host in hosts] which doesn't work when there is no match

developthou Over a year ago

I think the answer in this case is not to use a list comprehension.
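For what it's worth, a list comprehension can still be used if the match check is moved into its filter clause. The following is a sketch, assuming Python 3.8+ for the := operator; the hosts list is sample data made up for illustration.

```python
import re

rx = re.compile(r"^(?:[a-z]{3}\d{3}\.)?(ca.*)")

# Sample data for illustration only.
hosts = ["dal001.caxxxxx.test.com", "caxxxx.test.com", "10.9.65.35"]

# Bind the match object in the filter clause; non-matching hosts are skipped.
matched = [m.group(1) for host in hosts if (m := rx.search(host))]
print(matched)  # ['caxxxxx.test.com', 'caxxxx.test.com']

# Variant without := for older Python versions: build the match objects
# with map() first, then drop the None results.
matched_compat = [m.group(1) for m in map(rx.search, hosts) if m]
print(matched_compat)  # ['caxxxxx.test.com', 'caxxxx.test.com']
```

Both variants drop non-matching hosts. If the result must stay aligned with hosts, a conditional expression such as m.group(1) if m else None in the output position would keep a placeholder instead.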


Timur Shtatland · Answer · 2021-01-22 14:16:24Z

0

Use re.sub like so:

import re
strs = ['dal001.caxxxxx.test.com', 'caxxxx.test.com']

for s in strs:
    s = re.sub(r'^[A-Za-z]{3}\d{3}[.]', '', s)
    print(s)
# caxxxxx.test.com
# caxxxx.test.com

edited Jan 22, 2021 at 14:16

answered Jan 22, 2021 at 14:14

Timur Shtatland


1 Comment

developthou Over a year ago

Thank you but this will not extract the string if it only matches ca
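To illustrate the point in the comment: re.sub only strips the prefix and returns a non-matching string unchanged, so on its own it cannot tell you whether the remainder starts with ca. One possible workaround, sketched here with a hypothetical helper strip_and_check and two made-up extra inputs, is to add an explicit startswith check after the substitution.

```python
import re

# Prefix pattern from the answer above.
prefix = re.compile(r'^[A-Za-z]{3}\d{3}[.]')


def strip_and_check(s):
    """Hypothetical helper: strip the optional prefix, then require 'ca'.

    Returns None when the remainder does not start with 'ca'.
    """
    stripped = prefix.sub('', s)
    return stripped if stripped.startswith('ca') else None


# The last two inputs are made up for illustration.
for s in ['dal001.caxxxxx.test.com', 'caxxxx.test.com',
          'dal001.mycaxxxx.test.com', '10.9.65.35']:
    # Show the plain re.sub result next to the checked result.
    print(s, '->', prefix.sub('', s), '->', strip_and_check(s))
```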

Z4-tier · Answer · 2021-02-01 21:09:57Z

If you are using re:

import re
my_strings = ['dal001.caxxxxx.test.com', 'caxxxxx.test.com']
my_regex = r'^(?:[a-zA-Z]{3}[0-9]{3}\.)?(ca.*)'
# Compile once, outside the loop
compiled_regex = re.compile(my_regex)
for a_string in my_strings:
    # Only rewrite strings that match; \1 is the captured "ca..." part
    if compiled_regex.match(a_string):
        print(compiled_regex.sub(r'\1', a_string))

my_regex is anchored to the start of the string with ^. It optionally matches [3 letters][3 digits][a .] in a non-capturing group (a (?:...) group does not get a numbered reference for use in sub). In either case, the string must then continue with ca followed by anything; this part is captured as Group 1 and used as the replacement in the call to sub. re.compile is used to make it a bit faster, in case you have many strings to match.

Note on re.compile: Some answers don't pre-compile the regex before the loop. They have made a trade: removing a single line of code, at the cost of handing the pattern string back to the re module on every iteration. Python caches recently compiled patterns, so this is normally a cache lookup rather than a full recompile, but it is still per-call overhead. If you will use a regex in a loop body, compiling it first avoids that overhead, and there is no added cost even when the number of iterations is small. How much it matters depends on the workload. Here is a comparison of compiled vs. non-compiled versions of the same loop using the same regex for different numbers of loop iterations and number of trials. Judge for yourself.
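A rough way to measure this yourself is a timeit sketch like the one below. It is not the comparison from the original answer; the function names, the list size, and the number of repetitions are arbitrary choices made for illustration, and the timings will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r'^(?:[a-zA-Z]{3}[0-9]{3}\.)?(ca.*)'
# Arbitrary workload: 2000 strings built from the two sample hostnames.
STRINGS = ['dal001.caxxxxx.test.com', 'caxxxxx.test.com'] * 1000


def with_module_function():
    # re.match receives the pattern string on every call and has to
    # find the compiled pattern in the module's internal cache each time.
    matches = (re.match(PATTERN, s) for s in STRINGS)
    return [m.group(1) for m in matches if m]


def with_compiled_pattern():
    # Compile once, then call the pattern object's own match method.
    compiled = re.compile(PATTERN)
    return [m.group(1) for m in map(compiled.match, STRINGS) if m]


for fn in (with_module_function, with_compiled_pattern):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.3f} s for 20 runs')
```

Both functions return the same list; only the time spent differs.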
