
I am defining a function in Python3 to manipulate a string with regular expressions.

I am having trouble finding a regular expression that cuts out part of the string. Consider the following input strings:

str1 = "http://99.199.9.90:22/some/path/here/id_type_51549851/read"
str2 = "http://99.199.9.90:22/some/path/here/myid_31654/read"

For the above strings I would like to obtain as output the following strings:

output_str1: "http://99.199.9.90:22/some/path/here/id_type_/read"
output_str2: "http://99.199.9.90:22/some/path/here/myid_/read"

The final underscore in the output string is not mandatory.

To be more general it would be better to have it working also with the following string (if possible):

str3 =  "http://99.199.9.90:22/some/path/here/myid_alphaBeta/read"

outputting

"http://99.199.9.90:22/some/path/here/myid_/read"

Note that IP, port, paths are invented but the structure is like this.

I want to remove the part of the string that comes after the last underscore and before /read, bearing in mind that there may be other underscores earlier in the string.

So basically the output should contain the first part of the original string and the final part, while the matched central part is left out. In other words, it should cut a matching part out of the middle of the string.

I am starting from this regular expression, which matches the whole string:

"(.+?)/some/path/here/(.+?)/read"

I tried something like (.+?)/some/path/here/(.+?)_[.+?]/read

but it did not work.

The function up to now is this:

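import re

def cutURL(url):
    res = url  # default: return the input unchanged
    # Case 1: keep everything up to and including "&someMatch=<digits>"
    match = re.search("(.+?)&someMatch=[0-9]+", url)
    if match:
        res = match.group()
    # Case 2: URLs of the form .../some/path/here/.../read (still returns the whole match)
    elif re.search("(.+?)/some/path/here/(.+?)/read", url):
        res = re.search("(.+?)/some/path/here/(.+?)/read", url).group()
    return res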
  • Inside square brackets, as in [.+?], the characters are interpreted literally, not with their usual meaning (one or more of anything, matched lazily). Commented Dec 18, 2018 at 10:26
  • Try re.sub(r'(/some/path/here/[^/]*_)[^/_]*(/read)', r'\1\2', s), see regex101.com/r/VBzHuS/1 Commented Dec 18, 2018 at 10:27
  • Then how to escape them? Commented Dec 18, 2018 at 10:27
  • Thanks, it works. Is r saying to copy all of the preceding substring? Could you explain the meaning of [^/]*_)[^/_]* a little further (maybe in an answer)? Commented Dec 18, 2018 at 10:33
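
To make the comments above concrete, here is a minimal sketch that spells out the suggested pattern piece by piece. The helper name cut_id is hypothetical, and the re.VERBOSE layout is only one way to annotate the pattern. The r prefix just marks a raw string, so backslashes reach the regex engine unchanged; it is \1 and \2 in the replacement that copy the two parenthesised groups back into the result.

```python
import re

str1 = "http://99.199.9.90:22/some/path/here/id_type_51549851/read"
str2 = "http://99.199.9.90:22/some/path/here/myid_31654/read"
str3 = "http://99.199.9.90:22/some/path/here/myid_alphaBeta/read"

# Inside [...] the characters ., + and ? are literal, so [.+?] matches
# exactly ONE character that is a dot, a plus sign, or a question mark.
assert re.fullmatch(r"[.+?]", "+") is not None
assert re.fullmatch(r"[.+?]", "abc") is None

# The pattern from the comment, annotated with re.VERBOSE.
PATTERN = re.compile(
    r"""
    (/some/path/here/   # group 1 starts with the fixed path prefix
     [^/]*_)            # then non-slash characters up to the last underscore
    [^/_]*              # the part to drop: contains no slash and no underscore
    (/read)             # group 2: the fixed suffix
    """,
    re.VERBOSE,
)

def cut_id(url):
    # Hypothetical helper: \1 and \2 re-insert groups 1 and 2, so only the
    # unparenthesised middle part is removed from the string.
    return PATTERN.sub(r"\1\2", url)

for s in (str1, str2, str3):
    print(cut_id(s))
```

Because the dropped part may not contain an underscore and must be followed directly by /read, the underscore that closes group 1 is necessarily the last one in that path segment, which is what handles id_type_51549851.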

2 Answers


Use this:

import re

str2 = "http://99.199.9.90:22/some/path/here/myid_31654/read"
str2 = re.sub(r"myid_[0-9]+", "myid_", str2)

For documentation of the sub method and more applications refer to the docs
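
The pattern above hard-codes the myid_ prefix and only removes digits, so it would not cover id_type_51549851 or myid_alphaBeta from the question. A possible generalisation, offered as a sketch rather than as part of the original answer, is to match whatever follows the last underscore of the segment and use a lookahead for /read. The function name strip_trailing_id is hypothetical.

```python
import re

def strip_trailing_id(url):
    # Hypothetical helper, not from the original answer.
    # _          : the underscore that precedes the part to remove
    # [^/_]+     : one or more characters that are neither "/" nor "_"
    # (?=/read)  : lookahead, requires "/read" next without consuming it
    return re.sub(r"_[^/_]+(?=/read)", "_", url)

print(strip_trailing_id("http://99.199.9.90:22/some/path/here/id_type_51549851/read"))
print(strip_trailing_id("http://99.199.9.90:22/some/path/here/myid_alphaBeta/read"))
```

Since the lookahead does not consume /read, the replacement only has to supply the underscore, and no capture groups are needed.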



From the examples above, you could substitute

_\w+/read$

with

_/read

See a demo on regex101.com.
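
One caveat worth checking with this pattern: \w also matches the underscore, so for a segment such as id_type_51549851 the match starts at the first underscore rather than the last. The sketch below shows that behaviour and a possible variant using [^\W_] (a word character that is not an underscore). The variant is an assumption about what the question wants, not part of the original answer.

```python
import re

str1 = "http://99.199.9.90:22/some/path/here/id_type_51549851/read"

# \w matches letters, digits AND "_", so the leftmost match begins at the
# first underscore and swallows "type_51549851" as well.
with_w = re.sub(r"_\w+/read$", "_/read", str1)
print(with_w)

# [^\W_] excludes the underscore, so only the text after the last
# underscore of the segment is removed.
without_underscore = re.sub(r"_[^\W_]+/read$", "_/read", str1)
print(without_underscore)
```

For inputs with a single underscore in the last segment, such as myid_31654, both patterns give the same result.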
