You need a regex pattern for that. I use http://www.regex101.com to play around with regexes.
You can use such a pattern to extract matches and to replace them, like so:
import re
text = 'The statement [link] http://www.washingtontimes.com/news/2017/sep/9/rob-ranco-texas-lawyer-says-he-would-be-ok-if-bets/ [/link] The Washington Times'
# print what each match captured
for mat in re.findall(r"\[link\](.*?)\[/link\]", text):
    print(mat)
# replace each match with something else
print(re.sub(r"\[link\](.*?)\[/link\]", "[URL]", text))
Output:
http://www.washingtontimes.com/news/2017/sep/9/rob-ranco-texas-lawyer-says-he-would-be-ok-if-bets/
The statement [URL] The Washington Times
The pattern I use is non-greedy (lazy), so if several [link] ... [/link] pairs occur in one sentence, each pair is matched on its own (the shortest possible match) instead of everything from the first [link] to the last [/link]:
\[link\](.*?)\[/link\] - matches a literal [ followed by link followed by a literal ],
then as few characters as possible (captured in group 1) before the end tag [/link]
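To make the role of the capturing group concrete, here is a minimal sketch using re.search. The example.com URL is a placeholder I made up for brevity, not the one from the question:

```python
import re

# Placeholder text; the URL is an assumption for illustration only.
text = "The statement [link] http://example.com/article [/link] The Washington Times"

match = re.search(r"\[link\](.*?)\[/link\]", text)
if match:
    whole = match.group(0)          # the full match, tags included
    inner = match.group(1).strip()  # only the captured part, surrounding spaces removed
    print(whole)
    print(inner)
```

group(0) is the text that re.sub would replace, while group(1) is what re.findall returns when the pattern has exactly one group.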
With a greedy (.*) instead of the non-greedy (.*?), you get only one replacement for the whole of
The statement [link] http://www.washingtontimes.com/news/2017/sep/9/rob-ranco-texas-lawyer-says-he-would-be-ok-if-bets/ [/link] and this also [link] http://www.washingtontimes.com/news/2017/sep/9/rob-ranco-texas-lawyer-says-he-would-be-ok-if-bets/ [/link] The Washington Times
instead of two.
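A small sketch of that difference, using re.subn so the number of replacements is visible. The strings url-1 and url-2 are shortened stand-ins for the long URLs:

```python
import re

# Shortened stand-in for the two-link sentence; url-1 and url-2 are placeholders.
text = ("The statement [link] url-1 [/link] and this also "
        "[link] url-2 [/link] The Washington Times")

# re.subn returns a tuple: (new_string, number_of_replacements)
greedy_text, greedy_count = re.subn(r"\[link\](.*)\[/link\]", "[URL]", text)
lazy_text, lazy_count = re.subn(r"\[link\](.*?)\[/link\]", "[URL]", text)

print(greedy_count, greedy_text)  # 1 The statement [URL] The Washington Times
print(lazy_count, lazy_text)      # 2 The statement [URL] and this also [URL] The Washington Times
```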
Finding all links, greedy versus lazy:
import re
text = """
The statement [link] link 1 [/link] and [link] link 2 [/link] The Washington Times
The statement [link] link 3 [/link] and [link] link 4 [/link] The Washington Times
"""
# collect what each pattern captures
links = re.findall(r"\[link\](.*)\[/link\]", text)        # greedy pattern
links_lazy = re.findall(r"\[link\](.*?)\[/link\]", text)  # lazy pattern
print(links)
print(links_lazy)
Output:
# greedy
[' link 1 [/link] and [link] link 2 ',
' link 3 [/link] and [link] link 4 ']
# lazy
[' link 1 ', ' link 2 ', ' link 3 ', ' link 4 ']
The greedy result stops at the end of each line because . does not match newlines (unless you pass re.DOTALL), so (.*) cannot run past a line break. Within a single line that contains several links, you need the lazy (.*?) to get each link as its own match; the greedy (.*) gives you one match spanning from the first [link] to the last [/link] on that line.
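As a quick check of the newline behaviour, this sketch reruns both patterns on the same text with re.DOTALL, which lets . match newlines too. The variable names are my own:

```python
import re

text = """
The statement [link] link 1 [/link] and [link] link 2 [/link] The Washington Times
The statement [link] link 3 [/link] and [link] link 4 [/link] The Washington Times
"""

# With re.DOTALL the greedy pattern can cross line breaks, so it swallows
# everything from the very first [link] to the very last [/link].
greedy_dotall = re.findall(r"\[link\](.*)\[/link\]", text, flags=re.DOTALL)

# The lazy pattern still stops at the nearest [/link], so nothing changes.
lazy_dotall = re.findall(r"\[link\](.*?)\[/link\]", text, flags=re.DOTALL)

print(len(greedy_dotall))  # 1
print(lazy_dotall)         # [' link 1 ', ' link 2 ', ' link 3 ', ' link 4 ']
```

So the lazy pattern is the safer default for paired tags, whether or not the text spans multiple lines.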