Regex to SQL: repetition-operator operand invalid

Question

I'm trying to use a regex to detect URLs in all the rows of my table, here's the regex

\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))

However, I invariably get the "repetition-operator operand invalid" error, which, after hours of search on the internet, still remains obscure. Where have I gone wrong? What can I do to fix this? And alternaltively, is there a better way to detect URLs in messages in SQL other than a Regex?

Thank you.

What about Regexp to validate URL in MySQL? Does it answer your question? — Wiktor Stribiżew
– Wiktor Stribiżew, Commented May 25, 2015 at 10:49
@stribizhev I might need a more complex regex, otherwise not all URLs would be detected. The one above worked for me perfectly, and if I can help it, I would like to keep working with it. — Unforgiven
– Unforgiven, Commented May 25, 2015 at 10:53
@stribizhev Getting the Repetition error for this one. As for the regex you suggested, I tried it, and it only detects a small portions of my URLs. — Unforgiven
– Unforgiven, Commented May 25, 2015 at 11:00
Looks like the problem is with using ? quantifier, *? and +? cannot be used in MySQL regexp. Also, not sure about non-capturing groups, replace them with capturing ones. Please re-try with [[:<:]](([a-zA-Z0-9-]+:\/\/*|www[.])[^ ()<>]+(\([a-zA-Z0-9_]+\)|([^ [:punct:]]|\/))). — Wiktor Stribiżew
– Wiktor Stribiżew, Commented May 25, 2015 at 11:07
@stribizhev It works this. But the results improved just slightly. Are you sure that the regex hasn't changed at all? (This new syntax boggles me) — Unforgiven
– Unforgiven, Commented May 25, 2015 at 11:13

Wiktor Stribiżew · Accepted Answer · 2015-05-25 11:27:09Z

1

You cannot use ? quantifier in MySQL regex as the syntax is POSIX-based. Still, you can use * to match 0 or more characters. Also, \b in MySQL regex should be replaced with [[:<:]] (since this matches at the beginning of a word).

Thus, I suggest using

[[:<:]](([a-zA-Z0-9-]+:\/\/*|www[.])[^ ()<>]+(\([a-zA-Z0-9_]+\)|([^ [:punct:]]|\/)))

I am expanding \w to [a-zA-Z0-9_] as it is exactly what \w is. Instead of \s, I am using a literal space. Instead of \d, I am using [0-9]. This is done for readability and better compatibility. If \w, \d and \s work for you, you can use them, but I do not see them among the supported entities in POSIX specs.

Also, instead of literal space, you could use [:space:], it matches space, tab, newline, and carriage return. Instead of [a-zA-Z] you can use [:alpha:], and instead of [0-9], you can use [:digit:]. Please also check this:

[[:<:]](([[:alpha:][:digit:]-]+:\/\/*|www[.])[^[:space:]()<>]+(\([[:alpha:][:digit:]_]+\)|([^[:space:][:punct:]]|\/)))

answered May 25, 2015 at 11:27

Wiktor Stribiżew

631k41 gold badges502 silver badges633 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Unforgiven Over a year ago

It's working flawlessly this time. I'll make sure to read thoroughly through your answer so as to not get stuck twice. Thank you very much. And good day to you.

Wiktor Stribiżew Over a year ago

You are welcome. BTW, I have just found an URL matching regex here and tried to adapt it for you: [[:<:]](https*://)*[a-zA-Z0-9_]+\.(\.*[a-zA-Z0-9_]+)+. No idea if it works better.

Collectives™ on Stack Overflow

Regex to SQL: repetition-operator operand invalid

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related