The regular expression for parsing Telegram usernames stops parsing valid usernames if there is an invalid username in the same line [duplicate]

Question

I have a regular expression that extracts the username from Telegram links, covering everything from a plain "@" mention to username.t.me links.

The problem is that if I enter @aaaa, @jfewewf, both usernames match correctly, but when I enter @aaaa, @jfewewf_, neither username matches, even though the regex should still match @aaaa (only the username on the right is invalid).

Here is my regex:

(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_$)(?!.*_{2,})[a-z][a-z0-9_]{3,31})(?P<subdomain>\.t\.me)?

You can test it at this link: https://regex101.com/r/JFF1S0/9

Please help me 🙏🙏🙏

I've already tried almost everything, I don't know how to solve it at all.

What constitutes a valid username? Could you provide the desired output for your sample data?
– DuesserBaest, Commented Oct 1, 2024 at 7:44
You expect multiple matches, but you are using .* in your restrictive negative lookaheads (e.g. (?!.*__)(?!.*_$)(?!.*_{2,})), so it is no wonder you do not get the expected matches. Restrict the . inside these lookaheads to valid username characters.
– Wiktor Stribiżew, Commented Oct 1, 2024 at 7:49
@DuesserBaest, usernames are considered invalid if they begin or end with an underscore, begin with a number, contain two underscores in a row, consist only of numbers, or contain symbols other than Latin letters, numbers, and underscores. The only problem here is that one invalid username in a line prevents the valid username in the same line from being parsed, as shown in the image.
– Okinea Dev, Commented Oct 1, 2024 at 7:52
If your usernames only contain "word" chars, then you can default to \w. Here is your regex with \w. Here is BB's regex with \w.
– Wiktor Stribiżew, Commented Oct 1, 2024 at 9:13
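
To make the point about `.*` inside the lookaheads concrete, here is a minimal Python sketch. It does not use the full regex from the question: both patterns are simplified stand-ins that keep only the "@" prefix and the trailing-underscore check, and the names `UNRESTRICTED`, `RESTRICTED`, and `find_usernames` are made up for illustration. The restricted variant is just one possible way to limit the lookahead to word characters.

```python
import re

# Simplified stand-ins for the question's username group (assumption:
# only the "@" prefix and the trailing-underscore check are kept).
# (?!.*_$) scans to the end of the whole line, not just the username.
UNRESTRICTED = re.compile(r"(?<!\S)@(?P<username>(?!.*_$)[a-z][a-z0-9_]{3,31})")
# (?!\w*_(?!\w)) only looks at the run of word characters at this position.
RESTRICTED = re.compile(r"(?<!\S)@(?P<username>(?!\w*_(?!\w))[a-z][a-z0-9_]{3,31})")


def find_usernames(pattern, text):
    """Return every captured username in text."""
    return [m["username"] for m in pattern.finditer(text)]


line = "@aaaa, @jfewewf_"
print(find_usernames(UNRESTRICTED, line))  # [] because the line ends with "_"
print(find_usernames(RESTRICTED, line))    # ['aaaa']
```

With the unrestricted lookahead, the trailing "_" of the second username is visible from the first "@" as well, so both candidates are rejected. Once the lookahead can only consume word characters, it stops at the comma and the first username is judged on its own.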

Okinea Dev · Accepted Answer · 2024-10-01 09:18:51Z

-1

To fix this, remove `(?!.*_$)` and add `(?<!_)\b` after `[a-z][a-z0-9_]{3,31}`.

Here is the updated regular expression:

(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)(?P<subdomain>\.t\.me)?

https://regex101.com/r/JFF1S0/10
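
As a quick check outside regex101, here is a minimal Python sketch that runs the updated expression against the two sample lines from the question. The helper name `find_usernames` is made up for illustration, and no regex flags are set, which is an assumption since the flags used on regex101 are not shown in this post.

```python
import re

# Updated expression from this answer, copied verbatim and split across
# string literals only for readability.
UPDATED = re.compile(
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)"
    r"|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)"
    r"(?P<subdomain>\.t\.me)?"
)


def find_usernames(text):
    """Return every captured username in text."""
    return [m["username"] for m in UPDATED.finditer(text)]


print(find_usernames("@aaaa, @jfewewf"))   # ['aaaa', 'jfewewf']
print(find_usernames("@aaaa, @jfewewf_"))  # ['aaaa']
```

The second username is rejected because `(?<!_)\b` cannot be satisfied anywhere inside "jfewewf_": either the preceding character is "_" or the next character is still a word character.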

Original answer from Telegram chat:

The lookahead checks all the way to the end of the line, so it finds the "_" at the end and rejects the entire line. ^[1]

Try https://regex101.com/r/C6FZER/1 ^[1]

I removed your (?!.*_$) and added (?<!_)\b after [a-z][a-z0-9_]{3,31}. ^[3]

3 Comments

Wiktor Stribiżew Over a year ago

(?<!_)\b just means the username should not end with a _. And here it fails. As I said, you must replace the .*s with more restricted versions.

Okinea Dev Over a year ago

@WiktorStribiżew, Give me an example in regex101.com

Wiktor Stribiżew Over a year ago

Already given. And just above, too.
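
The examples linked in these comments are not reproduced in the text, so here is a rough Python sketch of what "replace the .*s with more restricted versions" could look like. The sample line with a double underscore is made up for illustration, `find_usernames` is a hypothetical helper, and swapping every `.*` for `\w*` is one possible reading of the suggestion, not necessarily the exact regex the commenter linked.

```python
import re

# Regex from the accepted answer, copied verbatim and split across
# string literals only for readability.
ACCEPTED_SOURCE = (
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)"
    r"|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)"
    r"(?P<subdomain>\.t\.me)?"
)
ACCEPTED = re.compile(ACCEPTED_SOURCE)

# Assumption: every ".*" in this source sits inside a negative lookahead,
# so a plain text replacement restricts all of them to word characters.
RESTRICTED = re.compile(ACCEPTED_SOURCE.replace(".*", r"\w*"))


def find_usernames(pattern, text):
    """Return every captured username in text."""
    return [m["username"] for m in pattern.finditer(text)]


line = "@aaaa, @bad__name"  # made-up sample: second username has "__"
print(find_usernames(ACCEPTED, line))    # []
print(find_usernames(RESTRICTED, line))  # ['aaaa']
```

With the accepted regex, `(?!.*__)` still sees the double underscore later in the line and rejects @aaaa along with it. After the replacement, each lookahead stops at the first non-word character, so only the username that actually contains "__" is rejected.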
