
I have a problem with a regex that extracts usernames from Telegram links, covering everything from a plain "@username" to "username.t.me" links.

[Image: screenshot showing the problem with the regular expression]

The problem: if I enter "@aaaa, @jfewewf", both usernames match correctly, but if I enter "@aaaa, @jfewewf_", neither username matches. The regex should still match @aaaa and reject only the username on the right, because that one is invalid.

Here is my regex:

(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_$)(?!.*_{2,})[a-z][a-z0-9_]{3,31})(?P<subdomain>\.t\.me)?
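For anyone who wants to reproduce this outside regex101, here is a minimal Python sketch (Python is an assumption; the (?P<name>...) groups happen to be Python syntax). It compiles the regex above unchanged, with no flags, and runs it on both sample lines. The helper name usernames is hypothetical. The second line yields nothing because (?!.*_$) is evaluated right after "@aaaa", and its .* runs past the comma to the trailing "_" of the other username.

```python
import re

# The question's regex, unchanged; adjacent raw strings are concatenated.
QUESTION_RE = re.compile(
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_$)(?!.*_{2,})[a-z][a-z0-9_]{3,31})(?P<subdomain>\.t\.me)?"
)


def usernames(text: str) -> list[str]:
    """Hypothetical helper: every 'username' group found in one line."""
    return [m.group("username") for m in QUESTION_RE.finditer(text)]


print(usernames("@aaaa, @jfewewf"))   # ['aaaa', 'jfewewf']
print(usernames("@aaaa, @jfewewf_"))  # [] because (?!.*_$) also sees the final "_" from "@aaaa"
```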

You can test it at this link: https://regex101.com/r/JFF1S0/9

Please help me 🙏🙏🙏

I've already tried almost everything and have no idea how to solve it.

  • What constitutes a valid username? Could you provide the desired output for your sample data? Commented Oct 1, 2024 at 7:44
  • You expect multiple matches, BUT you are using .* in your restrictive negative lookaheads (e.g. (?!.*__)(?!.*_$)(?!.*_{2,})), so no wonder you do not get the expected matches. Restrict the . inside these lookaheads to valid username chars. Commented Oct 1, 2024 at 7:49
  • @DuesserBaest, usernames that begin or end with an underscore, begin with a digit, contain two underscores in a row, consist only of digits, or contain characters other than Latin letters, digits, and underscores are invalid. The only problem is that one invalid username on a line prevents the valid username on the same line from matching, as shown in the image. Commented Oct 1, 2024 at 7:52
  • @WiktorStribiżew, So what should I change? Commented Oct 1, 2024 at 7:54
  • If your usernames only contain "word" chars, then you can default to \w. Here is your regex with \w. Here is BB's regex with \w. Commented Oct 1, 2024 at 9:13
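A minimal Python sketch of the change suggested in these comments, assuming valid usernames use only [a-z0-9_] as in the original character class. Each .* in the username lookaheads becomes [a-z0-9_]*, so a lookahead can no longer cross the comma into the next username. Since $ no longer makes sense there, (?!.*_$) becomes "this run of username characters does not end with _", and the redundant (?!.*_{2,}) is dropped because (?!.*__) already covers it. This is my own reconstruction, not the regex behind the links above; \w would be shorter, but in Python it also matches uppercase and non-Latin letters.

```python
import re

# Prefix alternatives copied unchanged from the question's regex. The ".*" in the
# t.me alternative stays, because that branch is anchored to a whole "name.t.me" line.
PREFIX = (
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
)

# Username group with restricted lookaheads that stop at ",", " ", "." and so on:
#   (?![a-z0-9_]*__)              -> no "__" inside this username
#   (?![a-z0-9_]*_(?![a-z0-9_]))  -> this username does not end with "_"
USERNAME = (
    r"(?P<username>(?![a-z0-9_]*__)(?![a-z0-9_]*_(?![a-z0-9_]))[a-z][a-z0-9_]{3,31})"
    r"(?P<subdomain>\.t\.me)?"
)

RESTRICTED_RE = re.compile(PREFIX + USERNAME)


def usernames(text: str) -> list[str]:
    """Hypothetical helper: every 'username' group found in one line."""
    return [m.group("username") for m in RESTRICTED_RE.finditer(text)]


print(usernames("@aaaa, @jfewewf"))   # ['aaaa', 'jfewewf']
print(usernames("@aaaa, @jfewewf_"))  # ['aaaa']
print(usernames("@aaaa, @jfe__wf"))   # ['aaaa']
```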

1 Answer


To fix this, remove (?!.*_$) from the username group and add (?<!_)\b right after [a-z][a-z0-9_]{3,31}.

Here is the updated regular expression:

(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)(?P<subdomain>\.t\.me)?

https://regex101.com/r/JFF1S0/10
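A quick Python check of the answer's regex (copied unchanged; the helper name is hypothetical). On both sample lines from the question it returns what the question asks for. The last input is one I picked to probe the point raised in the comments below: (?!.*__) still uses .*, so a "__" inside a later username on the same line rejects @aaaa as well.

```python
import re

# The answer's regex, unchanged: (?!.*_$) removed, (?<!_)\b appended to the username.
ANSWER_RE = re.compile(
    r"(?:(?<!\S)@|(?:(?:https?://|)(?:t\.me|telegram\.(?:me|dog))/(?:c/|)|tg://resolve\?domain=)"
    r"|(?=^(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\.t\.me$))"
    r"(?P<username>(?!.*__)(?!.*_{2,})[a-z][a-z0-9_]{3,31}(?<!_)\b)(?P<subdomain>\.t\.me)?"
)


def usernames(text: str) -> list[str]:
    """Hypothetical helper: every 'username' group found in one line."""
    return [m.group("username") for m in ANSWER_RE.finditer(text)]


print(usernames("@aaaa, @jfewewf"))   # ['aaaa', 'jfewewf']
print(usernames("@aaaa, @jfewewf_"))  # ['aaaa']
print(usernames("@aaaa, @jfe__wf"))   # [] because (?!.*__) still looks past "aaaa"
```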

Original answer from Telegram chat:

regex101: this pattern checks all the way to the end of the line, so it finds the "_" at the end and rejects the whole line.

Try https://regex101.com/r/C6FZER/1

I removed your (?!.*_$) and added (?<!_)\b after [a-z][a-z0-9_]{3,31}.


3 Comments

(?<!_)\b just means the username should not end with a _. And here it fails. As I said, you must replace the .*s with more restricted versions.
@WiktorStribiżew, give me an example on regex101.com.
Already given. And just above, too.
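A different technique, not proposed in this thread: split the job in two. A loose pattern collects "@word" candidates, and a separate pattern validates each candidate on its own with re.fullmatch, where .* can only see that one candidate. This sketch covers only the "@username" form (not t.me or tg:// links), assumes lowercase Latin letters as in the question's [a-z], and takes the 4 to 32 character length from [a-z][a-z0-9_]{3,31}. All names are hypothetical.

```python
import re

# Hypothetical validator for the rules listed in the comments: starts with a letter,
# then letters, digits or "_", no "__", does not end with "_", 4 to 32 chars in total.
# fullmatch() receives one candidate, so ".*" in the lookahead cannot see neighbours.
VALID_USERNAME = re.compile(r"(?!.*__)[a-z][a-z0-9_]{2,30}[a-z0-9]")

# Hypothetical candidate finder: "@" at the start or after whitespace, then word chars.
CANDIDATE = re.compile(r"(?<!\S)@(\w+)")


def is_valid_username(name: str) -> bool:
    return VALID_USERNAME.fullmatch(name) is not None


def mentioned_usernames(text: str) -> list[str]:
    """Collect "@word" candidates first, then keep only the valid ones."""
    return [c for c in CANDIDATE.findall(text) if is_valid_username(c)]


print(mentioned_usernames("@aaaa, @jfewewf_"))  # ['aaaa']
```

The trade-off is that link formats need their own candidate patterns, but each validity rule is written once and never has to worry about what follows on the line.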
