0

This regex (regular expression) finds every word or group of words that begins with a capital letter. But it should exclude a capitalized word that comes right after a dot followed by a space. For example, in ". Hello you" it should exclude Hello, because a dot and a space precede it.

The goal is to wrap every match in a text in an href link, while excluding any capitalized word that is preceded by ". " (a dot and a space). It looks like this:

// EXCLUDE: (. Hello) when a dot and a space precede the capitalized word
const regex = /\b((?!\.[\s]+)(?:[A-Z][\p{L}0-9-_]+)(?:\s+[A-Z][\p{L}0-9-_]+)*)\b/ug; 
const subst = '<a href="#">$1</a>';

I thought that (?!\.[\s]+) would do the trick, but it does not.

Here is a test on regex101: https://regex101.com/r/nwyL8I/3

Thank you.

1
  • You probably want \b(?<!\.\s+)((?:\p{Lu}[\p{L}0-9_-]+)(?:\s+\p{Lu}[\p{L}0-9_-]+)*)\b, see this demo Commented Feb 12, 2023 at 19:57

2 Answers

1

The correct way to express a negative lookbehind assertion for your situation would be (?<!\.\s+) and not (?!\.\s+), which is a negative lookahead assertion. So I would use:

((?<!\.\s+)\b(?:[A-Z][\p{L}0-9-_]+)(?:\s+[A-Z][\p{L}0-9-_]+)*)\b

But (?:[A-Z][\p{L}0-9-_]+) will not match words with a single letter, such as A. Is that what you really want?
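
To see the difference in practice, here is a minimal sketch that runs both assertions over the same text. The sample sentence and the names `lookahead`, `lookbehind`, and `words` are made up for illustration, and the character class is written as `[\p{L}0-9_-]` (hyphen last) only to make it unambiguous that the hyphen is a literal.

```javascript
// Lookahead version from the question: (?!\.\s+) is tested at the position
// where the word starts, so it looks at the word itself, never at the text before it.
const lookahead = /\b((?!\.\s+)(?:[A-Z][\p{L}0-9_-]+)(?:\s+[A-Z][\p{L}0-9_-]+)*)\b/gu;

// Lookbehind version: (?<!\.\s+) inspects the characters that precede the word.
const lookbehind = /((?<!\.\s+)\b(?:[A-Z][\p{L}0-9_-]+)(?:\s+[A-Z][\p{L}0-9_-]+)*)\b/gu;

// Hypothetical sample text, not taken from the question.
const text = 'we met John Smith today. Hello you, said Mary';

// Collect capture group 1 of every match.
const words = (re) => Array.from(text.matchAll(re), (m) => m[1]);

console.log(words(lookahead));  // [ 'John Smith', 'Hello', 'Mary' ]
console.log(words(lookbehind)); // [ 'John Smith', 'Mary' ]

// Wrap each remaining match in a link, as in the question.
const linked = text.replace(lookbehind, '<a href="#">$1</a>');
console.log(linked);
// we met <a href="#">John Smith</a> today. Hello you, said <a href="#">Mary</a>
```

Note that only the first word after ". " is rejected: in ". Hello World" the word World is preceded by "Hello ", so it would still match.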


2 Comments

It's even better that it does not match single letters. Thank you, it seems to work perfectly. I have one last small fix: when a string begins like "The World Trade Centers Association (WTCA) was founded in 1970.", the first "The" is included and I do not want it. I guess we cannot exclude it in the first sentence, since there is no dot before it?
@Gino Use (?<!\.\s+|^) as the negative lookbehind assertion. This asserts that the word is preceded by neither a dot plus whitespace nor the start of the string. See demo.
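
A quick check of a lookbehind that also rejects the start of the string, written here as (?<!\.\s+|^). This is only a sketch: the second sentence of the sample text and the names `regex`, `sample`, and `found` are assumptions added for illustration.

```javascript
// The lookbehind fails when the word is preceded by ". " OR sits at the start of the string.
const regex = /((?<!\.\s+|^)\b(?:[A-Z][\p{L}0-9_-]+)(?:\s+[A-Z][\p{L}0-9_-]+)*)\b/gu;

// First sentence from the comment above; the second sentence is hypothetical.
const sample = 'The World Trade Centers Association (WTCA) was founded in 1970. It has members in New York.';

const found = Array.from(sample.matchAll(regex), (m) => m[1]);
console.log(found);
// [ 'World Trade Centers Association', 'WTCA', 'New York' ]
```

The leading "The" and the sentence-initial "It" are both skipped, while the capitalized words that follow "The" still match because they are preceded by a word and a space.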
0

The current regular expression seems to match words or groups of words that start with a capital letter and excludes words that are preceded by a dot and a space. However, the exclusion of words after a dot followed by a space might not be working as expected.

One issue with the current regular expression is that (?!\.[\s]+) is a negative lookahead: it is tested at the position where the word starts, so it inspects the characters of the word itself and never the dot and space before it. You can replace it with a negative lookbehind, which checks the text that precedes the word:

const regex = /\b((?<!\.\s+)(?:[A-Z][\p{L}0-9-_]+)(?:\s+[A-Z][\p{L}0-9-_]+)*)\b/ug;

This modification should ensure that words preceded by a dot and whitespace are excluded from the matching process.
