Regex not finding all occurrences of a substring, instead returning the entire string

Question

import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"

x = re.findall("NASDAQ|NYSE.*:.*\)$", txt.upper())

print(x)

Right now I am getting this output:

['NASDAQ', 'NASDAQ', 'NYSE:BHDND) (NASDAQ:ERES)']

while the output I want is:

['NASDaq/abnjnxd:number1','nasDaq:number2','NYSE:bhdnd','Nasdaq:eres']

Thanks in advance

Use (?:NASDAQ|NYSE).*?:.*?(?=\)). Demo: regex101.com/r/9aNcdi/1
– 41686d6564, Commented Aug 18, 2021 at 7:40
Use re.findall(r"(?i)\b(?:NASDAQ|NYSE)\b[^:()]*:[^)]*", txt)
– anubhava, Commented Aug 18, 2021 at 7:44
It seems to me that you want to match everything enclosed in parentheses. Try (?<=\()[^)]*(?=\)).
– SamWhan, Commented Aug 18, 2021 at 8:27
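
For comparison, here is a small sketch that runs the three patterns from the comments against the sample string. The re.I flag on the first pattern is an assumption (the comment does not say which flags its demo used), and the second string, other, is a made-up input added only to show where the patterns differ.

```python
import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"

# Lazy quantifiers plus a lookahead for the closing parenthesis.
# re.I is an assumption: the comment itself does not mention flags.
lazy = re.findall(r"(?:NASDAQ|NYSE).*?:.*?(?=\))", txt, re.I)

# Word boundaries and negated character classes, with the inline (?i) flag.
bounded = re.findall(r"(?i)\b(?:NASDAQ|NYSE)\b[^:()]*:[^)]*", txt)

# Anything between parentheses, regardless of content.
any_parens = re.findall(r"(?<=\()[^)]*(?=\))", txt)

print(lazy == bounded == any_parens)  # all three agree on the sample string

# Hypothetical extra input: only the "anything in parentheses" pattern matches it.
other = "(something else)"
print(re.findall(r"(?<=\()[^)]*(?=\))", other))
print(re.findall(r"(?i)\b(?:NASDAQ|NYSE)\b[^:()]*:[^)]*", other))
```

On the sample string all three return the four desired items; the lookaround-only pattern is the one that would also pick up unrelated parenthesized text.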

Benoît Zu · Accepted Answer · 2021-08-18 07:42:57Z

1

import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"

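# Raw string so the backslash reaches the regex engine unchanged.
# Note: because of txt.upper(), the matches come back in upper case.
x = re.findall(r"(?:NASDAQ|NYSE)[^\)]+", txt.upper())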

print(x)

Details

(?:NASDAQ|NYSE) non-capturing group matching either NASDAQ or NYSE
[^\)]+ match one or more characters that are not ), i.e. everything up to the next )
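
To see why the grouping matters: alternation (|) has the lowest precedence in a regex, so the question's pattern is read as "NASDAQ" OR "NYSE.*:.*\)$" rather than "(NASDAQ or NYSE) followed by the rest". The sketch below reproduces the question's output, then the grouped version; the last call, which uses re.IGNORECASE instead of txt.upper() to keep the original casing, is my own variation and not part of the answer above.

```python
import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"

# Ungrouped: parsed as  NASDAQ  |  NYSE.*:.*\)$
# The first alternative matches the bare word; the second is greedy and
# anchored to the end of the string, so it swallows everything after NYSE.
ungrouped = re.findall(r"NASDAQ|NYSE.*:.*\)$", txt.upper())
print(ungrouped)
# ['NASDAQ', 'NASDAQ', 'NYSE:BHDND) (NASDAQ:ERES)']

# Grouped: the alternation only covers the exchange name.
grouped = re.findall(r"(?:NASDAQ|NYSE)[^)]+", txt.upper())
print(grouped)
# ['NASDAQ/ABNJNXD:NUMBER1', 'NASDAQ:NUMBER2', 'NYSE:BHDND', 'NASDAQ:ERES']

# Variation (assumption): match case-insensitively instead of upper-casing
# the input, so the original casing is preserved in the results.
original_case = re.findall(r"(?:NASDAQ|NYSE)[^)]+", txt, re.IGNORECASE)
print(original_case)
# ['NASDaq/abnjnxd:number1', 'nasDaq:number2', 'NYSE:bhdnd', 'Nasdaq:eres']
```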

Demo

https://regex101.com/r/Y2h7OO/1

answered Aug 18, 2021 at 7:42

Benoît Zu


The fourth bird · Accepted Answer · 2021-08-18 09:23:23Z

1

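You could use a capture group: when the pattern contains exactly one group, re.findall returns that group's contents instead of the whole match.

Note that to keep the original casing in the output, you should not use txt.upper().

You can make the pattern case-insensitive using re.I instead.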

\(((?:NASDAQ|NYSE)[^()\s]*)\)

The pattern matches:

\( Match (
( Capture group 1
- (?:NASDAQ|NYSE) Match either NASDAQ or NYSE
- [^()\s]* Match 0+ occurrences of any char except ( and ) or a whitespace char
) Close group 1
\) Match )


For example

import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"
x = re.findall(r"\(((?:NASDAQ|NYSE)[^()\s]*)\)", txt, re.I)
print(x)

Output

['NASDaq/abnjnxd:number1', 'nasDaq:number2', 'NYSE:bhdnd', 'Nasdaq:eres']
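
As a minimal sketch of what the capture group changes: with one group in the pattern, re.findall returns only the group, while re.finditer gives access to both the whole match (including the parentheses) and the group. The variable names here are just for illustration.

```python
import re

txt = "(NASDaq/abnjnxd:number1) ojsnxsjsxmjosx (nasDaq:number2) (NYSE:bhdnd) (Nasdaq:eres)"
pattern = re.compile(r"\(((?:NASDAQ|NYSE)[^()\s]*)\)", re.I)

# findall with a single capture group returns the group contents only.
inner_only = pattern.findall(txt)
print(inner_only)
# ['NASDaq/abnjnxd:number1', 'nasDaq:number2', 'NYSE:bhdnd', 'Nasdaq:eres']

# group(0) is the whole match, so the literal parentheses are included.
whole_matches = [m.group(0) for m in pattern.finditer(txt)]
print(whole_matches)
# ['(NASDaq/abnjnxd:number1)', '(nasDaq:number2)', '(NYSE:bhdnd)', '(Nasdaq:eres)']
```

So the \( and \) still have to be present for a match; they are simply left out of what findall hands back.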

A more precise pattern could match an optional part that starts with / followed by word characters, and then require : followed by word characters.

\(((?:NASDAQ|NYSE)(?:/\w+)?:\w+)\)

The pattern matches:

\( Match (
( Capture group 1
- (?:NASDAQ|NYSE) Match either NASDAQ or NYSE
- (?:/\w+)? Optionally match / and 1+ word chars
- :\w+ Match : and 1+ word chars
) Close group 1
\) Match )
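
A rough comparison of the two patterns, to show what "more precise" buys you. The sample string is hypothetical, made up for this sketch, and both patterns are run with re.I as in the example above.

```python
import re

# Hypothetical input mixing well-formed and malformed entries.
sample = "(NYSE) (NYSE:abc) (nasdaq/foo:bar) (NYSE-no-colon) (NYSE abc)"

loose = re.compile(r"\(((?:NASDAQ|NYSE)[^()\s]*)\)", re.I)
strict = re.compile(r"\(((?:NASDAQ|NYSE)(?:/\w+)?:\w+)\)", re.I)

# The loose pattern accepts anything without spaces or parentheses after
# the exchange name, including nothing at all.
loose_hits = loose.findall(sample)
print(loose_hits)
# ['NYSE', 'NYSE:abc', 'nasdaq/foo:bar', 'NYSE-no-colon']

# The strict pattern insists on ":" followed by at least one word character.
strict_hits = strict.findall(sample)
print(strict_hits)
# ['NYSE:abc', 'nasdaq/foo:bar']
```

Neither pattern matches "(NYSE abc)", since both exclude whitespace inside the parentheses.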


edited Aug 18, 2021 at 9:23

answered Aug 18, 2021 at 8:43

The fourth bird


5 Comments

drake1994 Over a year ago

Works well, except it's also picking up matches like "NYSE ajjanxxdmdicjmdicjmdcdckdmcldcmdlmdlcmdlcdlcdmcdlc (something else)". What I need is for it to search only for the patterns mentioned above. In those patterns, NASDAQ and NYSE are interchangeable in all combinations.

The fourth bird Over a year ago

@NipunTulsyan Do you mean you don't want to match spaces as well? See regex101.com/r/9zFWDz/1

The fourth bird Over a year ago

@NipunTulsyan A more exact pattern could be \(((?:NASDAQ|NYSE)(?:/\w+)?:\w+)\) See regex101.com/r/7yuVlI/1

drake1994 Over a year ago

Could you make it so it also searches for "(" as the first character? Everything else is fine.

The fourth bird Over a year ago

@NipunTulsyan It already does using \( but it is not in the capture group so it will not be part of the result returned by re.findall.
