2,094 questions
2
votes
4
answers
330
views
regex for matching a pattern with an optional part
I need a regex pattern to find substrings of the form "a:<some integer>" and an optional "b:<some float>" in a large string. The "a" string may be preceded ...
3
votes
3
answers
132
views
Python regular expression for text search
I am trying to extract the wanted text from a given set of text. I have created the function below.
def extract_name(title):
    matches = re.findall(r'\b[A-Z0-9\s&.,()-]+(?:\s*\(\d\))?\b', title)
    ...
0
votes
1
answer
103
views
Predicting `re` regexp memory consumption
I have a large (gigabyte) file where an S-expression appears, and I want to skip to the end of the S-expression. The depth of the S-expression is limited to 2, so I tried using a Python regexp (b'\\((?...
4
votes
2
answers
189
views
Using re.sub and replace with overall match [duplicate]
I was just writing a program where I wanted to insert a newline after a specific pattern. The idea was to match the pattern and replace with the overall match (i.e. capture group \0) and \n.
s = "...
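A minimal sketch of one common way to do this, assuming the goal is simply to re-emit the whole match followed by a newline. In a replacement string the whole match is written `\g<0>`; the helper name, pattern, and sample text below are hypothetical and not taken from the question.

```python
import re

def append_newline_after(pattern: str, text: str) -> str:
    # \g<0> in the replacement stands for the entire match,
    # so each match is kept and a newline is added after it.
    return re.sub(pattern, r"\g<0>\n", text)

# Hypothetical sample: break the line after every "digits;" token.
result = append_newline_after(r"\d+;", "a1;b22;c")
print(repr(result))
```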
0
votes
1
answer
45
views
Python re.sub: backreference in replacement pattern followed by digit [duplicate]
I would like to match a regular expression in a string and add the character 0 after all occurrences. That is, each match will be replaced with itself followed by 0. But because 0 is a digit, I don'...
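A short sketch of the usual workaround, assuming the match is captured in group 1. Writing `\10` in the replacement would be read as a reference to group 10, whereas `\g<1>` ends the group reference explicitly so the following `0` is a literal. The helper name and sample input are hypothetical.

```python
import re

def append_zero_to_group(pattern: str, text: str) -> str:
    # \g<1> is an unambiguous reference to group 1; the trailing 0
    # is then an ordinary character rather than part of the group number.
    return re.sub(pattern, r"\g<1>0", text)

# Hypothetical sample: every "x<digit>" token gets a 0 appended.
padded = append_zero_to_group(r"(x\d)", "x1 x2")
print(padded)
```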
0
votes
1
answer
71
views
Match characters between square brackets but only if text inside brackets follows pattern [duplicate]
I want to match text inside of square brackets - but ONLY if it contains hashtag+digit+digit
i.e [#18] or [hello #25 bye]
NOT [25] (no hashtag)
I ultimately want to remove these match strings (...
1
vote
2
answers
55
views
Counting the hashtags in a collection of tweets: two methods with inconsistent results
I'm playing around with a pandas DataFrame containing two columns: 'tweet_text' and 'cyberbullying_type'. It was created from this dataset as follows:
df = pd.read_csv('data/cyberbullying_tweets.csv'...
1
vote
1
answer
184
views
Static typing of Python regular expression: 'incompatible type "str"; expected "AnyStr | Pattern[AnyStr]" '
Just to be clear, this question has nothing to do with the regular expression itself and my code is perfectly running even though it is not passing mypy strict verification.
Let's start from the basic,...
0
votes
1
answer
44
views
re.findall with requests doesn't match copied and pasted html (generated by requests.text)
I'm trying to capture some elements from the html code of a certain url.
When I copy and paste the contents of the html directly into my python code it works well.
import re
# Sample HTML content
...
-1
votes
1
answer
49
views
How to create a regex with an optional group without merging it with another group? [duplicate]
I'm trying to write a regex pattern in Python to capture two groups, where the second group is optional, but I want the groups to remain distinct.
Here are examples of the possible pattern I want ...
1
vote
2
answers
78
views
re.sub eats next character when replacing with more [duplicate]
So I was trying to format my text for Markdown v2; basically I just want to replace a special character a with \a.
When trying to do this with regex, it does so, but the new symbol eats up the next ...
-3
votes
2
answers
82
views
Extracting string from an email body
I'm using python to extract the information provided from the body of an email using imap.
The part of the email that my code is interested in:
"BOT ID: 4824CF8B-2986-11EC-80F0-84A93851B964"
I can ...
0
votes
1
answer
63
views
Regex to parse inet value
I want to parse ifconfig output to get the IP address, netmask and broadcast. These are optional fields: if a field is present it should be returned, and if not, None should be returned.
My below pattern works fine but if '...
1
vote
1
answer
57
views
Issue with Toggling Sign of the Last Entered Number in Calculator Using ⁺∕₋ in Python
I am developing a calculator using Python. The problem I'm facing is that when I try to toggle the sign of the last number entered by the user using the ⁺∕₋ button, all similar numbers in the text get ...
-1
votes
2
answers
77
views
RegEx: Python (findall). Order of elements in OR statement resulting in different output
I am trying to get my head around regular expressions and was playing with some examples to see what comes out. I am struggling to understand how the order of elements in OR (|) impacts ...
8
votes
2
answers
228
views
How to ignore case but not diacritics with Python regex?
I'm working with a set of regex patterns that I have to match in a target text.
My problematic regex is something like this: (İg)[[:punct:][:space:]]+[[:alnum:]]+
Initially, I noticed that Python’s re ...
1
vote
1
answer
135
views
Handling an irregular format while extracting data from a PDF invoice to transfer to an Excel file
I have irregularly formatted PDF invoice files with multiple pages. I want an Excel file in return with the data extracted from the PDF files. For this I wrote code with the pdfplumber library in Python but I am able ...
0
votes
0
answers
33
views
How to make a non-greedy regex when in multiline mode [duplicate]
I have a text file (latin-1-encoded) with this content:
1
lorem ipsum 1 ...
1 OCTOBER 24, 2024 11/27/13
lorem ipsum 2 ...
1 ...
0
votes
0
answers
57
views
Regex pattern sanitization for wildcard replacement
I need a function to sanitize regex patterns in Python, specifically targeting strings that may contain wildcard characters (%). The goal is to replace these % wildcards with the regex equivalent .* ...
2
votes
1
answer
106
views
Using re to match a digit + any contiguous duplicates and storing the duplicates, not just the digit as the result [duplicate]
I'm trying to use re.findall(pattern, string) to match all numbers and however many duplicates follow in a string. Eg. "1222344" matches "1", "222", "3", "...
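A sketch of one way to get the full runs rather than just the captured digit, assuming `re.finditer` is acceptable in place of `re.findall` (when the pattern contains a group, `findall` returns the group's text, not the whole match). The function name is hypothetical.

```python
import re

def digit_runs(s: str) -> list[str]:
    # (\d) captures one digit and \1* consumes any immediate repeats of it.
    # group(0) is the whole run, not only the captured digit.
    return [m.group(0) for m in re.finditer(r"(\d)\1*", s)]

runs = digit_runs("1222344")
print(runs)  # ['1', '222', '3', '44']
```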
-1
votes
2
answers
186
views
Is there a limit to the size of a string in Python's re.search? [duplicate]
I am extracting data from an API call and am using this code:
if response.status_code == 200:
    ReportResponse = re.search('<return>(.+?)</return>', response.text)
    print(...
0
votes
1
answer
80
views
Regular expression for searching only natural numbers
It is necessary to write a regular expression to search for natural numbers in the text.
Numbers can be inside words and any special characters. The main condition for the search is a sequence of ...
0
votes
1
answer
49
views
python re identifiers not working with lookahead and lookbehind
I have the following string
str = '2024-09-23 18:05:08,147 INFO [WatchDog_191084] (alloc:0MB, cpu:0%) 10 422'
and I am trying to extract the numbers between the square brackets, so I am ...
2
votes
1
answer
49
views
Grabbing a specific url from a webpage with re and requests [duplicate]
import requests, re
r = requests.get('example.com')
p = re.compile('\d')
print(p.match(str(r.text)))
This always prints None, even though r.text definitely contains numbers, but print(p.match('...
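A small illustration of the difference that likely matters here, assuming the page text does not begin with a digit: `Pattern.match` only tries at the start of the string, while `Pattern.search` scans the whole string. The sample text is a hypothetical stand-in for `r.text`, so no network request is made.

```python
import re

digit = re.compile(r"\d")

# Hypothetical stand-in for the downloaded page text.
text = "<html>page 42</html>"

at_start = digit.match(text)   # None: the string starts with "<", not a digit
anywhere = digit.search(text)  # finds the first digit anywhere in the string

print(at_start)
print(anywhere.group())
```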
1
vote
1
answer
77
views
Python re.sub () replace content but replacement contains special characters
I'm working on automatically replacing content in a file. re.search() successfully gets the new_content, but it contains special characters, and when I want to use re.sub() it shows:
error: invalid ...
0
votes
2
answers
85
views
Unexpected behaviour of the regex "{m, n}?$"
Consider the following example
>>> import sys, re
>>> sys.version
'3.11.5 (main, Sep 11 2023, 13:23:44) [GCC 11.2.0]'
>>> re.__version__
'2.2.1'
>>> re.findall('a{1,...
1
vote
1
answer
92
views
Reformat complex file output from an old fortran program to csv using python
I want to convert complex file output into a simpler version, but I can't seem to get the regex right.
I have tried using regex and pandas to convert this weird formatted code to something nicer but ...
-1
votes
1
answer
74
views
Text is split depending on the order of specific delimiter [duplicate]
The code is supposed to split the string without removing the delimiters.
import re
operations = '8-8/84'
operations = re.split(r'([+,*,/,-])', operations)
Executing the code, operations ends up with ...
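A sketch of why the order inside the brackets can matter, assuming the intent is to split on the four arithmetic operators. Inside a character class the commas are literal members, and a `-` between two characters forms a range; escaping the hyphen (or placing it last) avoids that. The second pattern is a hypothetical reordering shown only for contrast.

```python
import re

operations = "8-8/84"

# No commas needed; the escaped hyphen is always a literal "-".
fixed = re.split(r"([+\-*/])", operations)

# Hypothetical reordering: ",-," is read as a range from "," to ",",
# so "-" itself is no longer a member of the class.
order_sensitive = re.split(r"([+,-,*,/])", operations)

print(fixed)
print(order_sensitive)
```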
-3
votes
1
answer
75
views
Summarize double for loop into list comprehension
I've been trying to translate these two for loops into list comprehension:
with open(sourceFile, 'r+t') as file:
    for line in file:
        for key, value in patterns.items():
            ...
0
votes
1
answer
87
views
Why do certain regex functions return a match object and a few don't? [closed]
In Python common regex functions, re.match, re.search, re.fullmatch, etc. return a match object and to print the result we have to use match.group():
re.search(pattern, string): Searches for the first ...
0
votes
0
answers
39
views
How can I use python regex to find as many matches as possible, leaving out those that are concatenations? [duplicate]
I have this string
"~/goofy.git$ /home/maria/L1-07-51.mdl /home/maria/L1-08-09.res"
I want to find every occurrence of a string that starts /home and ends in either res or mdl. And:
I want ...
-1
votes
2
answers
86
views
How to use regex to extract a set of particular substrings?
I want to extract all possible substrings which have all the vowels from a string. For example in the code:
import re
text = "thisisabeautifulsequencofwords"
pattern = r"(?=.*a)(?=.*e)(?...
1
vote
1
answer
95
views
How do I set the time zone format to abbreviated on Windows 10?
I've written a Python script that uses strftime() from the time module. On my Windows 10 computer I get the long form format for time zone when I call strftime("%Z"), and I want to ...
-1
votes
1
answer
40
views
how to find higher case followed by lower case or just higher case
I am trying to match either a higher-case letter followed by a lower-case letter or just a higher-case letter. Many questions were answered about how to get higher-case or lower-case letters, but I ...
0
votes
0
answers
54
views
Find text inside top-level brackets when they're nested [duplicate]
I have a file with nested brackets. I need to parse the text within the top-level brackets with Python regex.
import re
string = '{a {b} c} {d}'
# desired output: ['a {b} c', 'd']
# non-greedy
...
1
vote
1
answer
55
views
Using Regex to find instances of a header, then editing the lines below with some specifications
So I have this excerpt of the .msg file below. What I wish to do is find all the [sel xxx xxx] headers, then read the lines below them. If any of the answers contain a (+3) or any (+x) then ...
-2
votes
3
answers
73
views
"not" workaround when using Regular Expression in Python [duplicate]
What I want to do is validate user inputs. The criterion is only numeric inputs are allowed, no alpha, no characters like .,/?<> etc.
Say a user inputs 1989, it will print true
But if the user ...
1
vote
1
answer
72
views
Replace characters before a number to a new character after the number python
I have some strings that look like: *.rem.1.gz and *.rem.2.gz
And I want to change them into *.1.trim.gz and *.2.trim.gz
The number 1 and number 2 files are paired with each other, which I want to create ...
5
votes
0
answers
124
views
Why does re._compile exist?
Here is re.compile:
>>> import re, inspect
>>> print(inspect.getsource(re.compile))
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a Pattern ...
0
votes
1
answer
44
views
Is there any situation where re.search could not be used instead of re.match? [duplicate]
The documentation seems clear but it begs the question, what is the purpose of re.match? Couldn't re.search with the caret (^) be used instead as long as the MULTILINE flag is not enabled? Is re.match ...
1
vote
2
answers
166
views
How to find all occurrences of a substring in a string while ignoring some characters in Python?
I'd like to find all occurrences of a substring while ignoring some characters. How can I do it in Python?
Example:
long_string = 'this is a t`es"t. Does the test work?'
small_string = "test&...
1
vote
1
answer
91
views
How to extract the volume from a string using a regular expression?
I need to extract the volume with regular expression from strings like
"Candy BAR 350G" (volume = 350G),
"Gin Barrister 0.9ml" (volume = 0.9ml),
"BAXTER DRY Gin 40% 0.5 ml&...
2
votes
1
answer
72
views
How can I simplify this method to replace punctuation while keeping special words intact?
I am making a modulatory function that will take keywords with special characters (@&\*%) and keep them intact while all other punctuation is deleted from a sentence. I have devised a solution, ...
2
votes
1
answer
71
views
How do I fix this regex so that it matches hyphenated words where the final segment ends in a consonant other than the letter m?
I want to match all cases where a hyphenated string (which could be made up of one or multiple hyphenated segments) ends in a consonant that is not the letter m.
In other words, it needs to match ...
-3
votes
1
answer
24
views
Replacing part of string with re.sub with number and string [closed]
I want to replace part of a string using the re.sub method as below
import re
re.sub("([0-9]_F)$", '[0-9]_DO', 'sdsd3_F')
However I fail to manage the numerical part of match which is also a ...
0
votes
1
answer
69
views
How can a number range and value be extracted from this complicated string using Python?
I have a complicated string that includes a kilometer range and a fee for users that fall into that range. Ideally, I would like to transform the string into something that I could use to easily ...
-2
votes
1
answer
40
views
regular expression to find pattern in the same word [duplicate]
There is a string "123:987 767687:99 145:986 156:876 "
My regex is (\d{3}):\1
I am expecting the result to be 123:987, 145:986, 156:876
There is no result found. I don't understand. ...
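A brief sketch of what the backreference does here, assuming the aim is three digits, a colon, then any three digits. `\1` matches the same text that group 1 captured (so only something like 123:123), not "another three digits". The word boundaries are an added assumption to keep 767687:99 out.

```python
import re

s = "123:987 767687:99 145:986 156:876 "

# \1 repeats the captured text, so this needs identical digits on both sides.
with_backref = re.findall(r"(\d{3}):\1", s)

# Repeating the sub-pattern instead accepts any three digits after the colon.
pairs = re.findall(r"\b\d{3}:\d{3}\b", s)

print(with_backref)  # []
print(pairs)         # ['123:987', '145:986', '156:876']
```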
0
votes
1
answer
54
views
Match a pattern with multiple entries in arbitrary order in Python with re [duplicate]
I am trying to capture values entered in a syntax like this one: name="Game Title" authors="John Doe" studios="Studio A,Studio B" licence=ABC123 url=https://example.com command=&...
4
votes
1
answer
155
views
Why is `re.Pattern` generic?
import re
x = re.compile(r"hello")
In the above code, x is determined to have type re.Pattern[str]. But why is re.Pattern generic, and then specialized to string? What does a re.Pattern[...
0
votes
1
answer
107
views
How does regex filtering work in Python re while logging sensitive info?
I am trying to write a Python script which would redact/hide certain data present in a string before logging it to the console. Below is my code snippet so far.
import re
from logging import DEBUG,...