How to get substring between two numbers (of unknown digit length) in python

Question

I have a string that looks like this a = 'readyM01JUN_01_18_0144.xlsx' and I would like to tease out the JUN.

I first thought of splitting a on digits, but a.split('[0-9]+') doesn't work. Any ideas?

Depending on what you really want to get, re.search(r'\d(\D+)\d', s).group(1) or re.search(r'\d([^_\d]+)_\d', s).group(1). Since a is a string, a.split only accepts a literal separator, not a regex. To split with a regex pattern, you need re.split.
– Wiktor Stribiżew, Commented Jun 7, 2018 at 8:23
Can you post sample input strings (is it a = 'readyM01JUN_01_18_0144.xlsx' or just readyM01JUN_01_18_0144.xlsx?) and the output you want?
– nandal, Commented Jun 7, 2018 at 8:31
@nandal I want JUN as output, and the input is the string a = 'readyM01JUN_01_18_0144.xlsx'.
– quant, Commented Jun 7, 2018 at 8:34

Wiktor Stribiżew · Accepted Answer · 2018-06-07 08:29:06Z

2

Since a is a string, a.split is str.split, which only accepts a literal separator string, not a regex. To split with a regex pattern, you need re.split.
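To make the difference concrete, here is a small sketch comparing the two calls on the string from the question. The variable names literal_parts and regex_parts are only illustrative.

```python
import re

a = 'readyM01JUN_01_18_0144.xlsx'

# str.split looks for the literal text "[0-9]+", which never occurs in a,
# so the whole string comes back as a single element.
literal_parts = a.split('[0-9]+')
print(literal_parts)  # ['readyM01JUN_01_18_0144.xlsx']

# re.split interprets the pattern as a regex and splits on each run of digits.
regex_parts = re.split(r'[0-9]+', a)
print(regex_parts)    # ['readyM', 'JUN_', '_', '_', '.xlsx']
```

Note that re.split gets close, but the piece of interest still carries its trailing underscore, so some extra cleanup would be needed.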

However, to extract the value you may use re.search with a capturing group:

import re
a = 'readyM01JUN_01_18_0144.xlsx'
m = re.search(r'\d([^_\d]+)_\d', a) # Or, r'\d([a-zA-Z]+)_\d'
if m:
    print(m.group(1))


Pattern details

\d - a digit
([^_\d]+) - Group 1 matching and capturing (m.group(1) will hold this value) 1+ chars other than digits and _ (you may even use ([a-zA-Z]+) to match 1+ ASCII letters)
_\d - a _ and a digit.


Note that re.search returns the first leftmost match.
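As a rough illustration of the "first leftmost match" behaviour, the sketch below wraps the same pattern in a helper and contrasts re.search with re.findall. The helper name extract_between_digits and the sample string 'x1AB_2y3CD_4' are made up for this example.

```python
import re

# Same pattern as above: a digit, then 1+ chars that are neither digits
# nor underscores (captured), then an underscore and a digit.
PATTERN = re.compile(r'\d([^_\d]+)_\d')

def extract_between_digits(name):
    """Return the first captured chunk, or None when nothing matches."""
    m = PATTERN.search(name)
    return m.group(1) if m else None

print(extract_between_digits('readyM01JUN_01_18_0144.xlsx'))  # JUN
print(extract_between_digits('no_digits_here.xlsx'))          # None

# search stops at the first match; findall collects every captured group.
print(extract_between_digits('x1AB_2y3CD_4'))  # AB
print(PATTERN.findall('x1AB_2y3CD_4'))         # ['AB', 'CD']
```

Returning None instead of calling .group(1) directly avoids an AttributeError on file names that do not follow the expected layout.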

answered Jun 7, 2018 at 8:29

Wiktor Stribiżew


Simeon Ikudabo · Accepted Answer · 2018-06-07 08:29:21Z

1

Not sure what your program's objective is, but if JUN stands for June and your data contains a series of month abbreviations that you want to remove, I would create a list of months, iterate through it, and strip each month out of the string you are working on. You can remove JUN by calling the .replace() method on a and assigning the result back to a; since strings are immutable, replace returns a new string rather than modifying the original. Here is an example:

months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEPT', 'OCT', 'NOV', 'DEC']
a = 'readyM01JUN_01_18_0144.xlsx'

for month in months:
    if month in a:
        # replace returns a new string, so rebind a to the result
        a = a.replace(month, '')
        print(a)

OUTPUT:

readyM01_01_18_0144.xlsx
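If the goal is to pull the month out rather than delete it, the same month-list idea can be turned around. The following is only a sketch under the assumption that the file names use three-letter uppercase abbreviations; the names MONTHS, MONTH_RE and find_month are hypothetical.

```python
import re

# Assumption: months appear as three-letter uppercase abbreviations.
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# Build one alternation pattern: JAN|FEB|...|DEC
MONTH_RE = re.compile('|'.join(MONTHS))

def find_month(name):
    """Return the first month abbreviation found in name, or None."""
    m = MONTH_RE.search(name)
    return m.group(0) if m else None

print(find_month('readyM01JUN_01_18_0144.xlsx'))  # JUN
print(find_month('readyM01_01_18_0144.xlsx'))     # None
```

Unlike the digit-based pattern, this only finds values that are in the list, which may or may not be what you want.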

answered Jun 7, 2018 at 8:29

Simeon Ikudabo


RoadRunner · Accepted Answer · 2018-06-07 09:35:51Z

0

You could also try an iterative approach like this:

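import re

def remove_string(string, sub):
    res = string
    reduce = 0  # running offset: how far earlier removals have shifted res
    # re.escape makes sub match literally, so len(sub) equals the match length
    for loc in re.finditer(re.escape(sub), string):
        res = res[:loc.start()+reduce] + res[loc.start()+len(sub)+reduce:]
        reduce -= len(sub)

    return res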

Which outputs:

>>> remove_string('readyM01JUN_01_18_0144.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'
>>> remove_string('readyM01JUN_01_18_0144JUN.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'
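For comparison, when sub is plain text (not a regex), the built-in str.replace appears to give the same results as the loop above. This is a sketch and not part of the original answer; the helper name remove_all is made up.

```python
def remove_all(string, sub):
    """Remove every literal occurrence of sub from string."""
    # str.replace scans left to right and replaces non-overlapping matches.
    return string.replace(sub, '')

print(remove_all('readyM01JUN_01_18_0144.xlsx', 'JUN'))
# readyM01_01_18_0144.xlsx
print(remove_all('readyM01JUN_01_18_0144JUN.xlsx', 'JUN'))
# readyM01_01_18_0144.xlsx
```

The manual index bookkeeping is mainly useful if you need the match positions for something else along the way.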

edited Jun 7, 2018 at 9:35

answered Jun 7, 2018 at 9:08

RoadRunner
