1

I have a string, a = 'readyM01JUN_01_18_0144.xlsx', and I would like to extract the JUN from it.

My first thought was to split a on runs of digits, but a.split('[0-9]+') doesn't work. Any ideas?

  • Depending on what you really want to get, re.search(r'\d(\D+)\d', s).group(1) or re.search(r'\d([^_\d]+)_\d', s).group(1). If a is a string, split in your case only accepts a literal string to split, not a regex. To split with a regex pattern, you need re.split. Commented Jun 7, 2018 at 8:23
  • @anubhava I edited the title; I hope it is clearer now. Commented Jun 7, 2018 at 8:30
  • Can you post a sample of the input string (a = 'readyM01JUN_01_18_0144.xlsx' or just readyM01JUN_01_18_0144.xlsx) and the output you want? Commented Jun 7, 2018 at 8:31
  • @nandal I want JUN as output, and the input is the string a = 'readyM01JUN_01_18_0144.xlsx'. Commented Jun 7, 2018 at 8:34

3 Answers

2

Since a is a string, a.split (that is, str.split) only accepts a literal separator, not a regex, so a.split('[0-9]+') looks for the literal text [0-9]+ and finds nothing to split on. To split on a regex pattern, you need re.split.
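As a minimal sketch of that difference (the names literal_parts, parts and month are illustrative, not from the question), compare the two calls on the sample string:

```python
import re

a = 'readyM01JUN_01_18_0144.xlsx'

# str.split looks for the literal text '[0-9]+', which never occurs in a,
# so the whole string comes back unsplit in a one-element list.
literal_parts = a.split('[0-9]+')
print(literal_parts)

# re.split treats '[0-9]+' as a regex and splits on every run of digits.
parts = re.split(r'[0-9]+', a)
print(parts)

# Assumption: the month is always the second piece, followed by an underscore.
month = parts[1].rstrip('_')
print(month)
```

Picking a piece by position like this is fragile if the file names vary in shape.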

However, to extract the value directly, you may use re.search:

import re
a = 'readyM01JUN_01_18_0144.xlsx'
m = re.search(r'\d([^_\d]+)_\d', a) # Or, r'\d([a-zA-Z]+)_\d'
if m:
    print(m.group(1))

See the Python demo

Pattern details

  • \d - a digit
  • ([^_\d]+) - Group 1 matching and capturing (m.group(1) will hold this value) 1+ chars other than digits and _ (you may even use ([a-zA-Z]+) to match 1+ ASCII letters)
  • _\d - a _ and a digit.

See the regex demo.

Note that re.search returns only the first (leftmost) match.
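To illustrate that last point, here is a small sketch using a made-up file name (the string b is hypothetical, not from the question) that contains two candidates. re.search stops at the first one, while re.findall collects every non-overlapping match of the group:

```python
import re

# Hypothetical file name with two month-like tokens, for illustration only.
b = 'readyM01JUN_01_18_0144JUL_02.xlsx'
pattern = r'\d([^_\d]+)_\d'

# re.search returns a match object for the leftmost match only.
first = re.search(pattern, b).group(1)
print(first)

# re.findall returns the captured group of every non-overlapping match.
all_matches = re.findall(pattern, b)
print(all_matches)
```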

1

Not sure what your program's objective is, but if JUN stands for June, and you have a series of months in your data and want to remove them all, I would create a list of months, iterate through it, and replace each one found in the string you are working on. You can get JUN out of the string by calling the .replace() method on a and assigning the result back to a, since strings are immutable and .replace() returns a new string. Here is an example:

months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEPT', 'OCT', 'NOV', 'DEC']
a = 'readyM01JUN_01_18_0144.xlsx'

for month in months:
    if month in a:
        # str.replace returns a new string, so rebind a to the result
        a = a.replace(month, '')
        print(a)

OUTPUT:

readyM01_01_18_0144.xlsx
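If the goal is to pull the month out rather than delete it, the same kind of list can be used for a lookup. This is only a sketch: find_month and MONTHS are hypothetical names, and it assumes three-letter uppercase abbreviations (so SEP rather than SEPT) and that no abbreviation appears by accident elsewhere in the file name.

```python
# Assumption: months appear as three-letter uppercase abbreviations.
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

def find_month(name):
    """Return the first abbreviation from MONTHS contained in name, or None."""
    for month in MONTHS:
        if month in name:
            return month
    return None

print(find_month('readyM01JUN_01_18_0144.xlsx'))
print(find_month('report_0144.xlsx'))
```

A plain substring test cannot tell a real month from a coincidental letter sequence, so a regex anchored on the surrounding digits may be safer for less regular names.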

0

You could also try an iterative approach like this:

import re

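def remove_string(string, sub):
    res = string
    reduce = 0  # negative offset: total length removed from res so far
    # re.escape makes sub match literally, even if it contains regex metacharacters
    for loc in re.finditer(re.escape(sub), string):
        res = res[:loc.start()+reduce] + res[loc.start()+len(sub)+reduce:]
        reduce -= len(sub)

    return res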

Which outputs:

>>> remove_string('readyM01JUN_01_18_0144.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'
>>> remove_string('readyM01JUN_01_18_0144JUN.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'
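For a plain literal substring, the built-in str.replace should give the same results without an explicit loop. A brief sketch for comparison (the variable names are illustrative):

```python
name = 'readyM01JUN_01_18_0144JUN.xlsx'

# With two arguments, str.replace removes every occurrence of the substring.
all_removed = name.replace('JUN', '')
print(all_removed)

# The optional third argument caps how many occurrences are replaced.
first_removed = name.replace('JUN', '', 1)
print(first_removed)
```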
