
I have to extract the file ID from file links. A file link looks like this example: "\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm" (apart from the file ID, the path is the same in every link). I decided to use a regex to extract the file ID because the pattern is fixed. I first tried the code below.

import re

p = "Antartica"

re.search("n(.*)c", p).group(1)

It gives the output 'tarti', which is fine. I created a similar regex to extract the file ID, but it's not working.

p = r"\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm"

re.search('\(.*)\OE', p).group(1)

I'm getting the error "'NoneType' object has no attribute 'group'".

Please explain what is wrong in my code. How can I make it work?

TIA

  • There are no " in your input string. Commented Dec 2, 2020 at 21:25
  • I did that to accommodate the backslash inside the string. I will edit the post. Commented Dec 2, 2020 at 21:28
  • You are getting None from re.search(...), which is what it returns when it doesn't find a match. As @WiktorStribiżew stated, you are looking for " characters in your pattern and they do not exist. Also consider using os.path.split() for a more robust cross-platform solution. Commented Dec 2, 2020 at 21:29
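The last comment suggests a path-based approach instead of a regex. Below is a minimal sketch of that idea under two assumptions: the links are Windows-style paths, and the file ID is always the path component directly before the "OE" folder. It swaps in pathlib.PureWindowsPath instead of os.path.split(), because on Linux and macOS os.path treats the backslash as an ordinary character and would not split these links at all. The function name file_id_from_path is hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Optional


def file_id_from_path(link: str, marker: str = "OE") -> Optional[str]:
    """Return the path component directly before `marker`, or None if absent.

    Hypothetical helper: assumes the ID always sits right before `marker`.
    """
    # PureWindowsPath splits on backslashes on every OS, unlike os.path
    # on Linux/macOS, where a backslash is just part of a file name.
    parts = PureWindowsPath(link).parts
    for i, part in enumerate(parts):
        if part == marker and i > 0:
            return parts[i - 1]
    return None


p = r"\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm"
print(file_id_from_path(p))  # F-17-50021
```

If the marker folder is missing, the function returns None, so the caller still has to handle that case, just as with re.search.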

1 Answer


Use

import re

p = r"\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm"
# Capture the path segment (no backslashes inside) that sits directly before "\OE"
match = re.search(r'\\([^\\]+)\\OE', p)
if match is not None:
    print(match.group(1))


Result: F-17-50021

Explanation

--------------------------------------------------------------------------------
  \\                       '\'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\\]+                   any character except: '\\' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \\                       '\'
--------------------------------------------------------------------------------
  OE                       'OE'
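To see why the answer's pattern differs from the one in the question, the sketch below runs three patterns against the same link; the variable names are made up for the comparison. Note that the pattern exactly as it appears in the question, '\(.*)\OE', does not compile at all: \( is an escaped literal parenthesis, so the closing ) has no group to close. Escaping the backslashes fixes that, but a greedy .* then starts at the very first backslash of the link and swallows most of the path, which is why the answer uses [^\\]+ instead.

```python
import re

p = r"\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm"

# 1. The question's pattern, written as a raw string so Python leaves the
#    backslashes alone. "\(" is a literal "(", so the ")" is unbalanced.
try:
    re.compile(r'\(.*)\OE')
    question_pattern_compiles = True
except re.error as exc:
    question_pattern_compiles = False
    print("re.error:", exc)

# 2. Backslashes escaped, but still ".*": the match begins at the first
#    backslash in the link, so the group captures almost the whole path.
greedy = re.search(r'\\(.*)\\OE', p).group(1)
print(greedy)  # \abc.xyz\folder1\folder2\folder3\folder5\F-17-50021

# 3. The answer's pattern: [^\\]+ cannot cross a backslash, so the group
#    is limited to the single path segment right before "\OE".
strict = re.search(r'\\([^\\]+)\\OE', p).group(1)
print(strict)  # F-17-50021
```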

Comments

  • This works! Could you please explain the regex a little?
  • @wickedpanda See Explanation
  • Thanks a lot, Ryszard! :)
  • I just found that in some links "\OE" has been replaced by "\FLD" or "\TURN". Is it possible to include all three in the regex, like an "or" statement: "\OE" or "\FLD" or "\TURN"?
  • @wickedpanda Yes, r'\\([^\\]+)\\(?:OE|FLD|TURN)'
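A minimal sketch applying the pattern from the last comment to several links. Only the first link comes from the question; the other three are hypothetical variants made up to exercise the FLD and TURN alternatives and the no-match case.

```python
import re

# First link is from the question; the rest are hypothetical variants.
links = [
    r"\\abc.xyz\folder1\folder2\folder3\folder5\F-17-50021\OE\abc\xyz\file.xlsm",
    r"\\abc.xyz\folder1\folder2\folder3\folder5\F-18-10001\FLD\abc\xyz\file.xlsm",
    r"\\abc.xyz\folder1\folder2\folder3\folder5\F-19-20002\TURN\abc\xyz\file.xlsm",
    r"\\abc.xyz\folder1\folder2\folder3\folder5\F-20-30003\OTHER\abc\xyz\file.xlsm",
]

# (?:...) groups the alternatives without creating a second capture group,
# so group(1) is still the file ID.
pattern = re.compile(r'\\([^\\]+)\\(?:OE|FLD|TURN)')

ids = []
for link in links:
    match = pattern.search(link)
    # Record None when none of the marker folders is present.
    ids.append(match.group(1) if match else None)

print(ids)  # ['F-17-50021', 'F-18-10001', 'F-19-20002', None]
```

Because the pattern does not require a backslash after the alternatives, a folder named, say, OEM would also match; appending \\ to the pattern (r'\\([^\\]+)\\(?:OE|FLD|TURN)\\') would rule that out.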
