I am developing a Python package that, among other things, needs to process a file containing a list of dataset names and extract the components of each name.
Examples of dataset names are:
- diskLineLuminosity:halpha:rest:z1.0
- diskLineLuminosity:halpha:rest:z1.0:dust
- diskLineLuminosity:halpha:rest:z1.0:contam_NII
- diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OII:contam_OIII
- diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OIII:dust
- diskLineLuminosity:halpha:rest:z1.0:contam_OII:contam_NII
- diskLineLuminosity:halpha:rest:z1.0:contam_NII:recent
I'm looking for a way to parse the dataset names using regex to extract all the dataset information, including a list of all instances of "contam_*" (where zero instances are allowed). I realise that I could just split the string and use fnmatch.filter, or equivalent, but I also need to be able to flag erroneous dataset names that do not match the above syntax. Also, regex is already used extensively in similar situations throughout the package, so I would prefer not to introduce a second parsing method.
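For reference, the split-based alternative mentioned above would look roughly like the sketch below. The function name `split_dataset_name` is hypothetical and used only for illustration. It shows why this route is not enough on its own: it collects the "contam_*" fields, but it accepts any colon-separated string without complaint.

```python
import fnmatch


def split_dataset_name(dataset_name):
    """Split on ':' and pick out the contam_* fields (no validation at all)."""
    fields = dataset_name.split(":")
    # fnmatch.filter keeps only the fields matching the shell-style pattern.
    contaminants = fnmatch.filter(fields, "contam_*")
    return fields, contaminants


fields, contaminants = split_dataset_name(
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OIII:dust"
)
print(contaminants)

# A malformed name is accepted just as readily, which is the drawback:
_, bogus = split_dataset_name("not:a:valid:name:contam_X")
print(bogus)
```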
As an MWE, with an example dataset name, I have pieced together:
import re
datasetName = "diskLineLuminosity:halpha:rest:z1.0:contam_NII:recent"
# Raw string so that \d and \. reach the regex engine unchanged
M = re.search(r"^(disk|spheroid)LineLuminosity:([^:]+):([^:]+):z([\d\.]+)(:recent)?(:contam_[^:]+)?(:dust[^:]+)?", datasetName)
This returns:
print(M.group(1, 2, 3, 4, 5, 6, 7))
('disk', 'halpha', 'rest', '1.0', None, ':contam_NII', None)
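To make the shortfall concrete, the sketch below runs that same pattern over the example names listed earlier and prints whatever part of each name the match leaves unconsumed. `unparsed_tail` is a hypothetical helper introduced only for this illustration.

```python
import re

# Same pattern as in the MWE, written as a raw string and split over two lines.
PATTERN = re.compile(
    r"^(disk|spheroid)LineLuminosity:([^:]+):([^:]+):z([\d\.]+)"
    r"(:recent)?(:contam_[^:]+)?(:dust[^:]+)?"
)

NAMES = [
    "diskLineLuminosity:halpha:rest:z1.0",
    "diskLineLuminosity:halpha:rest:z1.0:dust",
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII",
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OII:contam_OIII",
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OIII:dust",
    "diskLineLuminosity:halpha:rest:z1.0:contam_OII:contam_NII",
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII:recent",
]


def unparsed_tail(name):
    """Return the part of `name` that the pattern did not consume."""
    match = PATTERN.search(name)
    if match is None:
        return name
    return name[match.end():]


for name in NAMES:
    print("%r left over from %s" % (unparsed_tail(name), name))
```

Only the first and third names are consumed in full. Since the pattern has no `$` anchor, re.search still succeeds on the others, so a check such as `if not M` would not flag them.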
In the package, this regex search needs to go into a function similar to:
def getDatasetNameInformation(datasetName):
    # ParseError is assumed to be defined elsewhere in the package
    INFO = re.search(r"^(disk|spheroid)LineLuminosity:([^:]+):([^:]+):z([\d\.]+)(:recent)?(:contam_[^:]+)?(:dust[^:]+)?", datasetName)
    if not INFO:
        raise ParseError("Cannot parse '" + datasetName + "'!")
    return INFO
I am still new to regex. How can I modify the pattern passed to re.search so that it parses all of the above dataset names and extracts the information in the substrings (including a list of all instances of contamination)?
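One direction that might work is a two-step approach: validate the whole name with an anchored pattern that captures the run of optional fields as a single group, then pull the individual "contam_*" entries out of that group with re.findall. The sketch below is only that, a sketch. It assumes the optional fields (recent, dust, contam_*) may appear in any order, which the examples suggest but do not confirm. The names `DATASET_RE` and `parse_dataset_name`, and the local `ParseError` class, are hypothetical stand-ins.

```python
import re


class ParseError(Exception):
    """Stand-in for the package's own ParseError (assumed to exist there)."""


DATASET_RE = re.compile(
    r"^(disk|spheroid)LineLuminosity"
    r":([^:]+)"                                # line name, e.g. halpha
    r":([^:]+)"                                # frame, e.g. rest
    r":z([\d.]+)"                              # redshift
    r"((?::(?:recent|dust|contam_[^:]+))*)$"   # optional fields, any order (assumption)
)


def parse_dataset_name(dataset_name):
    """Validate the full name, then extract the optional fields in a second step."""
    match = DATASET_RE.search(dataset_name)
    if match is None:
        raise ParseError("Cannot parse '" + dataset_name + "'!")
    component, line, frame, redshift, options = match.groups()
    option_fields = options.split(":")[1:]  # drop the empty string before the first ':'
    return {
        "component": component,
        "line": line,
        "frame": frame,
        "redshift": redshift,
        "recent": "recent" in option_fields,
        "dust": "dust" in option_fields,
        "contaminants": re.findall(r":contam_([^:]+)", options),
    }


info = parse_dataset_name(
    "diskLineLuminosity:halpha:rest:z1.0:contam_NII:contam_OIII:dust"
)
print(info["contaminants"], info["dust"], info["recent"])
```

A repeated capturing group only keeps its last repetition, which is why the list of contaminants is extracted in a second pass rather than directly from the first match.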
Thanks for any help you can provide!