I have a list of files, and I am trying to filter for a subset of file names that end in 000000, 060000, 120000, 180000. I know I could do a straight string match, but I would like to understand why the regular expression I attempted below r'[00|06|12|18]+0000', would not work (it is returning MSM_20130519210000.csv as well). I intend it to be match either one of 00, 06, 12, 18, follow by 0000. How can that be accomplished? Please keep the answer along the line of this intended regex instead of other functions, thanks.
Here is the code snippet:
import re
files_in_input_directory = ['MSM_20130519150000.csv', 'MSM_20130519180000.csv', 'MSM_20130519210000.csv',
'MSM_20130520000000.csv', 'MSM_20130520030000.csv', 'MSM_20130520060000.csv', 'MSM_20130520090000.csv',
'MSM_20130520120000.csv', 'MSM_20130520150000.csv', 'MSM_20130520180000.csv', 'MSM_20130520210000.csv',
'MSM_20130521000000.csv', 'MSM_20130521030000.csv', 'MSM_20130521060000.csv', 'MSM_20130521090000.csv',
'MSM_20130521120000.csv', 'MSM_20130521150000.csv', 'MSM_20130521180000.csv', 'MSM_20130521210000.csv',
'MSM_20130522000000.csv', 'MSM_20130522030000.csv', 'MSM_20130522060000.csv', 'MSM_20130522090000.csv',
'MSM_20130522120000.csv', 'MSM_20130522150000.csv', 'MSM_20130522180000.csv', 'MSM_20130522210000.csv',
'MSM_20130523000000.csv', 'MSM_20130523030000.csv', 'MSM_20130523060000.csv', 'MSM_20130523090000.csv',
'MSM_20130523120000.csv', 'MSM_20130523150000.csv', 'MSM_20130523180000.csv', 'MSM_20130523210000.csv',
'MSM_20130524000000.csv', 'MSM_20130524030000.csv', 'MSM_20130524060000.csv', 'MSM_20130524090000.csv',
'MSM_20130524120000.csv', 'MSM_20130524150000.csv', 'MSM_20130524180000.csv', 'MSM_20130524210000.csv',
'MSM_20130525000000.csv', 'MSM_20130525030000.csv', 'MSM_20130525060000.csv', 'MSM_20130525090000.csv',
'MSM_20130525120000.csv', 'MSM_20130525150000.csv', 'MSM_20130525180000.csv', 'MSM_20130525210000.csv',
'MSM_20130526000000.csv', 'MSM_20130526030000.csv', 'MSM_20130526060000.csv', 'MSM_20130526090000.csv',
'MSM_20130526120000.csv', 'MSM_20130526150000.csv', 'MSM_20130526180000.csv', 'MSM_20130526210000.csv',
'MSM_20130527000000.csv', 'MSM_20130527030000.csv', 'MSM_20130527060000.csv', 'MSM_20130527090000.csv',
'MSM_20130527120000.csv', 'MSM_20130527150000.csv', 'MSM_20130527180000.csv', 'MSM_20130527210000.csv',
'MSM_20130528000000.csv', 'MSM_20130528030000.csv', 'MSM_20130528060000.csv', 'MSM_20130528090000.csv',
'MSM_20130528120000.csv', 'MSM_20130528150000.csv', 'MSM_20130528180000.csv', 'MSM_20130528210000.csv',
'MSM_20130529000000.csv', 'MSM_20130529030000.csv', 'MSM_20130529060000.csv', 'MSM_20130529090000.csv']
print files_in_input_directory
print "\n"
# trying to match any string with 000000, 060000, 120000, 180000
# Question: I use + meaning one or more, and | to indicates the options, but this will match
# 'MSM_20130519210000.csv' as well, and I don't know why
print filter(lambda x:re.search(r'[00|06|12|18]+0000', x), files_in_input_directory)
print "\n"
# This verbose version works
print filter(lambda x:re.search(r'0000000|060000|120000|180000', x), files_in_input_directory)
print "\n"