I have a data file that is essentially a fixed-width text file. The number of spaces between fields varies from row to row, and so does which columns are populated. I'm trying to parse the file into a list with Python and can't figure out the appropriate regex (I'm open to non-regex options too).
Date Run By Level 1 Level 2 Level 3 Level 4 Level 5 Level 6 Level 7 Level 8 Level 9
11-15-2014 12:27:43 AM 1 ** 259.0
11-15-2014 7:47:09 AM 1 ** 98.0
11-15-2014 3:45:07 PM 1 ** 785.0
11-16-2014 12:27:43 AM 1 ** 245.0
11-16-2014 7:51:36 AM 1 ** 96.0
11-16-2014 3:43:12 PM 1 ** 788.0
11-17-2014 12:27:43 AM 1 ** 248.0
11-17-2014 7:51:21 AM 1 ** 104.0
11-17-2014 12:45:57 PM 1 ** 97.0 257.0 793.0
11-17-2014 3:46:33 PM 1 ** 792.0
11-18-2014 12:32:31 AM 1 ** 253.0
11-18-2014 7:50:31 AM 1 ** 104.0
11-18-2014 3:48:43 PM 1 ** 781.0
11-19-2014 12:30:36 AM 1 ** 260.0
11-19-2014 8:40:26 AM 1 ** 102.0
11-19-2014 3:47:45 PM 1 ** 803.0
11-20-2014 12:28:40 AM 1 ** 243.0
11-20-2014 7:53:38 AM 1 ** 107.0
11-20-2014 3:43:55 PM 1 ** 787.0
11-21-2014 1:03:45 AM 0 PS 245.0
11-21-2014 7:52:55 AM 1 ** 101.0
11-21-2014 3:44:09 PM 1 ** 789.0
11-22-2014 12:37:26 AM 1 ** 250.0
11-22-2014 7:49:55 AM 1 ** 103.0
Thus far I've tried:
import re

for line in f:
    # Split on a space followed by a capital letter or digit, or on any single space
    line = re.split(r' (?=[A-Z])| (?=[0-9])| ', line)
However, the resulting columns don't align consistently from row to row. I need them to line up for use downstream.
The desired output is below (sorry for the limited number of rows; parsing it manually is deadly!):
['Date', '', 'Run', 'By', 'Level 1', 'Level 2', 'Level 3', 'Level 4', 'Level 5', 'Level 6', 'Level 7', 'Level 8', 'Level 9','\r\n']
['\r\n']
['\r\n']
['11-15-2014', '12:27:43', 'AM 1', '**', '', '259.0', '', '', '', '', '', '', '', '\r\n']
['11-15-2014', '7:47:09', 'AM 1', '**', '98.0', '', '', '', '', '', '', '', '', '\r\n']
['11-15-2014', '3:45:07', 'PM 1', '**', '', '', '785.0', '', '', '', '', '', '', '\r\n']
...
...
['11-17-2014', '12:45:57', 'PM 1', '**', '97.0', '257.0', '793.0', '', '', '', '', '', '', '\r\n']
In essence, each row should have 13 items followed by a line break. Combining the date and time into a single field would be fine; mostly I need the dates and the three levels to line up properly. Only Level 1, Level 2, and Level 3 ever contain values. Usually a row has a value for a single level, but occasionally all three are present (as shown).
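For reference, one non-regex direction is to slice every line at fixed character offsets instead of splitting on spaces, so that an empty column still yields an empty string. The sketch below is only an illustration under assumptions: the helper names `column_starts` and `split_fixed_width` are made up, the column widths in the demo are hypothetical (the file's real spacing is not reproduced here), and it assumes each value starts at the same offset as its header label. With this approach the date and time end up in a single field.

```python
def column_starts(header, labels):
    """Return the character offset at which each label begins in the header line."""
    starts = []
    pos = 0
    for label in labels:
        idx = header.index(label, pos)  # raises ValueError if a label is missing
        starts.append(idx)
        pos = idx + len(label)
    return starts


def split_fixed_width(line, starts):
    """Slice a line at the given offsets and strip the padding from each field."""
    line = line.rstrip("\r\n")
    ends = starts[1:] + [None]  # the last field runs to the end of the line
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# Hypothetical layout: these widths are invented for the demo, not taken from the file.
fmt = "{:<24}{:<5}{:<4}{:<10}{:<10}{:<10}"
labels = ["Date", "Run", "By", "Level 1", "Level 2", "Level 3"]
header = fmt.format(*labels)
row = fmt.format("11-15-2014 12:27:43 AM", "1", "**", "", "259.0", "")

starts = column_starts(header, labels)
print(split_fixed_width(row, starts))
# ['11-15-2014 12:27:43 AM', '1', '**', '', '259.0', '']
```

If the values turn out to be right-aligned under their headers, offsets derived from the header would cut them in the wrong place, and the offsets would need to be set by hand instead. `pandas.read_fwf` is another option if pulling in pandas is acceptable.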