EDIT: I'm changing this answer a bit. I'll leave the original answer below.
In my other answer I commented that the best thing would be to find a built-in Python module that would do the unpacking. I couldn't think of one, but perhaps I should have Google searched for one. @John Machin provided an answer that showed how to do it: use the Python struct module. Since that is written in C, it should be faster than my pure Python solution. (I haven't actually measured anything so it is a guess.)
I do agree that the logic in the original code is "un-Pythonic". Returning a sentinel value isn't best; it's better to either return a valid value or raise an exception. The other way to do it is to return a list of valid values, plus another list of invalid values. Since @John Machin offered code to yield up valid values, I thought I'd write a version here that returns two lists.
NOTE: Perhaps the best possible answer would be to take @John Machin's answer and modify it to save the invalid values to a file for possible later review. His answer yields up answers one at a time, so there is no need to build a large list of parsed records; and saving the bad lines to disk means there is no need to build a possibly-large list of bad lines.
import struct
def parse_records(self):
"""
returns a tuple: (good, bad)
good is a list of valid records (as tuples)
bad is a list of tuples: (line_num, line, err)
"""
cols = self.Columns()
unpack_fmt = ""
sign_checks = []
start = 0
for colx, info in enumerate(cols, 1):
clen = info.columnLength
if clen < 1:
raise ValueError("Column %d: Bad columnLength %r" % (colx, clen))
if info.skipColumn:
unpack_fmt += str(clen) + "x"
else:
unpack_fmt += str(clen) + "s"
if info.hasSignage:
sign_checks.append(start)
start += clen
expected_len = start
unpack = struct.Struct(unpack_fmt).unpack
good = []
bad = []
for line_num, line in enumerate(self.whatever_the_list_of_lines_is, 1):
if len(line) != expected_len:
bad.append((line_num, line, "bad length"))
continue
if not all(line[i] in '+-' for i in sign_checks):
bad.append((line_num, line, "sign check failed"))
continue
good.append(unpack(line))
return good, bad
ORIGINAL ANSWER TEXT:
This answer should be a lot faster if the self.Columns() information is identical over all the records. We do the processing of the self.Columns() information one time, and build a couple of lists that contain just what we need to process a record.
This code shows how to compute parsedList but doesn't actually yield it up or return it or do anything with it. Obviously you would need to change that.
def parse_records(self):
cols = self.Columns()
slices = []
sign_checks = []
start = 0
for info in cols:
if info.columnLength < 1:
raise ValueError, "bad columnLength"
end = start + info.columnLength
if not info.skipColumn:
tup = (start, end)
slices.append(tup)
if info.hasSignage:
sign_checks.append(start)
expected_len = end # or use (end - 1) to not count a newline
try:
for line in self.whatever_the_list_of_lines_is:
if len(line) != expected_len:
raise ValueError, "wrong length"
if not all(line[i] in '+-' for i in sign_checks):
raise ValueError, "wrong input"
parsedLine = [line[s:e] for s, e in slices]
except ValueError:
parsedLine = False
Exceptionsubclass instead. The only reason they appear to work anymore is that they're wrong and therefore raise an error.except:is almost always wrong can can lead to frustrating, untrackable bugs. If all you wanted to catch wasraise "Wrong Input"then what you would do is defineclass InvalidInputError(Exception):passand thenexcept InvalidInputError:.passis a docstring.