Based on your symptoms, you're running on Python 2. Calling encode on a Python 2 str is almost always nonsensical.
You have two problems: one you're hitting now, and one you'll hit once you fix your current code.
Your first problem is that line is already a str holding (apparently) UTF-8 encoded bytes, not unicode. Calling encode on it therefore implicitly decodes it with Python's default encoding first (ASCII; to my knowledge this isn't locale specific, and it's a rare Python 2 install that uses anything else), then re-encodes with the specified codec (or the default if none is specified). Basically, line was already UTF-8 encoded and you told Python to encode it again as UTF-8; since that's nonsensical, Python tried to decode it as ASCII first, and failed before it ever got to the encode you asked for.
The solution to this problem is to just not encode line at all; it's already UTF-8 encoded, so you're already golden.
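The implicit step can be made visible with a small sketch. The helper below, py2_style_encode, is a hypothetical name of my own, not a real API; it just spells out the decode-then-encode that Python 2 performs when you call encode on a str, using explicit calls so the snippet behaves the same on Python 2 and Python 3:

```python
# Hypothetical helper: spells out what Python 2 does for str.encode(codec).
def py2_style_encode(data, codec='utf-8', default='ascii'):
    # Step 1: the implicit decode with the default encoding (ASCII unless changed).
    text = data.decode(default)
    # Step 2: the encode you actually asked for.
    return text.encode(codec)

raw = u'caf\xe9'.encode('utf-8')  # sample UTF-8 bytes containing a non-ASCII character

try:
    py2_style_encode(raw)
    outcome = 'encoded'
except UnicodeDecodeError:
    # Fails in step 1, before the requested encode ever runs.
    outcome = 'UnicodeDecodeError'

print(outcome)
```

Pure-ASCII input passes through both steps unchanged, which is why this kind of bug tends to stay hidden until the first non-ASCII byte shows up.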
Your second problem (which you haven't encountered yet, but you will) is that you're calling encode on the group(4) result. But of course, since the input was a str, the group is a str too, and you'll encounter the same problem trying to encode a str; since the group came from raw UTF-8 encoded bytes, the non-ASCII parts of it cause a UnicodeDecodeError during the implicit decode step before the encode.
The reason:
import sys
reload(sys)
sys.setdefaultencoding('UTF8')
works is that it (dangerously) changes the implicit decode step to use UTF-8, so all your encode calls now perform the implicit decode with UTF-8 instead of ASCII. The decode and encode are mostly pointless: all they do is return the original str after confirming it's legal UTF-8 by decoding it as such, otherwise acting as an expensive no-op.
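To see why that round trip is an expensive no-op, here is the same decode-then-encode written out explicitly with UTF-8 as the decode codec, which is roughly what the hack makes every encode call do. The sample values are my own:

```python
utf8_bytes = u'na\xefve caf\xe9'.encode('utf-8')  # sample UTF-8 data

# Decode as UTF-8, then encode as UTF-8: you get back exactly what you started with.
round_tripped = utf8_bytes.decode('utf-8').encode('utf-8')
is_noop = (round_tripped == utf8_bytes)

# The only observable effect is validation: invalid UTF-8 fails in the decode step.
invalid = b'\xff\xfe'
try:
    invalid.decode('utf-8').encode('utf-8')
    rejected = False
except UnicodeDecodeError:
    rejected = True
```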
To fix the second problem, just change:
m.group(4).encode()
to:
m.group(4)
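As a quick sanity check that no encode is needed, the sketch below matches a made-up sample line (my own, not your data) and passes the group straight to hashlib.sha512. It uses a bytes pattern (br'...') so that it runs unchanged on Python 2.7 and Python 3:

```python
import hashlib
import re

line = u'Start: caf\xe9: na\xefve'.encode('utf-8')  # made-up UTF-8 input

m = re.search(br'(Start:\s*)([^:]+)(:\s*)([^:]+)', line)

# The group is the same byte-string type as the input...
same_type = type(m.group(4)) is type(line)

# ...so it can be hashed directly, with no encode call.
digest = hashlib.sha512(m.group(4)).hexdigest()
```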
That leaves your final code as:
result = re.sub(r'(Start:\s*)([^:]+)(:\s*)([^:]+)',
                lambda m: m.group(1) + m.group(2) + m.group(3) + hashlib.sha512(m.group(4)).hexdigest(),
                line)
Optionally, if you want to confirm your expectation that line is in fact UTF-8 encoded bytes already, add the following above that re.sub line:
try:
    line.decode('utf-8')
except Exception as e:
    sys.exit("line (of type {!r}) not decodable as UTF-8: {}".format(line.__class__.__name__, e))
which will cause the program to exit immediately if the data given is not legal UTF-8 (and will also let you know what type line is, so you can confirm for sure if it's really str or unicode, since str implies you chose the wrong codec, while unicode means your inputs aren't of the expected type).
... encode a str (which is a largely nonsensical thing to do). A full traceback and a minimal reproducible example would be helpful. Lastly, to be sure, split up the line so you only encode once per line, e.g. encodedline = line.encode('utf-8'), then replace line.encode('utf-8') in the re.sub with encodedline so you aren't able to confuse which encode is the problem.
... setdefaultencoding is deleted from sys after calling it for a reason; changing the default mid-run risks all sorts of problems from various libraries that may have cached the encoding, or the results of encoding things in it, and suddenly find that things aren't behaving the way they did at startup. I strongly suspect your code will work by deleting all calls to encode in that line; you already had UTF-8 encoded data, so trying to encode it again was the source of your problems. See my answer.