I wrote a simple Python 3 program that reads data from either a file or standard input and XOR-encrypts or decrypts it. By default the output is base64-encoded, but the --raw flag disables that. It works as intended except in raw mode: when I decrypt XOR data there, an extra line containing a stray character is appended to the output.
#!/usr/bin/env python3
from itertools import cycle
import argparse
import base64
import re


def xor(data, key):
    return ''.join(chr(ord(str(a)) ^ ord(str(b))) for (a, b) in zip(data, cycle(key)))


# check if a string is base64 encoded.
def is_base64(s):
    pattern = re.compile("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$")
    if not s or len(s) < 1:
        return False
    else:
        return pattern.match(s)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--infile', type=argparse.FileType('r'), default='-', dest='data', help='Data to encrypt'
                                                                                                      'or decrypt')
    parser.add_argument('-r', '--raw', dest='raw', default=False, action='store_true', help='Do not use base64 encoding')
    parser.add_argument('-k', '--key', dest='key', help='Key to encrypt with', required=True)
    args = parser.parse_args()
    data = args.data.read()
    key = args.key
    raw = args.raw
    if raw:
        ret = xor(data, key)
        print(str(ret))
    else:
        if is_base64(data):
            # print('is base64')
            decoded = base64.b64decode(data).decode()
            ret = xor(decoded, key)
            print(ret)
        else:
            # print('is not base64')
            ret = xor(data, key)
            encoded = base64.b64encode(bytes(ret, "utf-8"))
            print(encoded.decode())
When run without the --raw flag, everything works as intended:
$ echo lol|./xor.py -k 123
XV1fOw==
$ echo lol|./xor.py -k 123 |./xor.py -k 123
lol
However, if I disable base64, something rather odd happens. It's easier to demonstrate than it is to explain:
$ echo lol|./xor.py -k 123 -r |./xor.py -k 123 -r
lol
8
Does anyone know why I am seeing the character 8 appended to the output of XOR-decrypted data? I have a C program called xorpipe that I use for this exact use case, and it does not suffer from this bug. I wanted to rewrite it in Python.
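The symptom can be reproduced without the pipeline. The sketch below rests on one assumption: that the newline print() appends after the first pass is read back as input by the second pass and XORed along with everything else. The xor helper mirrors the one in the script; first_pass and second_pass are names made up for this illustration.

```python
from itertools import cycle


def xor(data, key):
    # Same character-wise XOR as in the script above.
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(data, cycle(key)))


key = '123'

# `echo lol` sends "lol\n"; print() then appends one more "\n" to the ciphertext.
first_pass = xor('lol\n', key) + '\n'

# The second pass XORs that extra newline too, here against key[1] == '2'.
second_pass = xor(first_pass, key)

print(repr(first_pass))   # ']]_;\n'
print(repr(second_pass))  # 'lol\n8'
```

Under that assumption, the 8 is simply ord('\n') ^ ord('2'), and writing the raw result with sys.stdout.write(ret) rather than print(ret) would avoid the extra character for text input.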
I am looking for other constructive criticism, suggestions, or reviews as well. In particular, I would like argparse to be able to determine whether the supplied input is a file, a string, or data piped from standard input. This is easy to accomplish in bash or C, but I am not sure how best to do this in Python.
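To make the request concrete, here is a rough sketch of the behaviour I have in mind, not a finished design. It assumes a literal string is given as an optional positional argument, a file is named with -i, and standard input is the fallback. The function name read_input and its argv/stdin parameters are hypothetical and exist only so the example can be exercised without a real pipe.

```python
import argparse
import io
import sys


def read_input(argv=None, stdin=None):
    """Return input from a positional string, a named file, or standard input."""
    parser = argparse.ArgumentParser()
    # Optional positional argument: a literal string to process.
    parser.add_argument('text', nargs='?', help='literal string to process')
    # Optional file path, consulted only when no literal string is given.
    parser.add_argument('-i', '--infile', help='file to read from')
    args = parser.parse_args(argv)

    if args.text is not None:
        return args.text
    if args.infile is not None:
        with open(args.infile) as handle:
            return handle.read()
    # Neither was supplied: fall back to standard input (or the injected stream).
    stream = sys.stdin if stdin is None else stdin
    return stream.read()


# Literal string on the command line.
print(read_input(['hello']))
# Simulated pipe: io.StringIO stands in for sys.stdin.
print(read_input([], stdin=io.StringIO('piped\n')), end='')
```

With argv left as None, parse_args falls back to the real command line, so the same function would serve the script itself.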