I'm developing a web app (using gevent, but that is not significant) that has to write some confidential information to a log. The obvious idea is to encrypt the confidential information using a public key that is hard-coded into my application. To read it, one would need the private key, and 2048-bit RSA seems to be safe enough. I have chosen pycrypto (I tried M2Crypto as well, but found nearly no difference for my purpose) and implemented log encryption as a logging.Formatter subclass. However, I'm new to pycrypto and cryptography, and I am not sure that the way I encrypt my data is reasonable. Is the PKCS1_OAEP module what I need? Or are there friendlier ways to encrypt that do not require dividing the data into small chunks?
So, what I did is:
import logging
import sys
from Crypto.Cipher import PKCS1_OAEP as pkcs1
from Crypto.PublicKey import RSA
PUBLIC_KEY = """ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDe2mtK03UhymB+SrIbJJUwCPhWNMl8/gA9d7jex0ciSuFfShDaqJ4wYWG4OOl\
VqKMxPrPcZ/PMSwtc021yI8TXfgewb65H/YQw4JzzGANq2+mFT8jWRDn+xUc6vcWnXIG3OPg5DvIipGQvIPNIUUP3qE7yDHnS5xdVdFrVe2bUUXmZJ9\
0xJpyqlTuRtIgfIfEQC9cggrdr1G50tXdXZjS0M1WXl5P6599oH/ykjpDFrCnh5fz9WDwUc0mNJ+11Qh+yfDp3k7AhzhRaROKLVWnfkklFaFm7LsdVX\
KPjp7dPRcTb84c2OnlIjU0ykL74Fy0K3eaPvM6TLe/K1XuD3933 pupkin@pupkin"""
PUBLIC_KEY = RSA.importKey(PUBLIC_KEY)
LOG_FORMAT = '[%(asctime)-15s - %(levelname)s: %(message)s]'
# May be more, but there is a limit.
# I suppose, the algorithm requires enough padding,
# and size of padding depends on key length.
MAX_MSG_LEN = 128
# Size of a block encoded with padding. For a 2048-bit key seems to be OK.
ENCODED_CHUNK_LEN = 256
def encode_msg(msg):
    res = []
    k = pkcs1.new(PUBLIC_KEY)
    for i in xrange(0, len(msg), MAX_MSG_LEN):
        v = k.encrypt(msg[i : i+MAX_MSG_LEN])
        # There are nicer ways to make a readable line from data than using hex. However, using
        # hex representation requires no extra code, so let it be hex.
        res.append(v.encode('hex'))
        assert len(v) == ENCODED_CHUNK_LEN
    return ''.join(res)
def decode_msg(msg, private_key):
    msg = msg.decode('hex')
    res = []
    k = pkcs1.new(private_key)
    for i in xrange(0, len(msg), ENCODED_CHUNK_LEN):
        res.append(k.decrypt(msg[i : i+ENCODED_CHUNK_LEN]))
    return ''.join(res)
class CryptoFormatter(logging.Formatter):
    NOT_SECRET = ('CRITICAL',)
    def format(self, record):
        """
        If needed, I may encode only certain types of messages.
        """
        try:
            msg = logging.Formatter.format(self, record)
            if record.levelname not in self.NOT_SECRET:
                # Reuse the already formatted message instead of formatting twice.
                msg = encode_msg(msg)
            return msg
        except Exception:
            import traceback
            return traceback.format_exc()
def decrypt_file(key_fname, data_fname):
    """
    The function decrypts logs and never runs on server. In fact,
    server does not have a private key at all. The only key owner
    is server admin.
    """
    res = ''
    with open(key_fname, 'r') as kf:
        pkey = RSA.importKey(kf.read())
    with open(data_fname, 'r') as f:
        for l in f:
            l = l.strip()
            if l:
                try:
                    res += decode_msg(l, pkey) + '\n'
                except Exception:  # A line may be unencrypted
                    res += l + '\n'
    return res
# dictConfig() can use a custom formatter class through the special '()' key,
# but in demo code I am not going to use dictConfig().
logger = logging.getLogger()
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(CryptoFormatter(LOG_FORMAT))
logger.handlers = []
logger.addHandler(handler)
logging.warning("This is secret")
logging.critical("This is not secret")
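On the limit guessed at in the comments above MAX_MSG_LEN: for RSA with OAEP padding, the longest plaintext per block is the key size in bytes, minus twice the hash digest size, minus 2. Assuming the OAEP hash is SHA-1 (20 bytes, which I believe is the pycrypto default, but treat that as an assumption), a 2048-bit key accepts up to 214 bytes per block, so 128 is safely below the limit. A minimal sketch of the arithmetic in Python 3 syntax; the helper names oaep_max_plaintext_len and split_into_chunks are made up for illustration:

```python
def oaep_max_plaintext_len(key_bits, hash_len=20):
    """Longest plaintext (in bytes) that fits in one RSA-OAEP block.

    hash_len is the digest size of the OAEP hash; 20 bytes (SHA-1)
    is assumed here.
    """
    key_bytes = (key_bits + 7) // 8
    return key_bytes - 2 * hash_len - 2


def split_into_chunks(data, chunk_len):
    """Split data into pieces of at most chunk_len bytes."""
    return [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]


limit = oaep_max_plaintext_len(2048)
chunks = split_into_chunks(b"x" * 500, limit)
print(limit, [len(c) for c in chunks])
```

Each encrypted block is still exactly 256 bytes for a 2048-bit key, whatever the plaintext chunk size, so ENCODED_CHUNK_LEN does not change.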
UPDATE: Thanks to the accepted answer below, now I see:
My solution seems to be pretty valid for now (very few log entries, no performance considerations, more or less trusted storage). Concerning security, the best thing I can do right now is to remember to prohibit the user who runs my daemon from writing to the .py and .pyc files of the program. :-) However, if that user is compromised, an attacker may still try to attach a debugger to my daemon process, so I should also disable login for that user. These points are pretty obvious, but very important.
Surely there are much more scalable solutions. A very common technique is to encrypt AES keys with slow but reliable RSA, and to encrypt the data with AES, which is pretty fast. Data encryption in this case is symmetric, but retrieving the AES key requires either breaking RSA or getting the key from memory while my program is running. Stream encryption with higher-level libraries and a binary log file format is also a way to go, though a binary log encrypted as a stream would be very vulnerable to log corruption: even a sudden reboot due to a power outage may be a problem unless I do some things at a lower level (at least log rotation on each daemon start).
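The "log rotation on each daemon start" step can be sketched with the standard library alone. This assumes a file-based log and Python 3 syntax; open_fresh_log is a hypothetical helper name. It only illustrates the rotation, not stream encryption: any existing log is moved aside at start-up, so a file possibly cut short by a crash is never appended to.

```python
import logging
import logging.handlers
import os


def open_fresh_log(path, backups=5):
    """Return a handler that starts a new log file on every start-up.

    maxBytes=0 disables size-based rotation, so the only rollover is
    the explicit one performed here.
    """
    had_data = os.path.exists(path) and os.path.getsize(path) > 0
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=0, backupCount=backups)
    if had_data:
        # Renames path to path.1 (shifting older backups) and reopens path.
        handler.doRollover()
    return handler
```

The returned handler can take the formatter in the same way as the StreamHandler in the demo code.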
I changed .encode('hex') to .encode('base64').replace('\n', '').replace('\r', ''). Fortunately, the base64 codec works fine with no line ends. It saves some space.
Using untrusted storage may require signing records, but that seems to be another story.
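To put a number on the space saved by switching from hex to base64, here is a small sketch. It uses Python 3 syntax (an assumption; the code in the question is Python 2), where base64.b64encode already returns a single line with no line breaks. The 256 zero bytes are a dummy stand-in for one RSA block, not real ciphertext.

```python
import base64
import binascii

chunk = bytes(256)  # dummy stand-in for one 2048-bit RSA ciphertext block

as_hex = binascii.hexlify(chunk)
as_b64 = base64.b64encode(chunk)  # no embedded newlines

print(len(as_hex), len(as_b64))
assert base64.b64decode(as_b64) == chunk
```

For one 256-byte block this is 512 characters in hex against 344 in base64, roughly a third less.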
Checking whether a string is encrypted by catching exceptions is OK: unless the log has been tampered with by a malicious user, it is the base64 codec that raises the exception, not RSA decryption.
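A sketch of that check, assuming Python 3 and base64-encoded lines; try_decode_line is a hypothetical helper name. It goes slightly beyond what is described above by also checking the decoded length, so that a plain-text line that happens to be valid base64 is not mistaken for ciphertext. RSA decryption itself is left out.

```python
import base64
import binascii

ENCODED_CHUNK_LEN = 256  # bytes per RSA block for a 2048-bit key


def try_decode_line(line):
    """Return the raw ciphertext bytes, or None if line looks like plain text."""
    try:
        # validate=True rejects any character outside the base64 alphabet.
        raw = base64.b64decode(line.strip(), validate=True)
    except (binascii.Error, ValueError):
        return None
    if not raw or len(raw) % ENCODED_CHUNK_LEN != 0:
        return None
    return raw
```

A caller would pass the returned bytes on to the chunk-wise RSA decryption and print the line unchanged when None comes back.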