The shift() function was fairly easy to understand, in no small part because
you've chosen reasonable and clear variable names. Here are a few relatively
minor suggestions regarding the function. (1) Strings are iterable, so you can
convert to characters directly. (2) Python makes iteration so easy that looping
over data via an index is usually not needed. Instead, just iterate directly.
And for those cases when an index is needed, enumerate() is often the best
option, because it provides both index and value -- which helps your code by
reducing its visual weight and thus aiding readability. (3) The assignment to
new_char is a moderately complex expression. One way to improve code
readability is to such computations apart. (4) A dict to lookup a letter's
index would eliminate the need to scan ALPHABET repeatedly.
# 4
ALPHA_IDXS = {a:i for i, a in enumerate(ALPHABET)}
def shift(string, n):
# 1
shifted = list(string)
# 2
for i, char in enumerate(shifted):
if char.isalpha():
# 3 and 4
j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
new_char = ALPHABET[j]
if char.isupper():
new_char = new_char.upper()
shifted[i] = new_char # The only place we use index.
return ''.join(shifted)
Another technique to consider is converting shift() into a generator function.
This simplifies the code even more because the function no longer has to
assemble a return data collection -- although at a small cost to the caller,
which must do the assembling. Here's what that might look like.
def shift(string, n):
for i, char in enumerate(string):
if char.isalpha():
j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
new = ALPHABET[j]
char = new if char.islower() else new.upper()
yield char
You did a nice job in getGoodness() with keeping the steps of the algorithm
very clear. Here are a few suggestions: (1) Because bool is a subclass of
int, you can treat True and False as integers -- just add up the Trues.
Also, functions like sum() take any iterable, so you don't need an
intermediate list. (2) msg_length seems a bit misleading: it is not the
length of msg; it is the number of alpha characters. (3) A Counter can
compute letter frequencies for you, so you can avoid the computational cost of
str.count() (relevant if you intend to process large amounts of text). (4)
Shouldn't this calculation be done on a lowercase version of the text?
from collections import Counter
def getGoodness(msg):
# 1 and 2
n_alphas = sum(i.isalpha() for i in msg)
# 3 and 4
freqs = Counter(msg.lower())
msg_frqs = [round(freqs[a] / n_alphas, 4) for a in ALPHABET]
diffs = [abs(a - b) for a, b in zip(msg_frqs, FREQUENCIES)]
avg_diff = round(sum(diffs) / len(diffs), 4)
return avg_diff
Finally, some comments about the orchestration code. (1) Put your code in
functions. (2) Set yourself up for testing. It helps a lot during the
development and debugging lifecycle. (3) In the loop to compute goodnesses for
the shifted strings, a tuple seems like a handier data structure than a
dict. It simplifies the code and is better suited for min(). (4) I
recommend the habit of setting up all scripts so they can take arguments --
again, useful during development and sometimes usage.
# 1
def main(args):
# 2
TESTS = (
(
"Oqqcfrwbu hc ozz ybckb zokg ct ojwohwcb, hvsfs wg bc kom o pss gvcizr ps opzs hc tzm. Whg kwbug ofs hcc gaozz hc ush whg toh zwhhzs pcrm ctt hvs ufcibr. Hvs pss, ct qcifgs, tzwsg obmkom psqoigs pssg rcb'h qofs kvoh viaobg hvwby wg wadcggwpzs. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Ccv, pzoqy obr mszzck! Zsh'g gvoys wh id o zwhhzs. Poffm! Pfsoytogh wg fsorm!",
"According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible. Yellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little. Barry! Breakfast is ready!",
),
(
"Hello, world!",
"Ebiil, tloia!",
),
)
for msg, exp in TESTS:
# 3
scores = []
for i in range(26):
txt = ''.join(shift(msg, i))
g = (getGoodness(txt), txt)
scores.append(g)
got = min(scores)[1]
# 2 Always be testing.
print(exp == got, '=>', got[0:30])
# 4
if __name__ == '__main__':
main(sys.argv[1:])