One problem you would face if using scipy.cluster.vq.kmeans is that the function uses Euclidean distance to measure closeness. To shoehorn your problem into one solvable by k-means clustering, you'd have to find a way to convert your strings into numerical vectors and be able to justify using Euclidean distance as a reasonable measure of closeness.
That seems... difficult. Perhaps you are looking for Levenshtein distance instead?
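Levenshtein (edit) distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A quick sketch using nltk's edit_distance (assuming nltk is installed):

import nltk.metrics.distance as distance

# 'kitten' -> 'sitten' -> 'sittin' -> 'sitting': three edits
print(distance.edit_distance('kitten', 'sitting'))  # 3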
Note there are variants of the k-means algorithm that can work with non-Euclidean distance metrics (such as Levenshtein distance). K-medoids (aka PAM), for instance, can be applied to data with an arbitrary distance metric.
For example, using Pycluster's implementation of k-medoids and nltk's implementation of Levenshtein distance:
import nltk.metrics.distance as distance
import Pycluster as PC

words = ['apple', 'Doppler', 'applaud', 'append', 'barker',
         'baker', 'bismark', 'park', 'stake', 'steak', 'teak', 'sleek']

# Pycluster accepts the distance matrix in condensed form: the
# lower-triangular entries d(words[i], words[j]) for i > j
dist = [distance.edit_distance(words[i], words[j])
        for i in range(1, len(words))
        for j in range(0, i)]

labels, error, nfound = PC.kmedoids(dist, nclusters=3)

# group the words by their cluster label
cluster = dict()
for word, label in zip(words, labels):
    cluster.setdefault(label, []).append(word)
for label, grp in cluster.items():
    print(grp)
yields a result like
['apple', 'Doppler', 'applaud', 'append']
['stake', 'steak', 'teak', 'sleek']
['barker', 'baker', 'bismark', 'park']
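One handy detail, if I recall Pycluster's API correctly: the labels returned by kmedoids are the indices of each cluster's medoid item, so a representative word for each group comes for free:

# assuming `words` and `cluster` from the snippet above; each label
# is the index of the cluster's medoid, i.e. its most central word
for label, grp in cluster.items():
    print(words[label], grp)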