For an m x n matrix, what's the optimal (fastest) way to compute the mutual information for all pairs of columns (n x n)?
By mutual information, I mean:
I(X, Y) = H(X) + H(Y) - H(X,Y)
where H(X) refers to the Shannon entropy of X.
Currently I'm using np.histogram2d and np.histogram to calculate the joint (X,Y) and individual (X or Y) counts. For a given matrix A (e.g. a 250000 X 1000 matrix of floats), I am doing a nested for loop,
n = A.shape[1]
for ix = arange(n)
for jx = arange(ix+1,n):
matMI[ix,jx]= calc_MI(A[:,ix],A[:,jx])
Surely there must be better/faster ways to do this?
As an aside, I've also looked for mapping functions on columns (column-wise or row-wise operations) on arrays, but haven't found a good general answer yet.
Here is my full implementation, following the conventions in the Wiki page:
import numpy as np
def calc_MI(X,Y,bins):
c_XY = np.histogram2d(X,Y,bins)[0]
c_X = np.histogram(X,bins)[0]
c_Y = np.histogram(Y,bins)[0]
H_X = shan_entropy(c_X)
H_Y = shan_entropy(c_Y)
H_XY = shan_entropy(c_XY)
MI = H_X + H_Y - H_XY
return MI
def shan_entropy(c):
c_normalized = c / float(np.sum(c))
c_normalized = c_normalized[np.nonzero(c_normalized)]
H = -sum(c_normalized* np.log2(c_normalized))
return H
A = np.array([[ 2.0, 140.0, 128.23, -150.5, -5.4 ],
[ 2.4, 153.11, 130.34, -130.1, -9.5 ],
[ 1.2, 156.9, 120.11, -110.45,-1.12 ]])
bins = 5 # ?
n = A.shape[1]
matMI = np.zeros((n, n))
for ix in np.arange(n):
for jx in np.arange(ix+1,n):
matMI[ix,jx] = calc_MI(A[:,ix], A[:,jx], bins)
Although my working version with nested for loops does it at reasonable speed, I'd like to know if there is a more optimal way to apply calc_MI on all the columns of A (to calculate their pairwise mutual information)?
I'd also like to know:
Whether there are efficient ways to map functions to operate on columns (or rows) of
np.arrays(maybe likenp.vectorize, which looks more like a decorator)?Whether there are other optimal implementations for this specific calculation (mutual information)?
histogram2dbeing used, are we talking about discretized entropy here, not differential entropy? If so, what binning strategy is being used and how is the binning method identifiable from the code?