
I have a NumPy array:

 A = [ 1.56  1.47  1.31  1.16  1.11  1.14  1.06  1.12  1.19  1.06  0.92  0.78
       0.6   0.59  0.4   0.03  0.11  0.54  1.17  1.9   2.6   3.28  3.8   4.28
       4.71  4.61  4.6   4.41  3.88  3.46  3.04  2.63  2.3   1.75  1.24  1.14
       0.97  0.92  0.94  1.    1.15  1.33  1.37  1.48  1.53  1.45  1.32  1.08
       1.06  0.98  0.69]

How can I obtain the Shannon entropy?

I have seen it done like this, but I'm not sure it's right:

print -np.sum(A * np.log2(A), axis=1)
The A variable mentioned is not a numpy array. – Commented Mar 19, 2018 at 9:21

1 Answer


There are essentially two cases, and it is not clear from your sample which one applies here.

(1) Your probability distribution is discrete. Then you have to translate what appear to be relative frequencies into probabilities:

pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(pA))
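A minimal runnable sketch of the discrete case, using the values from the question (the variable names follow the answer's snippet):

```python
import numpy as np

# Values from the question, treated as relative frequencies
A = np.array([1.56, 1.47, 1.31, 1.16, 1.11, 1.14, 1.06, 1.12, 1.19,
              1.06, 0.92, 0.78, 0.6,  0.59, 0.4,  0.03, 0.11, 0.54,
              1.17, 1.9,  2.6,  3.28, 3.8,  4.28, 4.71, 4.61, 4.6,
              4.41, 3.88, 3.46, 3.04, 2.63, 2.3,  1.75, 1.24, 1.14,
              0.97, 0.92, 0.94, 1.0,  1.15, 1.33, 1.37, 1.48, 1.53,
              1.45, 1.32, 1.08, 1.06, 0.98, 0.69])

pA = A / A.sum()                        # normalise so probabilities sum to 1
Shannon2 = -np.sum(pA * np.log2(pA))    # Shannon entropy in bits
print(Shannon2)
```

If any entry can be zero, mask those out first (0 * log 0 is conventionally taken as 0), e.g. `pA = pA[pA > 0]` before the sum.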

(2) Your probability distribution is continuous. In that case the values in your input needn't sum to one. Assuming that the input is sampled regularly from the entire space, you'd get

pA = A / A.sum()
Shannon2 = -np.sum(pA*np.log2(A))

but in this case the formula really depends on the details of sampling and the underlying space.
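As an illustration of that dependence, here is one common discretised estimate of differential entropy for samples taken on a regular grid; the spacing `dx` is an assumed parameter not given in the question:

```python
import numpy as np

def differential_entropy(samples, dx):
    """Estimate -integral p*log2(p) from regularly spaced samples of an
    (unnormalised) density, treating each sample as a bin of width dx."""
    p = samples / (samples.sum() * dx)   # normalise so that sum(p) * dx == 1
    return -np.sum(p * np.log2(p)) * dx  # Riemann-sum approximation

# Example: a flat density over 4 bins of width 0.5 spans an interval of
# length 2, and the uniform density on it has entropy log2(2) = 1 bit.
print(differential_entropy(np.ones(4), dx=0.5))
```

Note how the result changes with `dx` even for the same sample values: that is exactly the normalisation-by-bin-size effect that separates the continuous case from the discrete one.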

Side note: the axis=1 in your example will cause an error since your input is flat. Omit it.


3 Comments

@Paul Panzer. Could you elaborate a bit on what constitutes discrete and continuous? Specifically, if I had a 2D histogram of the amount of time a subject spent in each bin, is that discrete or continuous? For example, I have a 4x4 grid and a subject spent equal time in each bin. If I converted that 2D histogram into a list, you would have 16 values. Would you analyse that as discrete or continuous?
@Punter345 Bins are discrete. A continuous distribution is different in that probabilities are "normalized by surface area": think of it as chopping the space into ever smaller bins in order to get the probability at a single point. To get a meaningful limit you have to normalise by bin size, which is why the continuous case behaves differently from the discrete one. If you want to fully understand this you'll have to read up on it, I'm afraid.
Thanks @Paul Panzer. I have read a few academic papers. I understand the concept of entropy; I just don't know which algorithms are valid in which situations. There are a few different tweaks in each equation I read. To be specific, my example splits the surface into 1 m² squares and returns, for each square, a count of the seconds spent in it.
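To make the 4x4 example from the comments concrete: with equal time in each of the 16 bins, the discrete formula from the answer gives a uniform distribution, whose entropy is log2(16) = 4 bits (a sketch, with an arbitrary illustrative count per bin):

```python
import numpy as np

counts = np.full((4, 4), 10.0)    # equal time spent in each of 16 bins
p = counts / counts.sum()         # each bin gets probability 1/16
H = -np.sum(p * np.log2(p))       # entropy of the uniform distribution
print(H)                          # -> 4.0 bits, the maximum for 16 bins
```

Any unequal split of time across the bins would give a value strictly below 4 bits.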
