3

I have an array of two-dimensional arrays named matrices. Each matrix in it has dimension 1000 x 1000 and consists of positive values. Now I want to take the log of all values in all the matrices (except for 0).
How do I do this easily in Python?
I have the following code that does what I want, but knowing Python, this can probably be made more concise:

newMatrices = []
for matrix in matrices:
    newMatrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMatrix.append(newRow)
    newMatrices.append(newMatrix)

1 Comment

"positive values" implies there are no zeroes. Do you mean "non-negative values"? Commented Feb 15, 2019 at 10:58

5 Answers

6

You can convert it into a NumPy array and use numpy.log to calculate the values.

For 0 values, the result will be -inf. After that, you can convert it back to a list and replace the -inf with 0.
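
A minimal sketch of this convert, log, replace sequence, assuming a small hypothetical nested list named matrix (its values are made up for illustration). The np.errstate context only silences the divide-by-zero warning that np.log emits for zeros:

```python
import numpy as np

# Hypothetical 2 x 2 input standing in for one 1000 x 1000 matrix.
matrix = [[0.0, 1.0], [np.e, 4.0]]

arr = np.asarray(matrix, dtype=float)

# log(0) evaluates to -inf; suppress the accompanying warning.
with np.errstate(divide="ignore"):
    logged = np.log(arr)

# Put the zeros back where the log produced -inf.
logged[np.isneginf(logged)] = 0.0

# Convert back to nested Python lists.
result = logged.tolist()
```

The same steps apply unchanged to a list of matrices, since np.asarray turns equally sized matrices into a single 3D array.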

Or you can use numpy.where:

Example:

res = np.where(arr != 0, np.log(arr), 0)

Every zero element is mapped to 0 in the result.
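
One caveat with the where call is that the log is still computed for every element before the selection happens, so zeros can trigger a divide-by-zero warning. A possible alternative, sketched here on a made-up float array, is to use the where= and out= arguments that NumPy ufuncs such as np.log accept, so the log is only evaluated where the condition holds:

```python
import numpy as np

# Hypothetical float input; an integer array would need converting first,
# because the output buffer below inherits the dtype of arr.
arr = np.array([[0.0, 1.0], [2.0, 8.0]])

# Positions where the condition is False keep the value already in `out`,
# which is why `out` is pre-filled with zeros.
res = np.log(arr, out=np.zeros_like(arr), where=arr > 0)
```

Without the out= argument, the skipped positions would be left uninitialized, so pre-filling the output matters here.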


3

While @Amadan's answer is certainly correct (and much shorter and more elegant), it may not be the most efficient in your case (this depends a bit on the input, of course). The three-argument np.where() evaluates np.log() on every element, including the ones that are then discarded, while the one-argument form np.where(cond) generates an integer index for each matching value. A more efficient approach is to generate a boolean mask and take the log only where it is set. Compared with integer indices, this has two advantages: (1) it is typically more memory efficient, and (2) the [] operator is typically faster on masks than on integer lists.
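
The memory argument can be checked with a short sketch. It assumes a made-up 1000 x 1000 float array and compares a boolean mask with the integer indices that np.nonzero returns for the same positions:

```python
import numpy as np

# Hypothetical input of the same size as the matrices in the question.
arr = np.arange(1_000_000, dtype=float).reshape(1000, 1000)

mask = arr > 0          # boolean mask: one byte per element
idx = np.nonzero(mask)  # tuple of two integer index arrays, one per axis

mask_bytes = mask.nbytes
idx_bytes = sum(i.nbytes for i in idx)

# Both forms select exactly the same elements.
same_selection = np.array_equal(arr[mask], arr[idx])

print(mask_bytes, idx_bytes, same_selection)
```

The exact size of the index arrays depends on the platform's integer width, but when most entries match, as here, they take several times the memory of the mask.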

To illustrate the speed difference, I reimplemented both the np.where()-based and the mask-based solution on a toy input (but with the correct sizes). I have also included an np.log.at()-based solution, which turns out to be quite inefficient as well.

import numpy as np


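def log_matrices_where(matrices):
    # np.log() runs on every element; np.where() then keeps it only where matrix > 0
    return [np.where(matrix > 0, np.log(matrix), 0) for matrix in matrices]


def log_matrices_mask(matrices):
    arr = np.array(matrices, dtype=float)
    mask = arr > 0  # boolean mask of the entries that have a real log
    arr[mask] = np.log(arr[mask])
    arr[~mask] = 0  # if the values are always positive this is not needed
    return list(arr)  # back to a list of 2D arrays


def log_matrices_at(matrices):
    arr = np.array(matrices, dtype=float)
    mask = arr > 0  # computed before the log, since values below 1 have a negative log
    np.log.at(arr, mask)  # unbuffered in-place log at the masked positions
    arr[~mask] = 0  # if the values are always positive this is not needed
    return list(arr)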


N = 1000
matrices = [
    np.arange((N * N)).reshape((N, N)) - N
    for _ in range(2)]

(some sanity check to make sure we are doing the same thing)

# check that the result is the same
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_mask(matrices))))
# True
print(all(np.all(np.isclose(x, y)) for x, y in zip(log_matrices_where(matrices), log_matrices_at(matrices))))
# True

And the timings on my machine:

%timeit log_matrices_where(matrices)
# 33.8 ms ± 1.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit log_matrices_mask(matrices)
# 11.9 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit log_matrices_at(matrices)
# 153 ms ± 831 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
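
%timeit is an IPython magic, so the timings above need IPython or Jupyter. A rough equivalent with the standard-library timeit module is sketched below. The function names log_mask and log_where are hypothetical single-array variants of the functions above, and N is reduced to keep the run short, so the absolute numbers will not match the ones reported here:

```python
import timeit

import numpy as np


def log_mask(arr):
    # Mask-based variant: take the log only where the value is positive.
    out = np.array(arr, dtype=float)
    mask = out > 0
    out[mask] = np.log(out[mask])
    out[~mask] = 0
    return out


def log_where(arr):
    # where-based variant: the log is computed everywhere, then selected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(arr > 0, np.log(arr), 0)


N = 200  # smaller than the 1000 used above
arr = np.arange(N * N, dtype=float).reshape(N, N) - N

# Best of 3 repeats, 10 calls each, reported per call.
t_mask = min(timeit.repeat(lambda: log_mask(arr), number=10, repeat=3)) / 10
t_where = min(timeit.repeat(lambda: log_where(arr), number=10, repeat=3)) / 10

print(f"mask:  {t_mask * 1e3:.3f} ms per call")
print(f"where: {t_where * 1e3:.3f} ms per call")
```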

EDIT: additionally included np.log.at() solution and a note on zeroing out the values for which log is not defined

1 Comment

No need for arr[~mask] = 0 if we are guaranteed no negative numbers and want to leave zeroes alone.
2

Another alternative using numpy:

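import numpy as np

arr = np.ndarray((1000, 1000))  # uninitialized placeholder; in practice use one of your matrices as a float array
np.log.at(arr, np.nonzero(arr))  # take the log in place, only at the non-zero positions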
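
For a version that can be run and checked, here is a small sketch with made-up values in place of the 1000 x 1000 array. It relies on np.log.at modifying the array in place, so the input has to be a float array already:

```python
import numpy as np

# Hypothetical 2 x 2 float input containing zeros.
arr = np.array([[0.0, 1.0], [np.e, 0.0]])

# np.nonzero returns one index array per axis; np.log.at applies the log
# in place at exactly those positions and leaves the zeros untouched.
np.log.at(arr, np.nonzero(arr))
```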

1 Comment

Nice use of numpy's at, again :-) +1
1

As simple as...

import numpy as np
newMatrices = [np.where(matrix != 0, np.log(matrix), 0) for matrix in matrices]

No need to worry about rows and columns, numpy takes care of it. No need to explicitly iterate over matrices in a for loop when a comprehension is readable enough.
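
If all matrices have the same shape, even the comprehension can be dropped by stacking them into one 3D array first. This is only a sketch on two made-up 2 x 2 matrices; np.errstate is there to silence the divide-by-zero warning from the zeros:

```python
import numpy as np

# Hypothetical stand-ins for the 1000 x 1000 matrices.
matrices = [
    np.array([[0.0, 1.0], [2.0, 3.0]]),
    np.array([[4.0, 0.0], [5.0, 6.0]]),
]

stacked = np.stack(matrices)  # shape (2, 2, 2): one slice per matrix

with np.errstate(divide="ignore"):
    logged = np.where(stacked != 0, np.log(stacked), 0)

newMatrices = list(logged)  # back to a list of 2D arrays, if needed
```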

EDIT: I just noticed OP had log, not log2. Not really important for the shape of the solution (though likely very important to not getting a wrong answer :P )

2 Comments

This may be quite inefficient, because np.where() internally works with integers. A boolean mask approach is typically much faster here, as detailed in my answer.
Fair enough. I was going more for programmer comfort than speed (and for many uses, a difference between 11ms and 33ms will not be terribly important).
0

As suggested by @R.yan, you can try something like this.

import numpy as np

newMatrices = []
for matrix in matrices:
    newMatrix = []
    for row in matrix:
        newRow = []
        for value in row:
            if value > 0:
                newRow.append(np.log(value))
            else:
                newRow.append(value)
        newMatrix.append(newRow)
    newMatrices.append(newMatrix)

# The loop above has already taken the log, so only convert the nested lists to an array.
logVal = np.asarray(newMatrices)
