
I filtered each row of a matrix using NumPy's fft/ifft, once with a Python for loop and once with broadcasting. The two methods run at similar speed. How can I do this calculation on multiple cores with NumPy and FFT?

import numpy as np
from numpy.fft import fft, ifft

num_row, num_col = 6000, 13572

ss = np.ones((num_row, num_col), dtype=np.complex128)
sig = np.random.standard_normal(num_col) * 1j * np.random.standard_normal(num_col)

# for loop    
for idx in range(num_row):
    ss[idx, :] = ifft(fft(ss[idx, :]) * sig)

# broadcast
ss = ifft(fft(ss, axis=1) * sig, axis=1)

Result

loop : 10.798867464065552 sec
broadcast : 11.298897981643677 sec
  • You can specify an axis to fft rather than using a loop np.fft.fft(a, axis=1) Commented Apr 18, 2018 at 15:39
  • You can also use broadcasting for your array multiplication ss*sig Commented Apr 18, 2018 at 15:46
  • What is this even doing? You're computing the FFT of a vector of ones (which is just [N, 0, 0, 0...], where N is the length of the vector), multiplying by some random data, and then IFFT??? I think there's an XY problem here? Commented Apr 18, 2018 at 16:05
  • 1
    Anyway, default Numpy FFT isn't multi-threaded: you either have to use something like Enthought's Python distribution (where Numpy is built against Intel MKL, which has a heavily optimized FFT) or use PyFFTW (which leans on FFTW's multithreading). Or do you want to try and work around the GIL and use Python's built-in threading capabilities? Commented Apr 18, 2018 at 16:06
  • Also you probably meant to do stdn(ncol) + 1j * stdn(ncol) (note the +). Commented Apr 18, 2018 at 16:07
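Building on the comment about NumPy's FFT being single-threaded: SciPy's `scipy.fft` module exposes a `workers` argument that parallelizes the transform across rows. A minimal sketch (array sizes reduced here for a quick run; the question uses 6000 × 13572):

```python
import numpy as np
import scipy.fft  # SciPy's FFT accepts a `workers` argument

num_row, num_col = 600, 1357  # smaller than the question for a quick demo

ss = np.random.standard_normal((num_row, num_col)) \
     + 1j * np.random.standard_normal((num_row, num_col))
sig = np.random.standard_normal(num_col) + 1j * np.random.standard_normal(num_col)

# workers=-1 uses all available cores for the row-wise transforms
out = scipy.fft.ifft(scipy.fft.fft(ss, axis=1, workers=-1) * sig,
                     axis=1, workers=-1)

# same result as the single-threaded NumPy version
ref = np.fft.ifft(np.fft.fft(ss, axis=1) * sig, axis=1)
assert np.allclose(out, ref)
```

Note this still computes one FFT per row; `workers` simply distributes those independent 1-D transforms over threads.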

2 Answers


You can use the axis parameter for fft and ifft, as well as broadcasting:

ss = np.ones((num_row, num_col), dtype=np.complex128)
sig = np.random.standard_normal(num_col) * 1j * np.random.standard_normal(num_col)

ss = ifft(fft(ss, axis=1) * sig, axis=1)

1 Comment

ifft(fft(ss, axis=1) * sig, axis=1) is slower than the for loop.

I compared broadcasting, a thread pool, and a plain loop. In this case, ThreadPool had the best performance.

# %% Import
# Standard library imports
import time
from multiprocessing.pool import ThreadPool

# Third party imports
from numpy import zeros, complex128, allclose
from numpy.fft import fft, ifft
from numpy.random import standard_normal


# %% Generate data
n_row, n_col = 6000, 13572

ss = standard_normal((n_row, n_col)) + 1j * standard_normal((n_row, n_col))
sig = standard_normal(n_col) + 1j * standard_normal(n_col)
ss_loop = zeros((n_row, n_col), dtype=complex128)
ss_thread = zeros((n_row, n_col), dtype=complex128)

# %% Loop processing
start_time = time.time()
for idx in range(n_row):
    ss_loop[idx, :] = ifft(fft(ss[idx, :]) * sig)
print(f'loop elapsed time : {time.time() - start_time}')

# %% Broadcast processing
start_time = time.time()
ss_broad = ifft(fft(ss, axis=1) * sig, axis=1)
print(f'broadcast elapsed time : {time.time() - start_time}')


# %% ThreadPool processing
def filtering(idx_thread):
    ss_thread[idx_thread, :] = ifft(fft(ss[idx_thread, :]) * sig)


start_time = time.time()
pool = ThreadPool()
pool.map(filtering, range(n_row))
print(f'ThreadPool elapsed time : {time.time() - start_time}')


# %% Verify result
if allclose(ss_thread, ss_broad, rtol=1.e-8):
    print('ThreadPool Correct')

if allclose(ss_loop, ss_broad, rtol=1.e-8):
    print('Loop Correct')

Result

loop elapsed time : 5.102990627288818
broadcast elapsed time : 4.520442008972168
ThreadPool elapsed time : 1.6988463401794434
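The thread-pool speedup is possible because NumPy releases the GIL while it computes each transform, so the worker threads genuinely run in parallel. A roughly equivalent sketch using the standard-library `concurrent.futures` API instead of `multiprocessing.pool.ThreadPool` (sizes reduced here for a quick check; the `filtering` helper mirrors the one above):

```python
import numpy as np
from numpy.fft import fft, ifft
from concurrent.futures import ThreadPoolExecutor

n_row, n_col = 600, 1357  # reduced sizes for a quick check

ss = np.random.standard_normal((n_row, n_col)) \
     + 1j * np.random.standard_normal((n_row, n_col))
sig = np.random.standard_normal(n_col) + 1j * np.random.standard_normal(n_col)
ss_out = np.zeros((n_row, n_col), dtype=np.complex128)

def filtering(idx):
    # NumPy releases the GIL inside fft/ifft, so threads overlap on real cores
    ss_out[idx, :] = ifft(fft(ss[idx, :]) * sig)

with ThreadPoolExecutor() as pool:
    # consume the iterator so every row is processed before the pool exits
    list(pool.map(filtering, range(n_row)))

assert np.allclose(ss_out, ifft(fft(ss, axis=1) * sig, axis=1))
```

Writing into pre-allocated rows of `ss_out` is safe here because each thread touches a distinct row; no locking is needed.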

