numpy on multicore hardware

Question

What's the state of the art with regard to getting numpy to use multiple cores (on Intel hardware) for things like inner and outer vector products, vector-matrix multiplications, etc.?

I am happy to rebuild numpy if necessary, but at this point I am looking at ways to speed things up without changing my code.

For reference, my show_config() is as follows, and I've never observed numpy to use more than one core:

atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_mkl_info:
  NOT AVAILABLE

blas_mkl_info:
  NOT AVAILABLE

mkl_info:
  NOT AVAILABLE

I doubt you can achieve any speedup by multithreading the computation of dot products of vectors of size 4000. Such a dot product takes only a few microseconds to compute. The overhead of assigning the task to a separate thread will probably at least cancel out any speed you might gain, even when using thread pools.
– Sven Marnach, Commented May 13, 2011 at 20:55
I'm multiplying 32M x (4k ... 1.5M) matrices with (4k ... 1.5M) x something matrices, and tried to do so using the multiprocessing toolbox. However, this seems to create a lot of memory overhead, as data is copied to the new processes (thank the GIL for that). It would be great if all 8 cores were used by atlas.
– Herbert, Commented Jun 30, 2015 at 14:07
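
The "few microseconds" estimate in the first comment can be checked on your own machine. The following is a minimal sketch, assuming numpy is installed; the helper names best_time_per_call and make_small_dot are made up for illustration and are not part of numpy.

```python
import timeit


def best_time_per_call(func, number=10_000, repeat=5):
    """Best observed wall-clock time for one call of func(), in seconds."""
    # timeit.repeat returns one total time per repetition; the minimum is
    # the measurement least disturbed by other activity on the machine.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def make_small_dot(n=4000):
    """Build two length-n vectors and return a callable computing their dot product."""
    import numpy as np  # imported here so the timing helper itself does not need numpy

    a = np.random.rand(n)
    b = np.random.rand(n)
    return lambda: np.dot(a, b)
```

Calling best_time_per_call(make_small_dot()) * 1e6 gives the time per dot product in microseconds. If that figure is small compared with the cost of handing work to another thread, threading cannot help at this problem size.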

talonmies · Accepted Answer · 2011-05-13 18:28:41Z

7

You should probably start by checking whether the Atlas build that numpy is using has been built with multi-threading. You can build and run this to inspect the Atlas configuration (lightly adapted from the Atlas FAQ):

#include <stdlib.h>

/*
 * Compile, link and run with something like (assuming this file is saved
 * as xprint_buildinfo.c; that file name is not part of the FAQ text):
 *    gcc -o xprint_buildinfo xprint_buildinfo.c -L[ATLAS lib dir] -latlas ; ./xprint_buildinfo
 * if link fails, you are using ATLAS version older than 3.3.6.
 */
void ATL_buildinfo(void);   /* provided by libatlas; prints the build configuration */

int main(void)
{
   ATL_buildinfo();
   exit(0);
}

If you don't have a multithreaded version of Atlas: "there's your problem". If it is multithreaded, then you need to exercise one of the multithreaded BLAS 3 routines (probably dgemm) with a suitably large matrix-matrix product and see whether threading is used. I think I am right in saying that neither the BLAS 2 nor the BLAS 1 routines in Atlas support multithreading (and with good reason, because there is no performance advantage except at truly enormous problem sizes).
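
One way to see from Python whether threading is used is to compare the CPU time consumed by the process with the elapsed wall-clock time during a large matrix-matrix product. This is a rough sketch, assuming numpy is installed and hands 2-D double-precision products to the BLAS dgemm; the function names and the matrix size are illustrative choices, not anything prescribed by numpy or Atlas.

```python
import time


def cpu_to_wall_ratio(work):
    """Run work() once and return process CPU time divided by wall-clock time.

    time.process_time() sums CPU time over all threads of the process, so a
    ratio near 1 suggests one busy core and a ratio well above 1 suggests
    that several cores were busy at once.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def make_matrix_product(n=2000):
    """Build two n-by-n double matrices and return a callable multiplying them."""
    import numpy as np  # imported here so the timing helper itself does not need numpy

    # The matrices are created outside the timed call, so only the product
    # itself is measured.
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return lambda: np.dot(a, b)
```

Calling cpu_to_wall_ratio(make_matrix_product()) and watching a tool such as top at the same time should give a consistent picture. The ratio is only an indication, since other load on the machine lowers it.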


1 Comment

Nino Over a year ago

What exactly is the compiler command? What should -L[ATLAS lib dir] be?
