
I have about 650 csv-based matrices. I plan on loading each one using Numpy as in the following example:

m1 = numpy.loadtxt(open("matrix1.txt", "rb"), delimiter=",", skiprows=1)

There are matrix2.txt, matrix3.txt, ..., matrix650.txt files that I need to process.

My end goal is to multiply each matrix by each other, meaning I don't necessarily have to keep all 650 matrices in memory, but rather just 2 (one running product and one that I am currently multiplying the running product by).

Here is an example of what I mean with matrices defined from 1 to n: M1, M2, M3, .., Mn.

M1*M2*M3*...*Mn
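
In code, the running product I have in mind would look roughly like this (untested sketch; it assumes the dot products are dimensionally compatible):

import numpy

# start with the first matrix, then fold the rest in one at a time
result = numpy.loadtxt("matrix1.txt", delimiter=",", skiprows=1)
for i in range(2, 651):
    m = numpy.loadtxt("matrix{}.txt".format(i), delimiter=",", skiprows=1)
    result = result.dot(m)  # keep only the running product and the current matrix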

The dimensions of all the matrices are the same: 197 rows and 11 columns, so they are not square. None of the matrices are sparse, and every cell comes into play.

What is the best/most efficient way to do this in Python?

EDIT: I took what was suggested and got it to work by taking the transpose, since the matrices aren't square. As an addendum to the question, is there a way in NumPy to do element-by-element multiplication?

  • Is there a particular step of the process on which you are stuck? Commented Mar 5, 2016 at 7:37
  • What is the size of the matrices? The costly task is reading them, so load as many at a time as you can. Commented Mar 5, 2016 at 8:12
  • Often there is a difference between best (elegant) and efficient. You could read them in multiple threads if CPU doesn't matter, or keep only 2 matrices in memory at a time if memory matters. How big are they? Does memory matter? Commented Mar 5, 2016 at 8:30
  • If you have double-precision elements in there, 197*11 each, even all 650*650 different products, results included, would still fit in your RAM. Commented Mar 5, 2016 at 19:58
  • You don't happen to actually mean element-wise multiplication? With those dimensions you would have to transpose one of the two before multiplication to get either 197x11 @ 11x197 or 11x197 @ 197x11. Commented Mar 5, 2016 at 20:12

4 Answers


A Python 3 solution: if "each matrix by each other" actually means multiplying them in a row, and the matrices have compatible dimensions ((n, m) · (m, o) · (o, p) · ...), which you hint at with "(1 ongoing and 1 that...)", then use (if available):

import numpy as np
from functools import partial

fnames = map("matrix{}.txt".format, range(1, 651))
# multi_dot needs a sequence with a known length, so materialize the map
res = np.linalg.multi_dot(list(map(partial(np.loadtxt, delimiter=',', skiprows=1), fnames)))

or:

import numpy as np
from functools import reduce, partial

fnames = map("matrix{}.txt".format, range(1, 651))
matrices = map(partial(np.loadtxt, delimiter=',', skiprows=1), fnames)
res = reduce(np.dot, matrices)

Maps etc. are lazy in Python 3, so files are read only as needed. loadtxt doesn't require a pre-opened file; a filename will do.

Doing all the combinations lazily, given that the matrices have the same shape (will do a lot of rereading of data):

import numpy as np
from functools import partial
from itertools import starmap, combinations

map_loadtxt = partial(map, partial(np.loadtxt, delimiter=',', skiprows=1))
fname_combs = combinations(map("matrix{}.txt".format, range(1, 651)), 2)
res = list(starmap(np.dot, map(map_loadtxt, fname_combs)))

Using a bit of grouping to reduce reloading of files:

import numpy as np
from itertools import groupby, combinations, chain
from functools import partial
from operator import itemgetter

loader = partial(np.loadtxt, delimiter=',', skiprows=1)
fname_pairs = combinations(map("matrix{}.txt".format, range(1, 651)), 2)
groups = groupby(fname_pairs, itemgetter(0))
res = list(chain.from_iterable(
    map(loader(k).dot, map(loader, map(itemgetter(1), g)))
    for k, g in groups
))

Since the matrices are not square but all have the same shape, you would have to add transposes before multiplication to make the dimensions match: for example either loader(k).T.dot or map(np.transpose, map(loader, ...)).
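
For instance, a single pairwise product of two 197x11 matrices could look like this (a minimal sketch, reusing the loader from above):

import numpy as np
from functools import partial

loader = partial(np.loadtxt, delimiter=',', skiprows=1)
a = loader("matrix1.txt")   # shape (197, 11)
b = loader("matrix2.txt")   # shape (197, 11)
small = a.T.dot(b)          # (11, 197) @ (197, 11) -> (11, 11)
large = a.dot(b.T)          # (197, 11) @ (11, 197) -> (197, 197)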

If, on the other hand, the question was actually meant to address element-wise multiplication, replace np.dot with np.multiply.
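
For example, the element-wise product over all 650 files can be folded the same way as the reduce version above (a sketch under the same file-name assumptions):

import numpy as np
from functools import reduce, partial

fnames = map("matrix{}.txt".format, range(1, 651))
matrices = map(partial(np.loadtxt, delimiter=',', skiprows=1), fnames)
res = reduce(np.multiply, matrices)  # element-wise; all shapes stay (197, 11)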


3 Comments

He doesn't want to multiply them all in a row. Look at the question: "multiply each matrix by each other" means (1,2), (1,3) and so on.
I'm getting the error: 'module' object has no attribute 'multi_dot' after the line np.linalg.multi_dot. I'm trying your first solution. Any idea how to fix this error?
That is why I added the "if available" :). It has been added in version 1.11.0, which is still in development I think.

1. Variant: Nice code but reads all matrices at once

import itertools
import numpy as np

matrixFileCount = 3
matrices = [np.loadtxt("matrix%s.txt" % i, delimiter=",", skiprows=1) for i in range(1, matrixFileCount + 1)]
allC = itertools.combinations(range(matrixFileCount), 2)
allCMultiply = [np.dot(matrices[a], matrices[b]) for a, b in allC]
print(allCMultiply)

2. Variant: Only loads 2 files at once; nice code, but a lot of reloading

allCMultiply = []
fileList = ["matrix%s.txt" % x for x in range(1, matrixFileCount + 1)]
allC = itertools.combinations(fileList, 2)
for c in allC:
    m = [np.loadtxt(f, delimiter=",", skiprows=1) for f in c]
    allCMultiply.append(np.dot(m[0], m[1]))
print(allCMultiply)

3. Variant: Like the second, but avoids some reloading while still keeping only 2 matrices in memory at a time

Because the combinations created with itertools come out as (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), the first matrix of a pair is often the same as in the previous pair, so you can sometimes avoid loading both matrices.

matrixFileCount = 3
allCMultiply = []
mLoaded = {'file': None, 'matrix': None}
fileList = ["matrix%s.txt" % x for x in range(1, matrixFileCount + 1)]
allC = itertools.combinations(fileList, 2)
for c in allC:
    if c[0] == mLoaded['file']:
        # the first matrix of this pair is already in memory; only load the second
        m = [mLoaded['matrix'], np.loadtxt(c[1], delimiter=",", skiprows=1)]
    else:
        m = [np.loadtxt(f, delimiter=",", skiprows=1) for f in c]
    mLoaded = {'file': c[0], 'matrix': m[0]}
    allCMultiply.append(np.dot(m[0], m[1]))
print(allCMultiply)

Performance

If you can load all matrices into memory at once, the first variant is faster than the second, because the second reloads matrices constantly. The third variant is slower than the first but faster than the second, because it sometimes avoids reloading a matrix.

0.943613052368  (Variant 1: 10 matrices of shape 2x2, 1000 executions)
7.75622487068   (Variant 2: 10 matrices of shape 2x2, 1000 executions)
4.83783197403   (Variant 3: 10 matrices of shape 2x2, 1000 executions)
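
A sketch of how such a timing could be reproduced with timeit; wrapping each variant in a function is an assumption here, not the original benchmark code, and the matrix files must already exist:

import itertools
import timeit
import numpy as np

def variant2(fileList):
    # reloads both files for every pair, as in the second variant
    return [np.dot(np.loadtxt(a, delimiter=",", skiprows=1),
                   np.loadtxt(b, delimiter=",", skiprows=1))
            for a, b in itertools.combinations(fileList, 2)]

fileList = ["matrix%s.txt" % x for x in range(1, 11)]
print(timeit.timeit(lambda: variant2(fileList), number=1000))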

1 Comment

@B.M. Thank you very much for the notice; I changed my code. Hopefully you'll remove your downvote now ;-). Thanks

Kordi's answer loads all of the matrices before doing the multiplication. And that's fine if you know the matrices are going to be small. If you want to conserve memory, however, I'd do the following:

import numpy as np
from functools import reduce

def get_dot_product(fnames):
    assert len(fnames) > 0
    # start the running product with the first matrix
    accum_val = np.loadtxt(fnames[0], delimiter=',', skiprows=1)
    return reduce(_product_from_file, fnames[1:], accum_val)

def _product_from_file(running_product, fname):
    return running_product.dot(np.loadtxt(fname, delimiter=',', skiprows=1))

If the matrices are large and irregular in shape (not square), there are also optimization algorithms for determining the optimal associative groupings (i.e., where to put the parentheses), but in most cases I doubt it would be worth the overhead of loading and unloading each file twice, once to figure out the associative groupings and then once to carry it out. NumPy is surprisingly fast even on pretty big matrices.
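
For reference, np.linalg.multi_dot (NumPy 1.11+) does exactly this ordering optimization: it picks the cheapest parenthesization before multiplying. A minimal sketch with random matrices standing in for the loaded files:

import numpy as np

# shapes chosen so that the multiplication order affects the cost
a = np.random.rand(197, 11)
b = np.random.rand(11, 197)
c = np.random.rand(197, 11)
res = np.linalg.multi_dot([a, b, c])  # evaluates in the cheapest order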

Comments


How about a really simple solution avoiding map, reduce and the like? The standard numpy array object does element-wise multiplication with the * operator.

import numpy

size = (197, 11)

result = numpy.ones(size)  # the multiplicative identity for element-wise products
for i in range(1, 651):
    result *= numpy.loadtxt("matrix{}.txt".format(i),
                            delimiter=",", skiprows=1)

Comments
