I would like to count the n-by-n matrices with only 1 and 0 entries that contain a pair of disjoint pairs of rows with equal vector sums. For a 4 by 4 matrix, the following code achieves this by just iterating over all of the matrices and testing each one in turn.

import numpy as np
from itertools import combinations

n = 4
nxn = np.arange(n * n).reshape(n, -1)
count = 0
for i in range(2 ** (n * n)):
    A = (i >> nxn) % 2  # unpack the bits of i into an n-by-n 0/1 matrix
    # combinations of combinations gives each unordered pair of row pairs once
    for firstpair, secondpair in combinations(combinations(range(n), 2), 2):
        if not set(firstpair) & set(secondpair):
            if np.array_equal(A[firstpair[0]] + A[firstpair[1]],
                              A[secondpair[0]] + A[secondpair[1]]):
                count += 1
                break  # count each matrix at most once
print(count)

The output is 3136.

The problem with this is that it takes 2^(4^2) iterations, and I would like to run it for n up to 8. Is there a cleverer way to count these without iterating over all the matrices? For example, it seems pointless to generate row permutations of the same matrix over and over again.

  • Do you want "two disjoint pairs of disjoint pairs of rows that have equal vector sums", or is it a typo and you meant "two disjoint pairs of rows that have equal vector sums"? Commented Jan 15, 2014 at 22:47
  • @GastónBengolea Thanks. It is a pair of pairs, not two pairs of pairs. Commented Jan 15, 2014 at 22:51
  • The matrix is any 2d matrix or one like the one you created right there? Commented Jan 15, 2014 at 23:01
  • @GastónBengolea I am considering all n by n matrices with entries that are 1 or 0. My code just iterates over those for n = 4. I would like it to work for n=5,6,7,8 but it takes far too long currently. Commented Jan 16, 2014 at 7:41
  • Executing a single CPU cycle for each 8x8 matrix would take roughly 10000 years on today's hardware; and as we discovered earlier, you need millions of cycles to check one matrix. So you really need to come up with something better than trying them all. Even modulo row/column permutations, my gut instinct tells me you are still in big trouble. You might get a good estimate of the total number of such matrices by stochastically sampling the space of all matrices? Commented Jan 16, 2014 at 11:46

3 Answers


Computed in about a minute on my machine, with CPython 3.3:

4 3136
5 3053312
6 7247819776
7 53875134036992
8 1372451668676509696

Code, based on memoized inclusion-exclusion:

#!/usr/bin/env python3
import collections
import itertools

def pairs_of_pairs(n):
    # The three ways to split each 4-subset of rows into two disjoint pairs.
    for i, j, k, m in itertools.combinations(range(n), 4):
        yield (i, j), (k, m)
        yield (i, k), (j, m)
        yield (i, m), (j, k)

def columns(n):
    # All 2**n possible 0/1 columns of height n.
    return itertools.product(range(2), repeat=n)

def satisfied(pair_of_pairs, column):
    # True iff this column gives equal sums for both row pairs.
    (i, j), (k, m) = pair_of_pairs
    return column[i] + column[j] == column[k] + column[m]

def pop_count(valid_columns):
    return bin(valid_columns).count('1')

def main(n):
    pairs_of_pairs_n = list(pairs_of_pairs(n))
    columns_n = list(columns(n))
    universe = (1 << len(columns_n)) - 1
    # counter maps a bitmask of columns (the intersection of the masks of some
    # subset of constraints) to its signed inclusion-exclusion coefficient.
    counter = collections.defaultdict(int)
    counter[universe] = -1
    for pair_of_pairs in pairs_of_pairs_n:
        # mask: the set of columns satisfying this pair of pairs.
        mask = 0
        for i, column in enumerate(columns_n):
            mask |= int(satisfied(pair_of_pairs, column)) << i
        for valid_columns, count in list(counter.items()):
            counter[valid_columns & mask] -= count
    counter[universe] += 1  # drop the empty-subset term
    # pop_count(valid_columns) ** n counts the matrices whose every column
    # satisfies all constraints in the corresponding subset.
    return sum(count * pop_count(valid_columns) ** n
               for valid_columns, count in counter.items())

if __name__ == '__main__':
    for n in range(4, 9):
        print(n, main(n))
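
As an independent sanity check (my own brute-force script, not part of this answer's method), direct enumeration of all 2**16 matrices reproduces the n = 4 value from the question:

```python
import itertools

def pair_sum(rows, a, b):
    # Vector sum of two rows, as a tuple.
    return tuple(x + y for x, y in zip(rows[a], rows[b]))

def has_equal_pair_sums(rows, n):
    # True if some two disjoint pairs of rows have equal vector sums.
    for i, j, k, m in itertools.combinations(range(n), 4):
        for (a, b), (c, d) in (((i, j), (k, m)),
                               ((i, k), (j, m)),
                               ((i, m), (j, k))):
            if pair_sum(rows, a, b) == pair_sum(rows, c, d):
                return True
    return False

def brute_force(n):
    # Enumerate every n-by-n 0/1 matrix as a tuple of row-tuples.
    return sum(has_equal_pair_sums(rows, n)
               for rows in itertools.product(
                   itertools.product(range(2), repeat=n), repeat=n))

print(brute_force(4))  # 3136, matching the question and the table above
```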

6 Comments

In case you become interested in values for n > 8, there's still a fair amount of symmetry that could be exploited by collapsing items of valid_columns whose keys are isomorphic under row permutations.
Clever; I've thought about a solution along these lines, but got stuck.
Very nice. By modifying your code to handle the N rows x M columns case to get data to play with I think I've got (empirical) generating functions for N=4,5,6,7. If I can crack the pattern I can at least conjecture values for higher N,M.
Bravo! That's slick :-)
Thank you! Although this is above and beyond what SO asks for, could you possibly explain in mathematical terms what your code is computing? Can it be written as a simple sum for example? Also, do you know what its time complexity is?

You can file this one under "better than nothing" ;-) Here's plain Python3 code that rethinks the problem a bit. Perhaps numpy tricks could speed it substantially, but hard to see how.

  1. "A row" here is an integer in range(2**n). So the array is just a tuple of integers.
  2. Because of that, it's dead easy to generate all arrays that are unique under row permutation via combinations_with_replacement(). That reduces the trip count on the outer loop from 2**(n**2) to (2**n + n - 1)-choose-n. An enormous reduction, but still ...
  3. A precomputed dict maps pairs of rows (which means pairs of integers here!) to their vector sum as a tuple. So no array operations are required when testing, except to test the tuples for equality. With some more trickery, the tuples could be coded as (say) base-3 integers, reducing the inner-loop test to comparing two integers retrieved from a pair of dict lookups.
  4. The time and space required for that precomputed dict is relatively trivial, so no attempt was made to speed that part.
  5. The inner loop picks row indices 4 at a time, instead of your pair of loops each picking two indices at a time. It's faster to do all 4 in one gulp, in large part because there's no need to weed out pairs with a duplicated index.
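
To put the reduction in point 2 in concrete numbers (a quick side calculation of my own; `math.comb` needs Python 3.8+):

```python
from math import comb

for n in range(4, 9):
    all_matrices = 2 ** (n * n)              # every n-by-n 0/1 matrix
    row_multisets = comb(2 ** n + n - 1, n)  # arrays unique under row permutation
    print(n, all_matrices, row_multisets)
# For n = 4: 65536 matrices collapse to 3876 row multisets.
```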

Here's the code:

def calc_row_pairs(n):
    fmt = "0%db" % n
    rowpair2sum = dict()
    for i in range(2**n):
        row1 = list(map(int, format(i, fmt)))
        for j in range(2**n):
            row2 = map(int, format(j, fmt))
            total = tuple(a+b for a, b in zip(row1, row2))
            rowpair2sum[i, j] = total
    return rowpair2sum

def multinomial(n, ks):
    from math import factorial as f
    assert n == sum(ks)
    result = f(n)
    for k in ks:
        result //= f(k)
    return result

def count(n):
    from itertools import combinations_with_replacement as cwr
    from itertools import combinations
    from collections import Counter
    rowpair2sum = calc_row_pairs(n)
    total = 0
    class NextPlease(Exception):
        pass
    for a in cwr(range(2**n), n):
        try:
            for ix in combinations(range(n), 4):
                for ix1, ix2, ix3, ix4 in (
                       ix,
                       (ix[0], ix[2], ix[1], ix[3]),
                       (ix[0], ix[3], ix[1], ix[2])):
                    if rowpair2sum[a[ix1], a[ix2]] == \
                       rowpair2sum[a[ix3], a[ix4]]:
                        total += multinomial(n, Counter(a).values())
                        raise NextPlease
        except NextPlease:
            pass
    return total

That sufficed to find results through n=6, although it took a long time to finish the last one (how long? don't know - didn't time it - on the order of an hour, though - "long time" is relative ;-) ):

>>> count(4)
3136
>>> count(5)
3053312
>>> count(6)
7247819776

EDIT - removing some needless indexing

A nice speedup by changing the main function to this:

def count(n):
    from itertools import combinations_with_replacement as cwr
    from itertools import combinations
    from collections import Counter
    rowpair2sum = calc_row_pairs(n)
    total = 0
    for a in cwr(range(2**n), n):
        for r0, r1, r2, r3 in combinations(a, 4):
            if rowpair2sum[r0, r1] == rowpair2sum[r2, r3] or \
               rowpair2sum[r0, r2] == rowpair2sum[r1, r3] or \
               rowpair2sum[r0, r3] == rowpair2sum[r1, r2]:
                total += multinomial(n, Counter(a).values())
                break
    return total

EDIT - speeding the sum test

This is minor, but since this seems to be the best exact approach on the table so far, may as well squeeze some more out of it. As noted before, since each sum is in range(3), each tuple of sums can be replaced with an integer (viewing the tuple as giving the digits of a base-3 integer). Replace calc_row_pairs() like so:

def calc_row_pairs(n):
    fmt = "0%db" % n
    rowpair2sum = dict()
    for i in range(2**n):
        row1 = list(map(int, format(i, fmt)))
        for j in range(2**n):
            row2 = map(int, format(j, fmt))
            total = 0
            for a, b in zip(row1, row2):
                t = a+b
                assert 0 <= t <= 2
                total = total * 3 + t
            rowpair2sum[i, j] = total
    return rowpair2sum

I'm sure numpy has a much faster way to do that, but the time taken by calc_row_pairs() is insignificant, so why bother? BTW, the advantage to doing this is that the inner-loop == tests change from needing to compare tuples to just comparing small integers. Plain Python benefits from that, but I bet pypy could benefit even more.

6 Comments

Does pypy speed this up out of interest?
Don't know about pypy - no version of pypy works with Python3 yet. And, while it might seem odd, I stick to pure standard out-of-the-box Python when tackling "hard" problems (no pypy, no numpy, no external packages). Why? Because it's so frickin' slow it forces you to think of cleverer approaches :-)
Actually.. that makes perfect sense to me :)
Just FYI, your code seems to also be valid Python 2.x, and it works just fine with pypy, taking 17 minutes for n = 6.
@TimPeters wow, nice speedup. Has marshall verified the n=5 or n=6 answers? BTW I might take a look at a numpy or numba solution. Can't imagine it wouldn't be a speedup with "fancy indexing". That's just talk unless I post one, so nice job!

Not a direct answer to your question, but as I noted, I think you can safely forget about exhaustively testing all matrices for any significant n. But the problem lends itself well to a stochastic characterization. Interestingly, under some conditions triple sums are more common than double sums! The likelihood of getting a hit appears to be a fairly simple (monotone) function of both n and m though, no surprises there.
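
A minimal sketch of what such a sampler might look like (my reconstruction with made-up helper names, not the code that produced the plots below):

```python
import itertools
import random

def has_double_hit(rows):
    # True if two disjoint pairs of rows have equal vector sums.
    n = len(rows)
    for i, j, k, m in itertools.combinations(range(n), 4):
        for (a, b), (c, d) in (((i, j), (k, m)),
                               ((i, k), (j, m)),
                               ((i, m), (j, k))):
            if all(ra + rb == rc + rd
                   for ra, rb, rc, rd in zip(rows[a], rows[b], rows[c], rows[d])):
                return True
    return False

def estimate_hit_rate(n, m, samples=1000):
    # Fraction of uniformly random n-by-m 0/1 matrices containing such a pair.
    hits = sum(has_double_hit([[random.randint(0, 1) for _ in range(m)]
                               for _ in range(n)])
               for _ in range(samples))
    return hits / samples
```

Sampling each (n, m) cell this way gives the hit-rate grids shown below; the exact n = m = 4 rate is 3136 / 65536, about 4.8%.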

[Figure: double sums; n,m = 4..20]

[Figure: triple sums; n,m = 4..20]

3 Comments

Could you explain your pictures a little please? Also when you say "under some conditions triple sums are more common than double sums" that seems surprising. What sort of conditions?
Somehow the image captions don't show up. Top image is double sums, bottom image is triple sums, with 10,000 samples per cell. Vertical axis is n, horizontal axis is m, running from 4 to 20 inclusive. Red is 100% hits, blue is 0% hits. At the bottom row (n=20), we may see the triple sums showing up more than the double sums, at least for moderate m. Triple sums seem to fall off more sharply with bigger m too, so that may not hold true for all m.
In the region around n=300,m=20, you are virtually sure to have a triple hit, and sure not to have a double hit; and this contrast can be pushed arbitrarily high. The converse scenario does not appear to manifest itself anywhere in the parameter space. Double hits are more common for n<=m, but they never dominate in the same way.
