0

I have a list d_n of n integers and the results Z_klm of a function fun(dk, dl, dm) for all binom(n, 3) combinations without repetition (k, l, m) out of d_n indices.

Now, for all binom(n, 2) combinations without repetitions (s, t) of d_n indices, I need to take the T_st partial sums of Z_klm where (s, t) is a subset of (k, l, m).

Here's an example with a small n.

import numpy as np
from itertools import combinations
from scipy.special import binom

# generate random data
n = 5
d_n = np.random.randint(-30, 30, 5)

# define the function
def fun(dk, dl, dm):
    return np.sign(dk - 2*dl + dm)

# calculate Z_klm and store (k,l,m) indices
klm = []
Z_klm = []
for k, l, m in combinations(range(n), 3):
    Z_klm.append(fun(d_n[k], d_n[l], d_n[m]))
    klm.append([k, l, m])

# calculate the partial sums T_st
st = {}
T_st = np.zeros(shape=int(binom(n, 2)))
for h, (s, t) in enumerate(combinations(range(n), 2)):
    st.update({f"({s},{t})": []})
    for i, _klm_ in enumerate(klm):
        if s in _klm_ and t in _klm_:
            T_st[h] += Z_klm[i]
            st[f"({s},{t})"].append(_klm_)

T_st, st
(array([ 1.,  1.,  2.,  2.,  1.,  1.,  1.,  1.,  1., -2.]),

 {'(0,1)': [[0, 1, 2], [0, 1, 3], [0, 1, 4]],
  '(0,2)': [[0, 1, 2], [0, 2, 3], [0, 2, 4]],
  '(0,3)': [[0, 1, 3], [0, 2, 3], [0, 3, 4]],
  '(0,4)': [[0, 1, 4], [0, 2, 4], [0, 3, 4]],
  '(1,2)': [[0, 1, 2], [1, 2, 3], [1, 2, 4]],
  '(1,3)': [[0, 1, 3], [1, 2, 3], [1, 3, 4]],
  '(1,4)': [[0, 1, 4], [1, 2, 4], [1, 3, 4]],
  '(2,3)': [[0, 2, 3], [1, 2, 3], [2, 3, 4]],
  '(2,4)': [[0, 2, 4], [1, 2, 4], [2, 3, 4]],
  '(3,4)': [[0, 3, 4], [1, 3, 4], [2, 3, 4]]})

For example T_{2,4} is the sum of Z_{0,2,4}, Z_{1,2,4}, and Z_{2,3,4}.

My rough implementation is working only because there are very few observations here. But in case of real sample sizes (can usually be up to n = 1000), it would take a lifetime to iterate all over the binom(n,2) and the binom(n,3).

Any suggestions to speed it up with an efficient algorithm or more advanced iteration tools?

P.S.: I do not need to store all (k, l, m) and (s, t) indices; I only did it here to demonstrate how it works and to implement the algorithm for calculating the T_st partial sums.

4
  • 2
    You are asking how to do this for any fun, correct? And are you only interested in the '2 out of 3' case or are you looking for a solution where the tuple-size might vary? Commented Oct 31 at 23:10
  • Hi @Grismar. Yes, for any fun taking 3 integer arguments and always '2 out of 3' case. Commented Nov 1 at 9:45
  • what is a "(s, t) is a subset of (k, l, m)"? For a single (k, l, m) you get the followings (k, l), (k, m), (l, m)? This will possibly generate repeated pairs Commented Nov 1 at 9:54
  • 1
    @cards I mean, for each (s,t) combination without repetition of n elements, where (k,l)=(s,t) or (k,m)=(s,t) or (l,m)=(s,t). I know it isn't a rigorous definition. If it could help, I'll post a LaTeX formula. Commented Nov 1 at 10:47

1 Answer 1

1

For large n (like 1000), you will want to avoid the nested loop structure. Here is my recommended code for better performance with n=1000:

def compute_partial_sums_optimal(d_n, fun):
    n = len(d_n)
    pairs_list = list(combinations(range(n), 2))
    n_pairs = len(pairs_list)
    
    T_st = np.zeros(n_pairs)
    
    # For each pair (s,t), sum over all third indices
    for i, (s, t) in enumerate(pairs_list):
        # All triples containing (s,t)
        for r in range(n):
            if r != s and r != t:
                k, l, m = sorted([s, t, r])
                T_st[i] += fun(d_n[k], d_n[l], d_n[m])
    
    return T_st

# Benchmark
import time

n = 100  # Start with n=100 for testing
d_n = np.random.randint(-30, 30, n)

start = time.time()
result = compute_partial_sums_optimal(d_n, fun)
print(f"Time for n={n}: {time.time() - start:.3f}s")
Sign up to request clarification or add additional context in comments.

2 Comments

Nice idea, thanks. I'd only avoid the pairs_list = list(combinations(range(n), 2)) cause it could be extremely large. We know n_pairs = int(binom(n, 2)) and the cycle could be for i, (s, t) in enumerate(combinations(range(n), 2)):. What do you think about it?
Yes, you can replace len(list(itertools.combinations(range(n), k))) with the much more efficient math.comb(n, k), and then iterate directly using iterators, for i,(s, t) in enumerate(combinations(range(n), 2)): and DO NOT call list( ) on it

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.