
I have a task:

Count the number of pairs (i, j) such that array_1[i] + array_1[j] > array_2[i] + array_2[j].
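As written, the code in the question (and in the answer) counts ordered pairs over all i and j, including i == j, so (0, 1) and (1, 0) are counted separately. A minimal brute-force reference makes that definition explicit. It is O(n²) in pure Python and only suitable for tiny inputs; the function name `count_pairs_bruteforce` is hypothetical.

```python
def count_pairs_bruteforce(array_1, array_2):
    """Count ordered pairs (i, j), including i == j, such that
    array_1[i] + array_1[j] > array_2[i] + array_2[j]."""
    n = len(array_1)
    count = 0
    for i in range(n):
        for j in range(n):
            if array_1[i] + array_1[j] > array_2[i] + array_2[j]:
                count += 1
    return count


print(count_pairs_bruteforce([3, 1, 2], [1, 2, 2]))  # 5
```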

This is my code:

import numpy as np
import pandas as pd

n = 200000

series_1 = np.random.randint(low=1, high=1000, size=n)
series_1_T = series_1.reshape(n, 1)  # column vector, shape (n, 1)
series_2 = np.random.randint(low=1, high=1000, size=n)
series_2_T = series_2.reshape(n, 1)

def differ(x):
    # Rows x .. x+1999 against all n columns: two (2000, n) tables via broadcasting
    table_1 = series_1 + series_1_T[x:x+2000]
    table_2 = series_2 + series_2_T[x:x+2000]
    # Number of entries where the series_1 sum is larger
    return table_1[table_1 > table_2].shape[0]

arr = pd.DataFrame(data=np.arange(0, n, 2000), columns=["numbers"])

count_each_run = arr["numbers"].apply(differ)  # this takes about 8 min 40 s

print(count_each_run.sum())

Is there any way to speed this up?

1 Answer

If you don't run into a memory error, you can build both full n × n tables at once with broadcasting:

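import numpy as np

n = 200_000

# Column vectors of shape (n, 1)
s1 = np.random.randint(low=1, high=1000, size=(n, 1))
s2 = np.random.randint(low=1, high=1000, size=(n, 1))

# (n, 1) + (1, n) broadcasts to a full (n, n) table
t1 = s1 + s1.T
t2 = s2 + s2.T

tot = np.sum(t1 > t2)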
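Whether the one-shot version fits is easy to estimate: each table has n² entries, so for n = 200_000 that is 4 × 10^10 entries, about 320 GB per table with 64-bit integers (the default integer type on most platforms). A minimal sketch that wraps the one-shot approach in a function (the name `count_pairs_full` is hypothetical), which is still handy for small n or for checking a faster version, together with that estimate:

```python
import numpy as np


def count_pairs_full(s1, s2):
    """Count ordered pairs (i, j) with s1[i] + s1[j] > s2[i] + s2[j]
    by materialising both n x n tables at once (small n only)."""
    s1 = np.asarray(s1).reshape(-1, 1)  # column vector, shape (n, 1)
    s2 = np.asarray(s2).reshape(-1, 1)
    t1 = s1 + s1.T  # broadcasting: (n, 1) + (1, n) -> (n, n)
    t2 = s2 + s2.T
    return int(np.count_nonzero(t1 > t2))


# Memory for ONE n x n table of 64-bit integers at the question's size
n = 200_000
print(n * n * np.dtype(np.int64).itemsize / 1e9, "GB")  # 320.0 GB

print(count_pairs_full([3, 1, 2], [1, 2, 2]))  # 5
```

At the question's size, the one-shot version is therefore only practical for much smaller n.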

Otherwise, you can process the tables in batches. Depending on how much fits in memory, you can use one loop (batches of rows) or two nested loops (batches of rows and columns):

import numpy as np

n = 200_000

s1 = np.random.randint(low=1, high=1000, size=(n, 1))
s2 = np.random.randint(low=1, high=1000, size=(n, 1))

bs = 10_000  # batch size
tot = 0
for i in range(0, n, bs):
    for j in range(0, n, bs):
        # (bs, 1) + (1, bs) -> one (bs, bs) block of each table
        t1 = s1[i:i+bs] + s1[j:j+bs].T
        t2 = s2[i:i+bs] + s2[j:j+bs].T

        tot += np.sum(t1 > t2)

print(tot)
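The single-loop variant compares a block of `bs` rows against all n columns at once. This is essentially what the question's code does with 2000-row blocks; there the pandas `apply` only drives the loop. A minimal sketch, with the hypothetical function name `count_pairs_row_batches`. Each block holds two `bs × n` tables, so at n = 200_000 a 2,000-row block needs about 3.2 GB per table with 64-bit integers; choose `bs` to fit your memory.

```python
import numpy as np


def count_pairs_row_batches(s1, s2, bs):
    """Single-loop variant: compare bs rows at a time against all n columns."""
    s1 = np.asarray(s1).reshape(-1, 1)  # column vector, shape (n, 1)
    s2 = np.asarray(s2).reshape(-1, 1)
    n = s1.shape[0]
    tot = 0
    for i in range(0, n, bs):
        # Slicing past the end is fine: the last batch is simply shorter.
        t1 = s1[i:i + bs] + s1.T  # shape (<= bs, n)
        t2 = s2[i:i + bs] + s2.T
        tot += int(np.count_nonzero(t1 > t2))
    return tot


rng = np.random.default_rng(0)
a = rng.integers(1, 1000, size=1_003)
b = rng.integers(1, 1000, size=1_003)
print(count_pairs_row_batches(a, b, bs=100))
```

Whichever loop structure you choose, all n² pairs are still evaluated, so batching bounds memory use but not the total amount of work.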

If you need more speed, you can try something like Numba or Cython.
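A further option, not part of the original answer, avoids the n × n table entirely. With d = array_1 - array_2, the condition array_1[i] + array_1[j] > array_2[i] + array_2[j] is equivalent to d[i] + d[j] > 0. After sorting d, the number of j with d[j] > -d[i] can be found by binary search (`np.searchsorted`) for all i at once, which takes O(n log n) time and O(n) memory instead of O(n²). A minimal sketch that counts the same ordered pairs (including i == j) as the code above; the function name `count_pairs_sorted` is hypothetical:

```python
import numpy as np


def count_pairs_sorted(s1, s2):
    """Count ordered pairs (i, j), including i == j, with
    s1[i] + s1[j] > s2[i] + s2[j], in O(n log n) time."""
    # Rearranged condition: d[i] + d[j] > 0 with d = s1 - s2
    d = (np.asarray(s1, dtype=np.int64).ravel()
         - np.asarray(s2, dtype=np.int64).ravel())
    d_sorted = np.sort(d)
    n = d.size
    # searchsorted(..., side="right") gives how many d[j] <= -d[i];
    # the remaining n - idx entries satisfy d[j] > -d[i].
    idx = np.searchsorted(d_sorted, -d, side="right")
    return int((n - idx).sum())


n = 200_000
s1 = np.random.randint(low=1, high=1000, size=(n, 1))
s2 = np.random.randint(low=1, high=1000, size=(n, 1))
print(count_pairs_sorted(s1, s2))
```

Numba or Cython could speed up the pairwise loop, but it stays O(n²); the rearrangement changes the complexity itself.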
