Skip to main content
Notice removed Draw attention by CommunityBot
Bounty Ended with Erik Eidt's answer chosen by CommunityBot
Notice added Draw attention by caveman
Bounty Started worth 100 reputation by caveman
added 82 characters in body
Source Link
caveman
  • 73
  • 10

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) has a randomized component in it. So invoking the function twice will likely give you two different answers each time (albeit close to each other, but nonetheless different). Larger values of n will only reduce the estimation error of the function $f$.
  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) has a randomized component in it. So invoking the function twice will likely give you two different answers each time (albeit close to each other, but nonetheless different).
  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) has a randomized component in it. So invoking the function twice will likely give you two different answers each time (albeit close to each other, but nonetheless different). Larger values of n will only reduce the estimation error of the function $f$.
  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

added 192 characters in body
Source Link
caveman
  • 73
  • 10

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) has a randomized component in it. So invoking the function twice will likely give you two different answers each time (albeit close to each other, but nonetheless different).
  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) has a randomized component in it. So invoking the function twice will likely give you two different answers each time (albeit close to each other, but nonetheless different).
  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.

Source Link
caveman
  • 73
  • 10

Best algorithm to multi-thread this application?

I define an algorithm to be best if it minimizes the total run time over commodity hardware (e.g. normal desktop and server PCs).

I have sets A and B. I also have function f that works like f(a, b, n), where a is some item in A, b is some item in B, and n is some natural number. The output that this f(a, b, n) returns is a positive real number.

These properties are true:

  • f(a,b,n) is twice slower as f(a,b,n/2).
  • f(a,b,n) = (f(a,b,n/2) + f(a,b,n/2))/2
  • Computing f(a_1,b_1,n) is independent from f(a_2,b_2,n), where a_1,a_2 are distinct elements from A, and b_1,b_2 are also distinct elements from B.

My goal is to compute the following the value of answer as follows:

count = 0
output = 0

for a in A:
    for b in B:
        output += f(a,b,n)
        count += 1

answer = output / count

return answer

But the code above is highly serialized. What I wish to do is to parallelize that such that I minimize the total number of seconds needed for me to obtain answer answer.

Just to re-iterate: I am running this application on a single computer, and this is why I am considering a multi-threaded application. I do not wish to distribute it over multiple machines. All I want is to just run it really fast over the many cores that I have on a single computer.