
For some econometric work, I often need to derive multiple parallel arrays of calculated variables from a (potentially) large number of parallel data arrays.

In the following example I have two input arrays and two output arrays, but in the real world there could be anywhere from 5 to 10 input and output arrays.

w, x are inputs
y, z are outputs

Method A:

w = [1, -2, 5]
x = [0, 3, 2]
N = len(w)
I = range(N)
# list() is needed on Python 3, where map() returns a lazy iterator
y = list(map(lambda i: w[i] + x[i], I))
z = list(map(lambda i: w[i] - x[i], I))

Method B:

w = [1, -2, 5]
x = [0, 3, 2]
N = len(w)
I = range(N)
y, z = [], []
for i in I:
  y.append(w[i] + x[i])
  z.append(w[i] - x[i])

Method C:

w = [1, -2, 5]
x = [0, 3, 2]
y, z = [], []
for w_i, x_i in zip(w, x):
  y.append(w_i + x_i)
  z.append(w_i - x_i)

Method D:

w = [1, -2, 5]
x = [0, 3, 2]
N = len(w)
I = range(N)
# zip(*...) transposes the list of [y_i, z_i] pairs (Python has no built-in transpose)
y, z = map(list, zip(*map(lambda i: [w[i] + x[i], w[i] - x[i]], I)))

D seems to be the most concise, extendable, and efficient. But it's also the most difficult to read, especially with many variables with complicated formulae.
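One way to keep D's single pass and `zip`-based transpose while making each formula readable is to move the formulas into a small per-row function with named intermediate results. A minimal sketch; `derive_row` is a hypothetical name, not something from the question:

```python
# Hypothetical per-row function: one named formula per output,
# so adding a variable means one more line here and one more name below.
def derive_row(w_i, x_i):
    y_i = w_i + x_i
    z_i = w_i - x_i
    return y_i, z_i

w = [1, -2, 5]
x = [0, 3, 2]

# zip(w, x) walks the inputs in parallel; zip(*rows) transposes the
# list of (y_i, z_i) tuples into one sequence per output.
rows = [derive_row(w_i, x_i) for w_i, x_i in zip(w, x)]
y, z = map(list, zip(*rows))
```

One caveat of this pattern: with empty inputs, `zip(*rows)` yields nothing and the `y, z = ...` unpacking raises `ValueError`.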

A is my favorite despite a little duplication, but is it less efficient to run a separate loop per variable? Will it scale to large data?
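Both styles do the same arithmetic; A just walks the data once per output instead of once overall. Either way the cost grows linearly with N, and the difference is a constant factor that is easiest to settle by measuring. A minimal sketch with `timeit`, assuming synthetic data (`n` and both function names are hypothetical) and writing A as list comprehensions over `zip` rather than `map`/`lambda`:

```python
import timeit

# Hypothetical benchmark data: n rows of two aligned inputs.
n = 100_000
w = list(range(n))
x = list(range(n, 0, -1))

def one_loop_per_output():
    # Method A style: a separate pass over the data for each output.
    y = [w_i + x_i for w_i, x_i in zip(w, x)]
    z = [w_i - x_i for w_i, x_i in zip(w, x)]
    return y, z

def single_loop():
    # Method B/C style: one pass that fills every output.
    y, z = [], []
    for w_i, x_i in zip(w, x):
        y.append(w_i + x_i)
        z.append(w_i - x_i)
    return y, z

# Best-of-3 wall time for 5 calls of each function.
for f in (one_loop_per_output, single_loop):
    print(f.__name__, min(timeit.repeat(f, number=5, repeat=3)))
```

The timings depend on the machine and Python version, so no numbers are claimed here; the point is to compare the two on data shaped like yours.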

B vs. C: I know C is more pythonic but B seems more convenient and concise, and scales better with more variables. In both cases, I hate the extra line where I have to declare the variables up-front.
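One way to drop the per-variable declarations in Method C while keeping its single `zip` loop is to collect the outputs in a `collections.defaultdict(list)`, which creates each list the first time its key is used. A sketch, with the keys `"y"` and `"z"` chosen for illustration:

```python
from collections import defaultdict

w = [1, -2, 5]
x = [0, 3, 2]

# One container for all outputs: adding an output means adding one
# append line, with no separate "y, z = [], []" line to maintain.
out = defaultdict(list)
for w_i, x_i in zip(w, x):
    out["y"].append(w_i + x_i)
    out["z"].append(w_i - x_i)

y, z = out["y"], out["z"]
```

The trade-off is string keys instead of bare names. With empty inputs the loop never runs, and `out["y"]` then simply returns a fresh empty list.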

Overall, I am not perfectly satisfied with any of the above approaches. Is there something missing from my reasoning or is there a better method out there?

  • Have you considered using numpy? Most scientific computing in Python is done in numpy. In numpy, this would just be y = w + x; z = w - x. Commented Jan 30, 2015 at 1:03
  • @senshin, why not y, z = w + x, w - x? Commented Jan 30, 2015 at 1:08
  • @PadraicCunningham I mean, does it matter? They both do the same thing. Commented Jan 30, 2015 at 1:09
  • @senshin, yes, one is Pythonic, the other is not. Commented Jan 30, 2015 at 1:10
  • @PadraicCunningham Come on, you know I used a semicolon because you can't have newlines in comments. Commented Jan 30, 2015 at 1:11

2 Answers

Use numpy. It performs the elementwise operations in compiled C code, so it's much faster (especially if your arrays are much bigger than 3 items):

import numpy

w = numpy.array([1, -2, 5])
x = numpy.array([0, 3, 2])

y = w + x
z = w - x
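For the 5 to 10 variable case in the question, one option is to keep all the formulas in a single function that takes and returns NumPy arrays. A minimal sketch; `derive` is a hypothetical name, and `np.asarray` lets it accept plain lists as well as arrays:

```python
import numpy as np

# Hypothetical helper: each output is one vectorised expression, so adding
# a sixth or tenth variable adds one line, not one more loop.
def derive(w, x):
    w = np.asarray(w)
    x = np.asarray(x)
    y = w + x
    z = w - x
    return y, z

y, z = derive([1, -2, 5], [0, 3, 2])
```

Because `w + x` operates elementwise on whole arrays, there is no index `i`, no `range(N)`, and no up-front declaration of the outputs.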

1 Comment

Thanks. This is a winner. So simple and elegant.

I think @Beasley's suggestion works well, and I suggest using multiprocessing on top of it so that the outputs are generated in parallel. Your computation seems perfectly parallelizable!

What I can offer can't beat the tips discussed here: "Does python support multiprocessor/multicore programming?"
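A minimal sketch of the chunk-and-parallelize idea, assuming the inputs are NumPy arrays. It swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`) rather than multiprocessing: NumPy releases the GIL during most elementwise arithmetic on numeric dtypes, so threads can overlap without pickling arrays between processes. `ProcessPoolExecutor` offers the same `map` interface if you want real processes, but then the worker function must be defined at module level and the call placed under an `if __name__ == "__main__":` guard. `parallel_derive` and `n_chunks` are hypothetical names. For operations as cheap as `+` and `-`, the splitting and scheduling overhead may well exceed any gain, so measure before adopting it.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def derive(w, x):
    # Same vectorised formulas as in the numpy answer above.
    return w + x, w - x

def parallel_derive(w, x, n_chunks=4):
    # Split the aligned inputs into matching contiguous chunks.
    w_parts = np.array_split(w, n_chunks)
    x_parts = np.array_split(x, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map returns results in the same order as the chunks.
        results = list(pool.map(derive, w_parts, x_parts))
    # Reassemble each output from its chunks, in order.
    y = np.concatenate([r[0] for r in results])
    z = np.concatenate([r[1] for r in results])
    return y, z
```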
