
Help make my code faster: My Python code needs to generate a 2D lattice of points that fall inside a bounding rectangle. I kludged together some code (shown below) that generates this lattice. However, this function is called many, many times and has become a serious bottleneck in my application.

I'm sure there's a faster way to do this, probably involving numpy arrays instead of lists. Any suggestions for a faster, more elegant way to do this?

Description of the function: I have two 2D vectors, v1 and v2. These vectors define a lattice. In my case, my vectors define a lattice that is almost, but not quite, hexagonal. I want to generate the set of all 2D points on this lattice that are in some bounding rectangle. In my case, one of the corners of the rectangle is at (0, 0), and the other corners are at positive coordinates.

Example: If the far corner of my bounding rectangle was at (3, 3), and my lattice vectors were:

v1 = (1.2, 0.1)
v2 = (0.2, 1.1)

I'd want my function to return the points:

(1.2, 0.1) #v1
(2.4, 0.2) #2*v1
(0.2, 1.1) #v2
(0.4, 2.2) #2*v2
(1.4, 1.2) #v1 + v2
(2.6, 1.3) #2*v1 + v2
(1.6, 2.3) #v1 + 2*v2
(2.8, 2.4) #2*v1 + 2*v2

I'm not concerned with edge cases; it doesn't matter if the function returns (0, 0), for example.
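For concreteness, the expected output above can be reproduced with a minimal brute-force sketch (the function name, the `max_coeff` cutoff, and the rounding are mine, chosen just for this illustration, not part of my real code):

```python
import numpy

def lattice_in_box(v1, v2, far_corner, max_coeff=10):
    """Brute-force sketch: collect i*v1 + j*v2 that land inside the box."""
    v1 = numpy.asarray(v1, dtype=float)
    v2 = numpy.asarray(v2, dtype=float)
    points = []
    for i in range(max_coeff):
        for j in range(max_coeff):
            p = i * v1 + j * v2
            if 0 < p[0] < far_corner[0] and 0 < p[1] < far_corner[1]:
                # Round to tame float noise like 1.2 + 0.2 -> 1.4000000000000001.
                points.append(tuple(numpy.round(p, 10)))
    return sorted(points)

pts = lattice_in_box((1.2, 0.1), (0.2, 1.1), (3, 3))
```

Running this with the example vectors yields exactly the eight points listed above.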

The slow way I'm currently doing it:

import numpy, pylab

def generate_lattice( #Help me speed up this function, please!
    image_shape, lattice_vectors, center_pix='image', edge_buffer=2):

    ##Preprocessing. Not much of a bottleneck:
    if center_pix == 'image':
        center_pix = numpy.array(image_shape) // 2
    else: ##Express the center pixel in terms of the lattice vectors
        center_pix = numpy.array(center_pix) - (numpy.array(image_shape) // 2)
        lattice_components = numpy.linalg.solve(
            numpy.vstack(lattice_vectors[:2]).T,
            center_pix)
        lattice_components -= lattice_components // 1
        center_pix = (lattice_vectors[0] * lattice_components[0] +
                      lattice_vectors[1] * lattice_components[1] +
                      numpy.array(image_shape)//2)
    num_vectors = int( ##Estimate how many lattice points we need
        max(image_shape) / numpy.sqrt(lattice_vectors[0]**2).sum())
    lattice_points = []
    lower_bounds = numpy.array((edge_buffer, edge_buffer))
    upper_bounds = numpy.array(image_shape) - edge_buffer

    ##SLOW LOOP HERE. 'num_vectors' is often quite large.
    for i in range(-num_vectors, num_vectors):
        for j in range(-num_vectors, num_vectors):
            lp = i * lattice_vectors[0] + j * lattice_vectors[1] + center_pix
            if all(lower_bounds < lp) and all(lp < upper_bounds):
                lattice_points.append(lp)
    return lattice_points


##Test the function and display the output.
##No optimization needed past this point.
lattice_vectors = [
    numpy.array([-40., -1.]),
    numpy.array([ 18., -37.])]
image_shape = (1000, 1000)
spots = generate_lattice(image_shape, lattice_vectors)

fig=pylab.figure()
pylab.plot([p[1] for p in spots], [p[0] for p in spots], '.')
pylab.axis('equal')
fig.show()
  • Would it be better for you to do lattice_components = numpy.modf(lattice_components)[0]? (Not asking in regards to your question, but just out of curiosity. Would it be significantly faster/slower?) Commented May 26, 2011 at 17:02
  • Didn't know about modf, good suggestion. I think most of the time spent in this function is inside the double-nested for loop, but I'll benchmark to make sure. Commented May 26, 2011 at 17:18
  • please write a summary of what this function does. Commented May 26, 2011 at 19:39
  • Good suggestion, Andrea Z. Will do. Commented May 26, 2011 at 19:40
  • You might benefit from Cython here. Commented May 27, 2011 at 6:22

4 Answers


If you want to vectorize the whole thing, generate a square lattice and then shear it. Then chop off the edges that land outside your box.

Here is what I came up with. There are still a lot of improvements that could be made, but this is the basic idea.

def generate_lattice(image_shape, lattice_vectors):
    center_pix = numpy.array(image_shape) // 2
    # Get the lower limit on the cell size.
    dx_cell = max(abs(lattice_vectors[0][0]), abs(lattice_vectors[1][0]))
    dy_cell = max(abs(lattice_vectors[0][1]), abs(lattice_vectors[1][1]))
    # Get an overestimate of how many cells across and up.
    nx = int(image_shape[0] // dx_cell)
    ny = int(image_shape[1] // dy_cell)
    # Generate a square lattice, with too many points.
    # Here I generate a factor of 4 more points than I need, which ensures
    # coverage for highly sheared lattices.  If your lattice is not highly
    # sheared, then you can generate fewer points.
    x_sq = numpy.arange(-nx, nx, dtype=float)
    y_sq = numpy.arange(-ny, ny, dtype=float)
    x_sq.shape = x_sq.shape + (1,)
    y_sq.shape = (1,) + y_sq.shape
    # Now shear the whole thing using the lattice vectors.
    x_lattice = lattice_vectors[0][0]*x_sq + lattice_vectors[1][0]*y_sq
    y_lattice = lattice_vectors[0][1]*x_sq + lattice_vectors[1][1]*y_sq
    # Trim to fit in the box.
    mask = ((x_lattice < image_shape[0]/2.0)
             & (x_lattice > -image_shape[0]/2.0))
    mask = mask & ((y_lattice < image_shape[1]/2.0)
                    & (y_lattice > -image_shape[1]/2.0))
    x_lattice = x_lattice[mask]
    y_lattice = y_lattice[mask]
    # Translate to the centre pix.
    x_lattice += center_pix[0]
    y_lattice += center_pix[1]
    # Make output compatible with the original version.
    out = numpy.empty((len(x_lattice), 2), dtype=float)
    out[:, 0] = y_lattice
    out[:, 1] = x_lattice
    return out
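One way to gain confidence in the shear-then-trim approach is to compare it against a tiny nested-loop reference on small inputs. This is a self-contained sketch: both function names and the small test values below are mine, chosen only for the comparison.

```python
import numpy

def lattice_bruteforce(image_shape, lattice_vectors, n=50):
    """Reference: nested loops over i*v0 + j*v1 + center, as in the question."""
    center = numpy.array(image_shape) // 2
    pts = set()
    for i in range(-n, n):
        for j in range(-n, n):
            p = i * lattice_vectors[0] + j * lattice_vectors[1] + center
            if 0 < p[0] < image_shape[0] and 0 < p[1] < image_shape[1]:
                pts.add((round(p[0], 6), round(p[1], 6)))
    return pts

def lattice_bcast(image_shape, lattice_vectors, n=50):
    """Broadcast version: shear a square grid, then mask out-of-box points."""
    center = numpy.array(image_shape) // 2
    i = numpy.arange(-n, n, dtype=float).reshape(-1, 1)
    j = numpy.arange(-n, n, dtype=float).reshape(1, -1)
    x = lattice_vectors[0][0] * i + lattice_vectors[1][0] * j + center[0]
    y = lattice_vectors[0][1] * i + lattice_vectors[1][1] * j + center[1]
    mask = (0 < x) & (x < image_shape[0]) & (0 < y) & (y < image_shape[1])
    return {(round(a, 6), round(b, 6)) for a, b in zip(x[mask], y[mask])}

v = [numpy.array([-40., -1.]), numpy.array([18., -37.])]
shape = (200, 200)
assert lattice_bruteforce(shape, v) == lattice_bcast(shape, v)
```

Since both versions perform the same component-wise arithmetic, their point sets match exactly on these inputs.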

1 Comment

Both this code and Pavan's give the kind of speedup I'm looking for. I'm marking this one correct, since Pavan describes it as more general. Thanks, everyone who helped!

Since lower_bounds and upper_bounds are only 2-element arrays, numpy might not be the right choice here. Try to replace

if all(lower_bounds < lp) and all(lp < upper_bounds):

with basic Python stuff:

if lower1 < lp[0] and lp[0] < upper1 and lower2 < lp[1] and lp[1] < upper2:

According to timeit, the second approach is much faster:

>>> timeit.timeit('all(lower < lp)', 'import numpy\nlp=4\nlower = numpy.array((1,5))') 
3.7948939800262451
>>> timeit.timeit('lower1 < 4 and lower2 < 4', 'lp = 4\nlower1, lower2 = 1,5') 
0.074192047119140625

From my experience, as long as you don't need to handle n-dimensional data and don't need double-precision floats, it is generally faster to use basic Python data types and constructs instead of numpy, which is overkill in such cases -- have a look at this other question.


Another minor improvement could be to compute range(-num_vectors, num_vectors) only once and then reuse it. Additionally you might want to use a product iterator instead of a nested for loop -- though I don't think that these changes will have a significant influence on performance.
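Combining the two suggestions, the loop might look like this (a sketch only; it keeps the original's centering and count estimate but uses plain floats for the bounds checks, and the function name is mine):

```python
import itertools

def generate_lattice_product(image_shape, lattice_vectors, edge_buffer=2):
    """Sketch: itertools.product loop with plain-float bounds checks."""
    (v0x, v0y), (v1x, v1y) = lattice_vectors
    cx, cy = image_shape[0] // 2, image_shape[1] // 2
    # Same count estimate as the original: max dimension over |v0x| + |v0y|.
    num_vectors = int(max(image_shape) / (abs(v0x) + abs(v0y)))
    lo_x = lo_y = edge_buffer
    hi_x, hi_y = image_shape[0] - edge_buffer, image_shape[1] - edge_buffer
    rng = range(-num_vectors, num_vectors)  # computed once, reused
    points = []
    for i, j in itertools.product(rng, rng):
        x = i * v0x + j * v1x + cx
        y = i * v0y + j * v1y + cy
        if lo_x < x < hi_x and lo_y < y < hi_y:
            points.append((x, y))
    return points
```

This avoids creating a numpy array per lattice point, which is where much of the per-iteration overhead goes.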

2 Comments

Switching the inequalities to basic Python datatypes and changing the range sped things up, but less than a factor of 2, as computed by timeit.timeit( 'generate_lattice(image_shape, lattice_vectors)', 'from test import generate_lattice, lattice_vectors, image_shape', number=10). I guess a lot of the burden is in the loop itself?
@Andrew: I reproduced your example in Python 2.6 (Ubuntu 10.10) and the basic Python approach was approx. 2.3 times faster. I guess for further speed up you'll need a change on the algorithm level, as partly suggested by other answers.

Maybe you can replace the two for loops with this:

i, j = numpy.mgrid[-num_vectors:num_vectors, -num_vectors:num_vectors]
numel = (2 * num_vectors) ** 2
i = i.reshape(numel, 1)
j = j.reshape(numel, 1)
lp = i * lattice_vectors[0] + j * lattice_vectors[1] + center_pix
valid = numpy.all(lower_bounds < lp, 1) & numpy.all(lp < upper_bounds, 1)
lattice_points = lp[valid]

There may be some minor mistakes, but you get the idea..

EDIT

I made an edit to the "numpy.all(lower_bounds..)" to account for the correct dimension.

2 Comments

This is pretty fast, >10x speedup. The valid line needed a slight correction (elementwise & or * instead of and, which fails on boolean arrays) to make it work. I'll double-check the output to make sure it's correct.
You might want to check Kiyo's code too. It is a more generalized version of what you should be doing: Vectorizing your code.

I got a more than 2x speedup by replacing your computation of lp with repeated additions rather than multiplications. The xrange optimization seems to be inconsequential (though it probably doesn't hurt); repeated additions seem to be more efficient than the multiplications. Combining this with the other optimizations mentioned above should give you more speedup. But of course the best you can get is a speedup by a constant factor, since the size of your output is quadratic, as is your original code.

lattice_points = []
lv0, lv1 = lattice_vectors[0], lattice_vectors[1]
lv0_i = -num_vectors * lv0 + center_pix
lv1_orig = -num_vectors * lv1
lv1_j = lv1_orig.copy()  # copy so the in-place += below doesn't clobber lv1_orig
for i in xrange(-num_vectors, num_vectors):
    for j in xrange(-num_vectors, num_vectors):
        lp = lv0_i + lv1_j
        if all(lower_bounds < lp) and all(lp < upper_bounds):
            lattice_points.append(lp)
        lv1_j += lv1
    lv0_i += lv0
    lv1_j = lv1_orig.copy()

Timer results:

>>> t = Timer("generate_lattice(image_shape, lattice_vectors, orig=True)", "from __main__ import generate_lattice, lattice_vectors, image_shape")
>>> print t.timeit(number=50)
5.20121788979
>>> t = Timer("generate_lattice(image_shape, lattice_vectors, orig=False)", "from __main__ import generate_lattice, lattice_vectors, image_shape")
>>> print t.timeit(number=50)
2.17463898659
