
I am trying to build an adjacency matrix from the pixels of an elevation raster. The raster is a GeoTIFF image with a specified nodata value outside the watershed.

The solution I have now is working but is far from optimal.

What I tried:

  • Scikit-learn: sklearn.feature_extraction.image.img_to_graph works OK, but it only builds a 4-way neighborhood (rook neighborhood) and I need it 8-way (queen neighborhood).
  • SciPy: the solution I have now: iterate over a NumPy 2D array with a 3x3 window, compare the center with its neighbors looking for non-nodata pixels, and feed a scipy sparse matrix.
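For reference, the difference between the two neighborhoods can be written down as flat-index offsets (a minimal sketch, assuming a row-major raster of the given `width`):

```python
import numpy as np

# Rook (4-way) vs queen (8-way) neighborhoods as flat-index offsets for a
# row-major raster of the given width; queen adds the four diagonals.
width = 5
rook = np.array([-width, -1, 1, width])
queen = np.array([-width - 1, -width, -width + 1,
                  -1,                 1,
                  width - 1,  width,  width + 1])
```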

The problem is that I can run this for 2000x2000 images, but I need to test it with a much larger image (more than 25,000 pixels on each side).

Is there any way to optimize this algorithm before I try it on a more powerful computer?

Thanks in advance.

This is the code I have now:

import scipy.sparse

def image_to_adjacency_matrix(image_array, nodata=None):
    image_flat = image_array.flatten()

    height, width = image_array.shape
    adjacency_matrix = scipy.sparse.lil_matrix((height * width, height * width), dtype=int)

    for i in range(height):
        for j in range(width):
            current_pixel = i * width + j

            # Nodata pixels get no edges at all
            if image_flat[current_pixel] == nodata:
                continue

            # Scan the 3x3 window around (i, j), clipped to the raster bounds
            for ni in range(i - 1, i + 2):
                for nj in range(j - 1, j + 2):
                    if 0 <= ni < height and 0 <= nj < width:
                        neighbor_pixel = ni * width + nj

                        if image_flat[neighbor_pixel] == nodata:
                            continue

                        if current_pixel != neighbor_pixel:
                            adjacency_matrix[current_pixel, neighbor_pixel] = 1

    return adjacency_matrix.tocsr()
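As a quick correctness check, the expected edge count can be worked out by hand: on a 3x3 raster whose centre is nodata, each corner pixel has 2 valid queen neighbours and each edge-midpoint has 4, giving 4×2 + 4×4 = 24 directed edges. A self-contained sketch (the compact `queen_adjacency` helper below is a hypothetical condensed rewrite of the function above, not part of it):

```python
import numpy as np
import scipy.sparse

def queen_adjacency(image, nodata):
    # Compact reference implementation: clip the 3x3 window to the raster
    # bounds instead of testing each (ni, nj) pair individually.
    h, w = image.shape
    flat = image.ravel()
    rows, cols = [], []
    for i in range(h):
        for j in range(w):
            p = i * w + j
            if flat[p] == nodata:
                continue
            for ni in range(max(i - 1, 0), min(i + 2, h)):
                for nj in range(max(j - 1, 0), min(j + 2, w)):
                    q = ni * w + nj
                    if q != p and flat[q] != nodata:
                        rows.append(p)
                        cols.append(q)
    n = h * w
    return scipy.sparse.coo_matrix(
        (np.ones(len(rows)), (rows, cols)), shape=(n, n)
    ).tocsr()

image = np.arange(9).reshape(3, 3)
A = queen_adjacency(image, nodata=4)   # centre pixel (value 4) is nodata
```

The result is symmetric (`(A - A.T).nnz == 0`) with `A.nnz == 24`, matching the hand count.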

1 Answer


Made an attempt at vectorizing this using NumPy.

Note: I implemented a simplified version of your problem, which avoids wrapping around the edge of the raster by only creating edges for interior pixels (indices 1 to width - 1 and 1 to height - 1), rather than from 0 to width. This simplified the logic enough that I could solve the problem with NumPy indexing.
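One way to recover the border pixels without complicating the indexing (not part of the answer's code, just a sketch) is to pad the raster with a one-pixel nodata frame first, so every original pixel becomes an interior pixel:

```python
import numpy as np

# Pad with a one-pixel nodata frame: every original pixel becomes interior,
# so the simplified interior-only indexing covers the whole original raster.
nodata = -1
image = np.array([[1, 2],
                  [3, 4]], dtype=np.int8)
padded = np.pad(image, 1, mode='constant', constant_values=nodata)
# padded is 4x4; the original pixels now sit at rows/cols 1..2
```

Flat indices then refer to the padded grid and must be mapped back (or the padded rows and columns dropped) afterwards.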

import numpy as np
import scipy.sparse

def image_to_adjacency_matrix_opt(image_array, nodata=None):
    image_flat = image_array.flatten()

    height, width = image_array.shape
    N = height * width
    image_has_data = image_flat != nodata
    # 32-bit indices are enough to address N pixels and halve index memory
    index_dtype = np.int32 if N < 2 ** 31 else np.int64
    # Flat-index offsets of the 8 queen neighbours in a row-major raster
    adjacents = np.array([
        -width - 1, -width, -width + 1,
        -1,                 1,
        width - 1,  width,  width + 1
    ], dtype=index_dtype)
    # Interior pixels only, so every offset stays inside the raster
    row_idx, col_idx = np.meshgrid(
        np.arange(1, height - 1, dtype=index_dtype),
        np.arange(1, width - 1, dtype=index_dtype),
        indexing='ij'
    )
    row_idx = row_idx.reshape(-1)
    col_idx = col_idx.reshape(-1)
    pixel_idx = row_idx * width + col_idx
    # Drop nodata pixels before expanding to neighbours
    pixels_with_data = image_has_data[pixel_idx]
    pixel_idx = pixel_idx[pixels_with_data]
    # Broadcast: one row of 8 neighbour indices per remaining pixel
    neighbors = pixel_idx.reshape(-1, 1) + adjacents.reshape(1, -1)
    neighbors_with_data = image_has_data[neighbors]
    row = np.repeat(pixel_idx, repeats=neighbors_with_data.sum(axis=1))
    col = neighbors[neighbors_with_data]
    data = np.ones(len(row), dtype='uint8')
    adjacency_matrix = scipy.sparse.coo_matrix((data, (row, col)), dtype=int, shape=(N, N))
    return adjacency_matrix.tocsr()
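The core of the vectorization is the broadcasted addition that pairs every interior pixel with all 8 offsets at once; a minimal illustration with a hypothetical 4-pixel-wide raster:

```python
import numpy as np

# Broadcasting a column of flat pixel indices against a row of 8 offsets
# yields one row of neighbour indices per pixel in a single operation.
width = 4
adjacents = np.array([-width - 1, -width, -width + 1,
                      -1,                 1,
                      width - 1,  width,  width + 1])
pixel_idx = np.array([5, 6])          # two interior pixels of a 4-wide raster
neighbors = pixel_idx.reshape(-1, 1) + adjacents.reshape(1, -1)
# neighbors.shape == (2, 8): one row of flat neighbour indices per pixel
```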

I found this was roughly 100x faster for a 500x500 matrix with 50% of its entries randomly set to nodata.

Example test dataset:

N = 500
image = np.random.randint(0, 100, size=(N, N))
image[np.random.rand(N, N) < 0.5] = -1
image = image.astype('int8')
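Regarding the 25,000-pixel-wide target: a rough back-of-the-envelope estimate (assuming an upper bound of 8 edges per pixel, 4-byte int32 indices, and 1-byte data, ignoring nodata pixels) suggests the CSR matrix alone needs tens of GiB:

```python
# Upper-bound CSR memory for a 25000x25000 raster: each edge stores a
# 4-byte column index plus 1 byte of data, and there is one 4-byte
# row-pointer entry per row of the matrix.
side = 25_000
n = side * side                 # 625 million pixels
edges = 8 * n                   # at most 8 queen neighbours per pixel
bytes_needed = edges * (4 + 1) + (n + 1) * 4
gib = bytes_needed / 2**30      # roughly 25.6 GiB
```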

5 Comments

I'm eager to try your solution, although I have tried vectorising it before. I need to update the question with some more things that I tried. Another problem here is memory: a solution like this for a 25k-wide image will explode my RAM. As soon as I try your solution, I'll come back with the results. Thanks!!
@RikFerreira I took another look at my code under a memory profiler, and found that using smaller dtypes reduced memory usage by 36%. See edit.
Thanks! I tried your solution and it ran faster than the scikit-image implementation! Thanks for the help! I guess it can't be faster using only these Python frameworks. Do you (or anyone who might be reading this) know any other tool, such as an SGBD, that can deal with this? I'm really new to graph handling at this scale.
@RikFerreira 1) What's SGBD? 2) The number one suggestion I would have to reduce memory usage is to avoid converting it to a graph and keep it as a raster. I assume you have a reason to want this transformation, but I can't think of what it might be.
1) Sorry, my fault. It is the Portuguese translation of DBMS. 2) Yes, I am trying to avoid writing this routine in C or Rust, but I think this task hits the boundaries of the Python language. Again, thank you for your help!!!
