
I am trying to build an adjacency matrix from the pixels of an elevation raster. The raster is a GeoTIFF image with a specified nodata value outside the watershed.

The solution I have now is working but is far from optimal.

What I tried:

  • Scikit-learn: sklearn.feature_extraction.image.img_to_graph works OK, but it only builds a 4-way neighborhood (rook neighborhood) and I need it 8-way (queen neighborhood).
  • SciPy: the solution I have now: iterate over a NumPy 2D array with a 3x3 window, compare the center with its neighbors looking for non-nodata pixels, and feed a scipy sparse matrix.
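For reference, the difference between the two neighborhoods can be written down as flat-index offsets (a minimal sketch, assuming a row-major raster of the given `width`):

```python
import numpy as np

# Rook (4-way) vs queen (8-way) neighborhoods as flat-index offsets for a
# row-major raster of the given width; queen adds the four diagonals.
width = 5
rook = np.array([-width, -1, 1, width])
queen = np.array([-width - 1, -width, -width + 1,
                  -1,                 1,
                  width - 1,  width,  width + 1])
```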

The problem is that I can run this for 2000x2000 images, but I need to test it with a much larger image (more than 25,000 pixels on each side).

Is there any way to optimize this algorithm before I try it on a more powerful computer?

Thanks in advance.

This is the code I have now:

import scipy.sparse

def image_to_adjacency_matrix(image_array, nodata=None):
    image_flat = image_array.flatten()

    height, width = image_array.shape
    adjacency_matrix = scipy.sparse.lil_matrix((height * width, height * width), dtype=int)

    for i in range(height):
        for j in range(width):
            current_pixel = i * width + j

            # Nodata pixels get no edges at all
            if image_flat[current_pixel] == nodata:
                continue

            # Scan the 3x3 window around (i, j), clipped to the raster bounds
            for ni in range(i - 1, i + 2):
                for nj in range(j - 1, j + 2):
                    if 0 <= ni < height and 0 <= nj < width:
                        neighbor_pixel = ni * width + nj

                        if image_flat[neighbor_pixel] == nodata:
                            continue

                        if current_pixel != neighbor_pixel:
                            adjacency_matrix[current_pixel, neighbor_pixel] = 1

    return adjacency_matrix.tocsr()
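As a quick correctness check, the expected edge count can be worked out by hand: on a 3x3 raster whose centre is nodata, each corner pixel has 2 valid queen neighbours and each edge-midpoint has 4, giving 4×2 + 4×4 = 24 directed edges. A self-contained sketch (the compact `queen_adjacency` helper below is a hypothetical condensed rewrite of the function above, not part of it):

```python
import numpy as np
import scipy.sparse

def queen_adjacency(image, nodata):
    # Compact reference implementation: clip the 3x3 window to the raster
    # bounds instead of testing each (ni, nj) pair individually.
    h, w = image.shape
    flat = image.ravel()
    rows, cols = [], []
    for i in range(h):
        for j in range(w):
            p = i * w + j
            if flat[p] == nodata:
                continue
            for ni in range(max(i - 1, 0), min(i + 2, h)):
                for nj in range(max(j - 1, 0), min(j + 2, w)):
                    q = ni * w + nj
                    if q != p and flat[q] != nodata:
                        rows.append(p)
                        cols.append(q)
    n = h * w
    return scipy.sparse.coo_matrix(
        (np.ones(len(rows)), (rows, cols)), shape=(n, n)
    ).tocsr()

image = np.arange(9).reshape(3, 3)
A = queen_adjacency(image, nodata=4)   # centre pixel (value 4) is nodata
```

The result is symmetric (`(A - A.T).nnz == 0`) with `A.nnz == 24`, matching the hand count.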

1 Answer


Made an attempt at vectorizing this using NumPy.

Note: I implemented a simplified version of your problem, which avoids wrapping around the edge of the raster by only creating edges for interior pixels (indices 1 to width - 1 and 1 to height - 1), rather than from 0 to width. This simplified the logic enough that I could solve the problem with NumPy indexing.
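One way to recover the border pixels without complicating the indexing (not part of the answer's code, just a sketch) is to pad the raster with a one-pixel nodata frame first, so every original pixel becomes an interior pixel:

```python
import numpy as np

# Pad with a one-pixel nodata frame: every original pixel becomes interior,
# so the simplified interior-only indexing covers the whole original raster.
nodata = -1
image = np.array([[1, 2],
                  [3, 4]], dtype=np.int8)
padded = np.pad(image, 1, mode='constant', constant_values=nodata)
# padded is 4x4; the original pixels now sit at rows/cols 1..2
```

Flat indices then refer to the padded grid and must be mapped back (or the padded rows and columns dropped) afterwards.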

import numpy as np
import scipy.sparse

def image_to_adjacency_matrix_opt(image_array, nodata=None):
    image_flat = image_array.flatten()

    height, width = image_array.shape
    N = height * width
    image_has_data = image_flat != nodata
    # 32-bit indices are enough to address N pixels and halve index memory
    index_dtype = np.int32 if N < 2 ** 31 else np.int64
    # Flat-index offsets of the 8 queen neighbours in a row-major raster
    adjacents = np.array([
        -width - 1, -width, -width + 1,
        -1,                 1,
        width - 1,  width,  width + 1
    ], dtype=index_dtype)
    # Interior pixels only, so every offset stays inside the raster
    row_idx, col_idx = np.meshgrid(
        np.arange(1, height - 1, dtype=index_dtype),
        np.arange(1, width - 1, dtype=index_dtype),
        indexing='ij'
    )
    row_idx = row_idx.reshape(-1)
    col_idx = col_idx.reshape(-1)
    pixel_idx = row_idx * width + col_idx
    # Drop nodata pixels before expanding to neighbours
    pixels_with_data = image_has_data[pixel_idx]
    pixel_idx = pixel_idx[pixels_with_data]
    # Broadcast: one row of 8 neighbour indices per remaining pixel
    neighbors = pixel_idx.reshape(-1, 1) + adjacents.reshape(1, -1)
    neighbors_with_data = image_has_data[neighbors]
    row = np.repeat(pixel_idx, repeats=neighbors_with_data.sum(axis=1))
    col = neighbors[neighbors_with_data]
    data = np.ones(len(row), dtype='uint8')
    adjacency_matrix = scipy.sparse.coo_matrix((data, (row, col)), dtype=int, shape=(N, N))
    return adjacency_matrix.tocsr()
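The core of the vectorization is the broadcasted addition that pairs every interior pixel with all 8 offsets at once; a minimal illustration with a hypothetical 4-pixel-wide raster:

```python
import numpy as np

# Broadcasting a column of flat pixel indices against a row of 8 offsets
# yields one row of neighbour indices per pixel in a single operation.
width = 4
adjacents = np.array([-width - 1, -width, -width + 1,
                      -1,                 1,
                      width - 1,  width,  width + 1])
pixel_idx = np.array([5, 6])          # two interior pixels of a 4-wide raster
neighbors = pixel_idx.reshape(-1, 1) + adjacents.reshape(1, -1)
# neighbors.shape == (2, 8): one row of flat neighbour indices per pixel
```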

I found this was roughly 100x faster for a 500x500 matrix with 50% of its entries randomly set to nodata.

Example test dataset:

N = 500
image = np.random.randint(0, 100, size=(N, N))
image[np.random.rand(N, N) < 0.5] = -1
image = image.astype('int8')
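Regarding the 25,000-pixel-wide target: a rough back-of-the-envelope estimate (assuming an upper bound of 8 edges per pixel, 4-byte int32 indices, and 1-byte data, ignoring nodata pixels) suggests the CSR matrix alone needs tens of GiB:

```python
# Upper-bound CSR memory for a 25000x25000 raster: each edge stores a
# 4-byte column index plus 1 byte of data, and there is one 4-byte
# row-pointer entry per row of the matrix.
side = 25_000
n = side * side                 # 625 million pixels
edges = 8 * n                   # at most 8 queen neighbours per pixel
bytes_needed = edges * (4 + 1) + (n + 1) * 4
gib = bytes_needed / 2**30      # roughly 25.6 GiB
```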

5 Comments

I'm eager to try your solution, although I have tried vectorising it before. I need to update the question with some more things that I tried. Another problem here is memory: a solution like this for a 25k-wide image will explode my RAM. As soon as I try your solution, I'll come back with the results. Thanks!!
@RikFerreira I took another look at my code under a memory profiler, and found that using smaller dtypes reduced memory usage by 36%. See edit.
Thanks! I tried your solution and it ran faster than the scikit-image implementation! Thanks for the help! I guess it can't be faster using only these Python frameworks. Do you (or anyone who might be reading this) know any other tool, such as an SGBD, that can deal with this? I'm really new to graph handling at this scale.
@RikFerreira 1) What's SGBD? 2) The number one suggestion I would have to reduce memory usage is to avoid converting it to a graph and keep it as a raster. I assume you have a reason to want this transformation, but I can't think of what it might be.
1) Sorry, my fault. It is the Portuguese translation of DBMS. 2) Yes, I am trying to avoid writing this routine in C or Rust, but I think this task hits the boundaries of the Python language. Again, thank you for your help!!!
