
I have a segmentation map (numpy.ndarray) that contains objects labeled with unique numbers. I want to combine objects across multiple slices by labeling them with the same number. Specifically, I want to renumber objects based on a DataFrame containing centroid positions and the desired label values.

First, I created some mock labels and a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "slice": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    "number": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3],
    "x": [10, 20, 30, 40, 11, 21, 31, 12, 22, 32],
    "y": [10, 20, 30, 40, 11, 21, 31, 12, 22, 32]
})

def make_segmap(df):
    x, y = np.indices((50, 50))
    maps = []

    # Iterate over slices and coordinates
    for n_slice in df["slice"].unique():
        masks = []
        for _, row in df[df["slice"] == n_slice].iterrows():
            # Create a circle around the centroid
            mask_circle = (x - row["x"])**2 + (y - row["y"])**2 < 5**2
            # Mock label value (here just a multiple of the object number)
            masks.append(mask_circle * row["number"] * 3)
        maps.append(np.max(masks, axis=0))
    return np.stack(maps, axis=0)

segmap = make_segmap(df)
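
For context (a quick check added here, not part of the original question), the mock map is a small 3-slice stack whose labels are just multiples of the object numbers:

print(segmap.shape)       # (3, 50, 50)
print(np.unique(segmap))  # [ 0  3  6  9 12]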

For renumbering, this is what I came up with so far:

new_maps = []

# Iterate over slices
for n_slice in df["slice"].unique():
    new_labels = []
    for _, row in df[df["slice"] == n_slice].iterrows():
        # Find the current label value at the centroid position
        original_label = segmap[n_slice, row["y"], row["x"]]
        # Replace all occurrences of that label with the desired label from the DataFrame
        replaced_label = np.where(segmap[n_slice] == original_label, row["number"], 0)
        new_labels.append(replaced_label)
    new_maps.append(np.max(new_labels, axis=0))

new_segmap = np.stack(new_maps, axis=0)

This works reasonably well, but it doesn't scale to larger datasets. The real dataset has thousands of objects across hundreds of slices, and this approach takes a very long time to run (an hour or so). Are there any suggestions on how to replace multiple values at once to improve performance?

Thanks in advance.

1 Answer


You can use groupby to replace the current quadratic search with a (quasi-)linear one. Moreover, you can take advantage of NumPy's vectorization and broadcasting to remove the inner loop and speed up the computation.

Here is a faster implementation:

def make_segmap_fast(df):
    x, y = np.indices((50, 50))
    maps = []

    # Iterate over slices, one group per slice
    for n_slice, subDf in df.groupby("slice"):
        # Reshape the per-object columns so they broadcast against the 2D grid
        subDf_x = subDf["x"].to_numpy()[:, None, None]
        subDf_y = subDf["y"].to_numpy()[:, None, None]
        subDf_number = subDf["number"].to_numpy()[:, None, None]
        # Create all circles for this slice at once
        mask_circle = (x - subDf_x)**2 + (y - subDf_y)**2 < 5**2
        # Mock label value (here just a multiple of the object number)
        masks = mask_circle * subDf_number
        maps.append(np.max(masks, axis=0) * 3)
    return np.stack(maps, axis=0)

On my machine, this is about 2 times faster on this very small example (and much faster on bigger DataFrames).
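
As a quick sanity check (this snippet is an addition, not from the original answer), the two versions can be compared and timed on the mock data defined in the question:

import timeit

# Both versions should produce the same segmentation map on the mock data
assert np.array_equal(make_segmap(df), make_segmap_fast(df))

# Rough timing; the real gains show up on larger DataFrames
print("original:", timeit.timeit(lambda: make_segmap(df), number=100))
print("groupby :", timeit.timeit(lambda: make_segmap_fast(df), number=100))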


3 Comments

Thanks. This does work faster. Unfortunately, the question still remains of how to improve the relabelling (the second part), which is what I'm struggling with.
I think you can use exactly the same approach: groupby + vectorization + broadcasting. Actually, the groupby alone should already help a lot on big dataframes (and it is trivial to apply here).
True. I didn't realise you could broadcast / vectorise like this before. Replacing everything with subDf_x / y / number works perfectly. Thanks!
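
For reference, here is a minimal sketch of what that same groupby + broadcasting idea might look like for the relabelling step (an illustration added here, not code from the thread; it assumes the segmap and df defined in the question, and the variable names are illustrative):

new_maps = []

# One group per slice instead of repeatedly filtering the DataFrame
for n_slice, sub in df.groupby("slice"):
    seg_slice = segmap[n_slice]
    # Look up the original label under every centroid in a single indexing call
    orig_labels = seg_slice[sub["y"].to_numpy(), sub["x"].to_numpy()]
    new_numbers = sub["number"].to_numpy()
    # Broadcast: compare the 2D slice against all original labels at once
    matches = seg_slice[None, :, :] == orig_labels[:, None, None]
    # Apply the desired numbers and collapse back to a single 2D slice
    new_maps.append(np.max(matches * new_numbers[:, None, None], axis=0))

new_segmap = np.stack(new_maps, axis=0)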
