
I have a segmentation map (numpy.ndarray) that contains objects labeled with unique numbers. I want to combine objects across multiple slices by labeling them with the same number. Specifically, I want to renumber objects based on a DataFrame containing centroid positions and the desired label values.

First, I created some mock labels and a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "slice": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    "number": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3],
    "x": [10, 20, 30, 40, 11, 21, 31, 12, 22, 32],
    "y": [10, 20, 30, 40, 11, 21, 31, 12, 22, 32]
})

def make_segmap(df):
    x, y = np.indices((50, 50))
    maps = []

    # Iterate over slices and coordinates
    for n_slice in df["slice"].unique():
        masks = []
        for _, row in df[df["slice"] == n_slice].iterrows():
            # Create a circle around the centroid
            mask_circle = (x - row["x"])**2 + (y - row["y"])**2 < 5**2
            # Mock label value (here just a multiple of the object number)
            masks.append(mask_circle * row["number"] * 3)
        maps.append(np.max(masks, axis=0))
    return np.stack(maps, axis=0)

segmap = make_segmap(df)
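
For context (a quick check added here, not part of the original question), the mock map is a small 3-slice stack whose labels are just multiples of the object numbers:

print(segmap.shape)       # (3, 50, 50)
print(np.unique(segmap))  # [ 0  3  6  9 12]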

For renumbering, this is what I came up with so far:

new_maps = []

# Iterate over slices
for n_slice in df["slice"].unique():
    new_labels = []
    for _, row in df[df["slice"] == n_slice].iterrows():
        # Find the current label value at the centroid position
        original_label = segmap[n_slice, row["y"], row["x"]]
        # Replace all occurrences of that label with the desired label from the DataFrame
        replaced_label = np.where(segmap[n_slice] == original_label, row["number"], 0)
        new_labels.append(replaced_label)
    new_maps.append(np.max(new_labels, axis=0))

new_segmap = np.stack(new_maps, axis=0)

This works reasonably well, but it doesn't scale to larger datasets. The real dataset has thousands of objects across hundreds of slices, and this approach takes a very long time to run (an hour or so). Are there any suggestions on how to replace multiple values at once to improve performance?

Thanks in advance.

1 Answer


You can use groupby to replace the current quadratic search with a (quasi-)linear one. Moreover, you can take advantage of NumPy's vectorization and broadcasting to remove the inner loop and speed up the computation.

Here is a faster implementation:

def make_segmap_fast(df):
    x, y = np.indices((50, 50))
    maps = []

    # Iterate over slices, one group per slice
    for n_slice, subDf in df.groupby("slice"):
        # Reshape the per-object columns so they broadcast against the 2D grid
        subDf_x = subDf["x"].to_numpy()[:, None, None]
        subDf_y = subDf["y"].to_numpy()[:, None, None]
        subDf_number = subDf["number"].to_numpy()[:, None, None]
        # Create all circles for this slice at once
        mask_circle = (x - subDf_x)**2 + (y - subDf_y)**2 < 5**2
        # Mock label value (here just a multiple of the object number)
        masks = mask_circle * subDf_number
        maps.append(np.max(masks, axis=0) * 3)
    return np.stack(maps, axis=0)

On my machine, this is about 2 times faster on this very small example (and much faster on bigger DataFrames).
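
As a quick sanity check (this snippet is an addition, not from the original answer), the two versions can be compared and timed on the mock data defined in the question:

import timeit

# Both versions should produce the same segmentation map on the mock data
assert np.array_equal(make_segmap(df), make_segmap_fast(df))

# Rough timing; the real gains show up on larger DataFrames
print("original:", timeit.timeit(lambda: make_segmap(df), number=100))
print("groupby :", timeit.timeit(lambda: make_segmap_fast(df), number=100))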


3 Comments

Thanks. This does work faster. Unfortunately, the question still remains of how to improve the relabelling (the second part), which is what I'm struggling with.
I think you can use exactly the same approach: groupby + vectorization + broadcasting. Actually, the groupby alone should already help a lot on big dataframes (and it is trivial to apply here).
True. I didn't realise you could broadcast / vectorise like this before. Replacing everything with subDf_x / y / number works perfectly. Thanks!
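
For reference, here is a minimal sketch of what that same groupby + broadcasting idea might look like for the relabelling step (an illustration added here, not code from the thread; it assumes the segmap and df defined in the question, and the variable names are illustrative):

new_maps = []

# One group per slice instead of repeatedly filtering the DataFrame
for n_slice, sub in df.groupby("slice"):
    seg_slice = segmap[n_slice]
    # Look up the original label under every centroid in a single indexing call
    orig_labels = seg_slice[sub["y"].to_numpy(), sub["x"].to_numpy()]
    new_numbers = sub["number"].to_numpy()
    # Broadcast: compare the 2D slice against all original labels at once
    matches = seg_slice[None, :, :] == orig_labels[:, None, None]
    # Apply the desired numbers and collapse back to a single 2D slice
    new_maps.append(np.max(matches * new_numbers[:, None, None], axis=0))

new_segmap = np.stack(new_maps, axis=0)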
