
I have a large data set and want to plot the entire set as a background and then highlight filtered features in it by subsetting and plotting on top of the background. I have this working by replotting the background each time, but this is very time consuming since I render about 40 plots based on this.

The issue I am having is that I cannot seem to get the background data (the first scatter plot) to stay in place, either by copying the figure or by trying to copy the axes.

A fully functional example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "x": np.random.normal(size=100),
        "y": np.random.rand(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50)))}
)

fig, ax = plt.subplots(figsize=(12, 8))


# This works but replots the background data each time (costly with the large datasets)
for thing in ['thing_1', 'thing_2']:

    ax.clear()
    # background data cloud (want to reuse this instead of replotting it)
    ax.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

    # subset to highlight
    ind = df[thing] == 1
    ax.scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red', alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))

My current best attempt at optimizing the code:

# Want to do something like this (only plot background data once and copy the axis or figure)
fig_background, ax_background = plt.subplots(figsize=(12, 8))
ax_background.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

for thing in ['thing_1', 'thing_2']:
    fig_filter = fig_background

    axs = fig_filter.get_axes()

    # subset to highlight
    ind = df[thing] == 1
    axs[0].scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red', alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))

    plt.cla()
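Two things explain why this attempt loses the background. `fig_filter = fig_background` only binds a second name to the same figure (nothing is copied), and `plt.cla()` clears every artist from the current axes, the grey background scatter included. A minimal check, assuming the headless Agg backend and random arrays in place of `df`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig_background, ax_background = plt.subplots(figsize=(12, 8))
ax_background.scatter(np.random.normal(size=100), np.random.rand(100),
                      c="grey", alpha=0.5, s=30)

# Plain assignment binds a second name to the same Figure; nothing is copied.
fig_filter = fig_background
same_figure = fig_filter is fig_background      # True

n_before = len(ax_background.collections)       # 1: the grey background
plt.cla()  # clears the current Axes, which is ax_background here
n_after = len(ax_background.collections)        # 0: the background is gone
print(same_figure, n_before, n_after)           # True 1 0
```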

Answer

You can remove the previous highlight scatter at each loop step before plotting the new one, so the background is drawn only once.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "x": np.random.normal(size=100),
        "y": np.random.rand(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50)))}
)

fig, ax = plt.subplots(figsize=(12, 8))
# background data cloud
ax.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

scatter = None

for thing in ['thing_1', 'thing_2']:

    if scatter is not None:
        scatter.remove()

    # subset to highlight
    ind = df[thing] == 1
    scatter = ax.scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red',
                         alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))  
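A variant of the same idea swaps `remove()` for `set_offsets`: create one highlight scatter up front and update its point positions in each iteration. This is a sketch, assuming every highlight uses the same colour and marker size; the temporary output directory (`out_dir`) is a hypothetical choice for illustration, not part of the answer above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x": np.random.normal(size=100),
        "y": np.random.rand(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50))),
    }
)

out_dir = tempfile.mkdtemp()  # hypothetical output location for this sketch

fig, ax = plt.subplots(figsize=(12, 8))
# Background cloud: drawn exactly once.
ax.scatter(df.x, df.y, c="grey", alpha=0.5, s=30)

# A single highlight artist, created empty and reused for every filter.
highlight = ax.scatter([], [], c="red", alpha=1, s=15)

for thing in ["thing_1", "thing_2"]:
    ind = df[thing] == 1
    # Replace the highlighted points in place; set_offsets takes an (N, 2) array.
    highlight.set_offsets(df.loc[ind, ["x", "y"]].to_numpy())
    fig.savefig(os.path.join(out_dir, "{}_filter.png".format(thing)))
```

`set_offsets` does not trigger autoscaling, which is harmless here because the highlighted points are a subset of the background that already set the axis limits.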

4 Comments

I got some errors when running your code. I have suggested some edits that seem to make it work as desired, but I am not sure whether it is good practice. Please look it over and let me know!
First, `scatter = []` (an empty list) passes the `scatter is not None` check. Second, `scatter[0].remove()` then fails with an "index out of range" error, because there is no `scatter[0]` yet.
Ok, looks good. It should have been if not scatter, but the solution you provided works well, so keep it if you like.
Thanks for your help! Much appreciated; this will literally save me hours of plotting time.
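The pitfall from the comments, in isolation: an empty list is not `None`, so `scatter = []` passes the `is not None` check and `scatter[0]` then raises `IndexError`. A sketch with made-up single-point data, showing a truthiness check (`if scatter:`) that is false for the empty list on the first pass and so skips the removal:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

scatter = []
passes_none_check = scatter is not None   # True: [] is not None
try:
    scatter[0].remove()                   # nothing to index yet
except IndexError as exc:
    error_message = str(exc)              # 'list index out of range'

# A truthiness check is False for both [] and None, so the first pass skips removal.
scatter = []
for x0 in (0.0, 1.0):                     # made-up data: one point per iteration
    if scatter:
        scatter[0].remove()
    scatter = [ax.scatter([x0], [x0], c="red")]

print(passes_none_check, error_message, len(ax.collections))
# True list index out of range 1
```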
