Reuse Base Plot Without Replotting

Question

I have a large data set and want to plot the entire set as a background, then highlight filtered features by subsetting and plotting on top of it. I have this working by replotting the background each time, but that is very time-consuming, since I render about 40 plots this way.

The issue I am having is that I cannot seem to get the background data (the first scatter plot) to stay in place, either by copying the figure or by trying to copy the axis.

A fully functional example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "x": np.random.normal(size=100),
        "y": np.random.rand(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50)))}
)

fig, ax = plt.subplots(figsize=(12, 8))


# This works but replots the background data each time (costly with the large datasets)
for thing in ['thing_1', 'thing_2']:

    ax.clear()
    # background data cloud (want to reuse this instead of replotting it)
    ax.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

    # subset to highlight
    ind = df[thing] == 1
    ax.scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red', alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))

My current best attempt at optimizing the code:

# Want to do something like this (only plot background data once and copy the axis or figure)
fig_background, ax_background = plt.subplots(figsize=(12, 8))
ax_background.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

for thing in ['thing_1', 'thing_2']:
    fig_filter = fig_background

    axs = fig_filter.get_axes()

    # subset to highlight
    ind = df[thing] == 1
    axs[0].scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red', alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))

    plt.cla()

ak_slick · Accepted Answer · 2018-07-20 13:18:37Z

You can remove the highlight scatter at each loop step before plotting the new one, so the background scatter is added to the axes only once.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(
    {
        "x": np.random.normal(size=100),
        "y": np.random.rand(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50)))}
)

fig, ax = plt.subplots(figsize=(12, 8))
# background data cloud
ax.scatter(df.x, df.y, c='grey', alpha=0.5, s=30)

scatter = None

for thing in ['thing_1', 'thing_2']:

    if scatter is not None:
        scatter.remove()

    # subset to highlight
    ind = df[thing] == 1
    scatter = ax.scatter(df.loc[ind, 'x'], df.loc[ind, 'y'], c='red', alpha=1, s=15)

    plt.savefig('{}_filter.png'.format(thing))
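
A possible variation on the loop above, offered as a sketch rather than as part of the original answer: instead of removing and re-creating the highlight scatter, keep one highlight artist and swap its points with `set_offsets`. The seeded `rng`, the `highlight` and `saved` names, and the `Agg` backend line are assumptions added for illustration. Note that `set_offsets` does not update the axis limits; that is harmless here only because every highlighted point is also part of the background. I have not benchmarked this against `remove()`, and either way the figure is still rendered in full on each `savefig` call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed, figures are only written to disk

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame(
    {
        "x": rng.normal(size=100),
        "y": rng.random(100),
        "thing_1": np.concatenate((np.ones(50), np.zeros(50))),
        "thing_2": np.concatenate((np.zeros(50), np.ones(50))),
    }
)

fig, ax = plt.subplots(figsize=(12, 8))

# background data cloud, added to the axes once
ax.scatter(df.x, df.y, c="grey", alpha=0.5, s=30)

# a single highlight artist, created empty and refilled in every iteration
highlight = ax.scatter([], [], c="red", alpha=1, s=15)

saved = []
for thing in ["thing_1", "thing_2"]:
    # subset to highlight
    ind = df[thing] == 1

    # replace the highlighted points; expects an (N, 2) array of x/y pairs
    highlight.set_offsets(df.loc[ind, ["x", "y"]].to_numpy())

    fname = "{}_filter.png".format(thing)
    fig.savefig(fname)
    saved.append(fname)
```

If each subset needed its own color or marker size, the `remove()` approach is probably simpler, since a fresh `ax.scatter` call can take different styling each time.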

edited Jul 20, 2018 at 13:18

ak_slick

answered Jul 19, 2018 at 22:49

ImportanceOfBeingErnest

4 Comments

ak_slick Over a year ago

I got some errors when running your code. I have suggested some edits that seem to make it work as desired but I am not sure if it is good practice. Please look it over and let me know!

ak_slick Over a year ago

First off, initializing scatter = [] means the empty list passes the scatter is not None check. Second, after assigning scatter[0], I got a scatter[0] index out of range error from the scatter[0].remove() call.

ImportanceOfBeingErnest Over a year ago

Ok, looks good. It should have been if not scatter, but the solution you provided works well, so keep it if you like.
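
For anyone reading this thread later, here is a small sketch of the difference between the two checks discussed here. It uses plain Python values only, and the `initial_handle` and `handle` names are made up for illustration; the earlier revision of the answer's code is not shown in this thread.

```python
# An empty list is not None, so an "is not None" guard lets it through,
# while a truthiness check ("if not scatter") treats it as "nothing yet".
initial_handle = []

passes_is_not_none = initial_handle is not None   # the guard would run
passes_truthiness = bool(initial_handle)          # the guard would be skipped

# Indexing the empty list is what raises the "index out of range" error.
try:
    initial_handle[0]
    index_failed = False
except IndexError:
    index_failed = True

# Starting from None instead makes the "is not None" guard behave as intended.
handle = None
none_guard_runs = handle is not None

print(passes_is_not_none, passes_truthiness, index_failed, none_guard_runs)
```

This is why the accepted version starts with `scatter = None`: the first loop iteration then skips the `remove()` call.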

ak_slick Over a year ago

Thanks and thank you for your help! Much appreciated and will literally save me hours of plotting time.
