Tooltips using mpldatacursor in matplotlib

Question

I have been pulling my hair out for a while over this. I am trying to use mpldatacursor along with matplotlib to provide a tooltip functionality on scatter plots. Each point has some data associated with it which I would like to show when the point is clicked.

Here is a minimal (not) working example:

import numpy as np
import mpldatacursor
import string
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as mpl

nations = ['Russia', 'America', 'China', 'France']
data = list()
idx = list()

np.random.seed(42) #Seed for repeatability

# Random data
for (id, nation) in enumerate(nations):
    for i in range(0,10):
        data.append((id+1)*np.random.random((2,1)))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        idx.append(nation + '-' + ''.join(name))

mpl.figure()
data = np.squeeze(np.asarray(data))
m, n = 0, 9

# Plot by group
for (id,nation) in enumerate(nations):
    mpl.scatter(data[m:n,0] , data[m:n,1] , label=nation)
    m = n + 1
    n += 10

formatter = lambda **kwargs: ', '.join(kwargs['point_label'])
mpl.legend()
mpldatacursor.datacursor(formatter=formatter, point_labels=idx)
mpl.show(block=True)

But when I do this, the tooltips don't match the legend. Furthermore, only labels starting with Russia and America show up in the plot. What am I doing wrong?

ImportanceOfBeingErnest · Accepted Answer · 2018-06-13 18:16:06Z

Usually you would have your data in a table or, for the sake of the example, several lists. One would hence probably create a single scatter plot from the data columns and use a mapping of names to numbers to create the colors in the scatter.

Then one can use the matplotlib pick_event to get the data out of the respective list, given the index of the point on which the click happened.

None of this requires an external package like mpldatacursor.

import numpy as np; np.random.seed(42)
import string
from matplotlib import pyplot as plt

nations = ['Russia', 'America', 'China', 'France']

#Create lists data, nat, idx
nat = np.random.choice(nations, 50)
data = np.random.rand(50,2)
strings = ["".join(np.random.choice(list(string.ascii_uppercase), 7)) for _ in range(50)]
idx = ["{}-{}".format(n,w) for n,w in zip(nat,strings)]

labels, i = np.unique(nat, return_inverse=True)

fig, ax = plt.subplots()


scatter = ax.scatter(data[:,0], data[:,1], c=i, cmap="RdYlGn", picker=5)

rect = lambda c: plt.Rectangle((0,0),1,1, color=scatter.cmap(scatter.norm(c)))
handles = [rect(c) for c in np.unique(i)]
plt.legend(handles, labels)

#Create annotation
annot = ax.annotate("", xy=(0,0), xytext=(-20,20),textcoords="offset points",
                    bbox=dict(boxstyle="round", fc="w"),
                    arrowprops=dict(arrowstyle="->"))
annot.set_visible(False)

#Create event handler
def onpick(evt):
    if evt.artist == scatter:
        ind = evt.ind[0]
        annot.xy = (data[ind])
        annot.set_text(idx[ind])
        annot.set_visible(True)
    if evt.mouseevent.button == 3:
        annot.set_visible(False)
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("pick_event", onpick)

plt.show()
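
The name-to-number mapping used for the colors above comes from np.unique with return_inverse=True. A minimal sketch of what it returns, using a short made-up list of nations rather than the random data from the answer:

```python
import numpy as np

# Hypothetical per-point nation names (not the random data from the answer)
nat = np.array(["Russia", "China", "Russia", "France", "America"])

# labels: sorted unique names; codes[k]: position of nat[k] within labels
labels, codes = np.unique(nat, return_inverse=True)

print(labels)  # ['America' 'China' 'France' 'Russia']
print(codes)   # [3 1 3 2 0]

# Indexing labels with codes reconstructs the original names, which is why
# codes can be passed as c= and labels can be reused for the legend.
assert (labels[codes] == nat).all()
```

Because the codes are in the same order as the points, the index reported by a pick event can be used directly to look up the matching entry in any per-point list.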

This does answer the question completely, but in my case scatter() is just one of many methods of a class I am implementing, and it's just easier to use mpldatacursor since it packages many features into one module.

ITA · Accepted Answer · 2018-06-13 16:56:59Z

The issue was that each call to matplotlib's scatter creates a new artist object. The workaround is based on the docstring in the mpldatacursor source code:

point_labels : sequence or dict, optional
    Labels for "subitems" of an artist, passed to the formatter function as the point_label kwarg. May be either a single sequence (used for all artists) or a dict of artist:sequence pairs.

It does involve the import of a protected matplotlib module/member. This seems to work as I want:

import numpy as np
import mpldatacursor
import string
import matplotlib
from matplotlib import _pylab_helpers as pylab_helpers
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as mpl

nations = ['Russia', 'America', 'China', 'France']
data = list()
idx = list()

np.random.seed(42)

for (index, nation) in enumerate(nations):
    for i in range(0,10):
        data.append((index + 1) * np.random.random((2, 1)))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        idx.append(nation + '-' + ''.join(name))

data = np.squeeze(np.asarray(data))
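m, n = 0, 10  # slice bounds for the first nation; the end index is exclusive
artist_labels = list()
mpl.figure()

for (index, nation) in enumerate(nations):
    mpl.scatter(data[m:n,0] , data[m:n,1] ,label=nation)
    artist_labels.append(idx[m:n])
    m = n  # the next group starts where this one ended
    n += 10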

def plotted_artists(ax):
    all_artists = (ax.lines + ax.patches + ax.collections
               + ax.images + ax.containers)
    return all_artists

def formatter (**kwargs):
    return kwargs['point_label'].pop()

managers = pylab_helpers.Gcf.get_all_fig_managers()
figs = [manager.canvas.figure for manager in managers]
axes = [ax for fig in figs for ax in fig.axes]
artists = [artist for ax in axes for artist in plotted_artists(ax)]

my_dict = dict(zip(artists, artist_labels))
mpldatacursor.datacursor(formatter=formatter, point_labels=my_dict)

mpl.legend()
mpl.show(block=True)
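
A variant of the same idea that avoids the protected _pylab_helpers import: scatter() returns the artist it creates, so the artist-to-labels dict can be built directly in the plotting loop. This is only a sketch, assuming the dict form of point_labels described in the docstring quoted above. The per_nation and rows names are introduced here for illustration, the random data is generated differently from the answer, and the mpldatacursor call is left as a comment so the block runs without that package:

```python
import string

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
from matplotlib import pyplot as plt

nations = ["Russia", "America", "China", "France"]
per_nation = 10  # illustrative name: number of points per nation
rng = np.random.default_rng(42)

# One (x, y) row and one label per point, grouped by nation
data = rng.random((len(nations) * per_nation, 2))
idx = []
for nation in nations:
    for _ in range(per_nation):
        suffix = "".join(rng.permutation(list(string.ascii_uppercase[20:])))
        idx.append(nation + "-" + suffix)

fig, ax = plt.subplots()
point_labels = {}
for k, nation in enumerate(nations):
    rows = slice(k * per_nation, (k + 1) * per_nation)  # end index is exclusive
    artist = ax.scatter(data[rows, 0], data[rows, 1], label=nation)
    point_labels[artist] = idx[rows]  # labels for exactly the rows in this artist
ax.legend()

# With mpldatacursor installed, the dict would be passed as in the answer:
# mpldatacursor.datacursor(formatter=formatter, point_labels=point_labels)
```

Slicing from k * per_nation to (k + 1) * per_nation covers every row because the end of a Python slice is exclusive. Bounds such as 0:9 followed by 10:19 would leave out rows 9, 19, and so on.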

Ed Smith · Accepted Answer · 2018-06-13 16:42:13Z

Assuming you simply want names, this seems to work correctly if you change the mpldatacursor.datacursor call to use '{label}', as in the first example on the mpldatacursor website:

mpldatacursor.datacursor(formatter='{label}'.format)

I think the problem is with kwargs and the lambda function. If you want further data in your tooltip, it may be best to add this to the label on plt.scatter, using a separate call for each point, e.g.

import numpy as np
import mpldatacursor
import string
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt

nations = ['Russia', 'America', 'China', 'France']
cDict =  {'Russia':'r', 'America':'b', 'China':'g', 'France':'c'}

np.random.seed(42) #Seed for repeatability

# Random data
for (id, nation) in enumerate(nations):
    for i in range(0,10):
        x = (id+1)*np.random.random((2,1))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        plt.scatter(x[0], x[1], c=cDict[nation], label=nation + '-' + ''.join(name))

mpldatacursor.datacursor(formatter='{label}'.format)
plt.show(block=True)
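
On why '{label}'.format can serve as a formatter: it is an ordinary bound method that accepts keyword arguments and ignores the ones its template does not use. A small sketch in plain Python, assuming the formatter is called with keyword arguments such as x, y, and label (the values below are made up):

```python
# A str.format bound method used as a callback that receives keyword arguments
formatter = '{label}'.format

# Hypothetical keyword arguments of the kind a formatter might receive
info = {'x': 0.37, 'y': 0.95, 'label': 'Russia-XZUYWV'}

text = formatter(**info)  # unused keys (x, y) are simply ignored
print(text)  # Russia-XZUYWV

# Extra fields can be shown by extending the template
detailed = '{label}: ({x:.2f}, {y:.2f})'.format(**info)
print(detailed)  # Russia-XZUYWV: (0.37, 0.95)
```

This is why giving each point its own label, as in the loop above, is enough for the tooltip to show the full string.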

5 Comments

ITA Over a year ago

No, I really do want the whole string in idx, not just the country name, i.e., I want to be able to distinguish points one step further than what can be done with the legend alone.

Ed Smith Over a year ago

I see... I'm no expert with mpldatacursor but as matplotlib scatter creates a collection, with a single label (for use in legends), you'd need to do a separate plot per point (added an example above) so each has its own label (or keep a separate list in the same order as your plotted data, which strikes me as potentially problematic).

ITA Over a year ago

The problem is each call to scatter creates a different matplotlib artist.

ITA Over a year ago

Your answer is fine, except you can't use a call to legend anymore. There is a workaround that lets you do both. I posted it as an answer.

Ed Smith Over a year ago

Yeah, not ideal for efficiency but I can't see another solution. You could create a range of patch objects which can each have a label and add them to a collection manually.
