
I have been pulling my hair out for a while over this. I am trying to use mpldatacursor along with matplotlib to provide a tooltip functionality on scatter plots. Each point has some data associated with it which I would like to show when the point is clicked.

Here is a minimal (not) working example:

import numpy as np
import mpldatacursor
import string
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as mpl

nations = ['Russia', 'America', 'China', 'France']
data = list()
idx = list()

np.random.seed(42) #Seed for repeatability

# Random data
for (id, nation) in enumerate(nations):
    for i in range(0,10):
        data.append((id+1)*np.random.random((2,1)))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        idx.append(nation + '-' + ''.join(name))

mpl.figure()
data = np.squeeze(np.asarray(data))
m, n = 0, 9

# Plot by group
for (id,nation) in enumerate(nations):
    mpl.scatter(data[m:n,0] , data[m:n,1] , label=nation)
    m = n + 1
    n += 10

formatter = lambda **kwargs: ', '.join(kwargs['point_label'])
mpl.legend()
mpldatacursor.datacursor(formatter=formatter, point_labels=idx)
mpl.show(block=True)

But when I do this, the tooltips don't match the legend. Furthermore, only labels starting with Russia and America show up in the plot. What am I doing wrong?


3 Answers


Usually you would have your data in a table or, for the sake of the example, several lists. One would hence probably create a single scatter plot from the data columns and use a mapping of names to numbers to create the colors in the scatter.

Then one can use the matplotlib pick_event to get the data out of the respective list, given the index of the point on which the click happened.

None of this requires an external package such as mpldatacursor.

import numpy as np; np.random.seed(42)
import string
from matplotlib import pyplot as plt

nations = ['Russia', 'America', 'China', 'France']

#Create lists data, nat, idx
nat = np.random.choice(nations, 50)
data = np.random.rand(50,2)
strings = ["".join(np.random.choice(list(string.ascii_uppercase), 7)) for _ in range(50)]
idx = ["{}-{}".format(n,w) for n,w in zip(nat,strings)]

labels, i = np.unique(nat, return_inverse=True)

fig, ax = plt.subplots()


scatter = ax.scatter(data[:,0], data[:,1], c=i, cmap="RdYlGn", picker=5)

rect = lambda c: plt.Rectangle((0,0),1,1, color=scatter.cmap(scatter.norm(c)))
handles = [rect(c) for c in np.unique(i)]
plt.legend(handles, labels)

#Create annotation
annot = ax.annotate("", xy=(0,0), xytext=(-20,20),textcoords="offset points",
                    bbox=dict(boxstyle="round", fc="w"),
                    arrowprops=dict(arrowstyle="->"))
annot.set_visible(False)

#Create event handler
def onpick(evt):
    if evt.artist == scatter:
        ind = evt.ind[0]
        annot.xy = (data[ind])
        annot.set_text(idx[ind])
        annot.set_visible(True)
    if evt.mouseevent.button == 3:
        annot.set_visible(False)
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("pick_event", onpick)

plt.show()
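
A pick event on a scatter plot can report several indices at once when points overlap, and the handler above only uses the first one. As a minimal sketch, assuming you would rather show every picked label, a small helper (the name labels_for_pick is hypothetical) could join them:

```python
def labels_for_pick(ind, labels, sep=", "):
    """Join the labels of every point index reported by a pick event."""
    return sep.join(labels[i] for i in ind)


# Stand-in data for illustration; in the handler above this would be
# labels_for_pick(evt.ind, idx).
example_labels = ["Russia-ABC", "China-DEF", "France-GHI"]
text = labels_for_pick([0, 2], example_labels)
```

Inside onpick, annot.set_text(labels_for_pick(evt.ind, idx)) would then replace the evt.ind[0] lookup for the text, while annot.xy can still point at the first picked point.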


1 Comment

This does answer the question completely, but in my case scatter() is just one of many methods of a class I am implementing, and it's just easier to use mpldatacursor since it packages many features into a module.

The issue was that each call to matplotlib's scatter creates a new artist object, so a single point_labels sequence ends up being used for every artist. The workaround is based on the docstring in the mpldatacursor source code:

point_labels : sequence or dict, optional
    Labels for "subitems" of an artist, passed to the formatter function as the point_label kwarg. May be either a single sequence (used for all artists) or a dict of artist:sequence pairs.

It does involve the import of a protected matplotlib module/member. This seems to work as I want:


import numpy as np
import mpldatacursor
import string
import matplotlib
from matplotlib import _pylab_helpers as pylab_helpers
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as mpl

nations = ['Russia', 'America', 'China', 'France']
data = list()
idx = list()

np.random.seed(42)

for (index, nation) in enumerate(nations):
    for i in range(0,10):
        data.append((index + 1) * np.random.random((2, 1)))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        idx.append(nation + '-' + ''.join(name))

data = np.squeeze(np.asarray(data))
m, n = 0, 10  # half-open slice [m, n) covering all 10 points of one nation
artist_labels = list()
mpl.figure()

for (index, nation) in enumerate(nations):
    mpl.scatter(data[m:n,0] , data[m:n,1] ,label=nation)
    artist_labels.append(idx[m:n])  # labels sliced exactly like the data
    m = n
    n += 10

def plotted_artists(ax):
    all_artists = (ax.lines + ax.patches + ax.collections
               + ax.images + ax.containers)
    return all_artists

def formatter (**kwargs):
    return kwargs['point_label'].pop()

managers = pylab_helpers.Gcf.get_all_fig_managers()
figs = [manager.canvas.figure for manager in managers]
axes = [ax for fig in figs for ax in fig.axes]
artists = [artist for ax in axes for artist in plotted_artists(ax)]

my_dict = dict(zip(artists, artist_labels))
mpldatacursor.datacursor(formatter=formatter, point_labels=my_dict)

mpl.legend()
mpl.show(block=True)
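
The same artist-to-labels dict can probably be built without the private _pylab_helpers module, because each scatter call returns the collection it creates. A minimal sketch, assuming the data are generated per nation (the names labels_by_artist and per_nation are only illustrative):

```python
import string

import numpy as np
from matplotlib import pyplot as plt

nations = ['Russia', 'America', 'China', 'France']
per_nation = 10  # points per nation, as in the question

rng = np.random.RandomState(42)  # seeded for repeatability

fig, ax = plt.subplots()
labels_by_artist = {}  # illustrative name: artist -> list of point labels

for index, nation in enumerate(nations):
    xy = (index + 1) * rng.random_sample((per_nation, 2))
    names = []
    for _ in range(per_nation):
        letters = list(string.ascii_uppercase[20:])
        rng.shuffle(letters)
        names.append(nation + '-' + ''.join(letters))
    # scatter returns the collection it creates; use it as the dict key
    artist = ax.scatter(xy[:, 0], xy[:, 1], label=nation)
    labels_by_artist[artist] = names

ax.legend()
```

The dict could then be passed as mpldatacursor.datacursor(formatter=formatter, point_labels=labels_by_artist), followed by plt.show(), with formatter defined as above.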

Comments


Assuming you simply want names, this seems to work correctly if you change the mpldatacursor.datacursor call to use '{label}' as in the first example on the mpldatacursor website,

mpldatacursor.datacursor(formatter='{label}'.format)
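
To see why a bound str.format works as a formatter, here is a small sketch. The keyword arguments are made up for illustration (the real set is chosen by mpldatacursor); the point is only that str.format ignores keywords the template does not mention:

```python
formatter = '{label}'.format

# Illustrative keyword arguments only; the real set is chosen by mpldatacursor.
event_info = {'x': 0.37, 'y': 0.95, 'label': 'Russia'}

# Unused keywords ('x', 'y') are silently ignored by str.format.
text = formatter(**event_info)
print(text)
```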

I think the problem is with kwargs and the lambda function. If you want further data in your tooltip, it may be best to add this to the label on plt.scatter, using a separate call for each point, e.g.

import numpy as np
import mpldatacursor
import string
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt

nations = ['Russia', 'America', 'China', 'France']
cDict =  {'Russia':'r', 'America':'b', 'China':'g', 'France':'c'}

np.random.seed(42) #Seed for repeatability

# Random data
for (id, nation) in enumerate(nations):
    for i in range(0,10):
        x = (id+1)*np.random.random((2,1))
        name = list(string.ascii_uppercase[20:])
        np.random.shuffle(name)
        plt.scatter(x[0], x[1], c=cDict[nation], label=nation + '-' + ''.join(name))

mpldatacursor.datacursor(formatter='{label}'.format)
plt.show(block=True)

5 Comments

No, I really do want the whole string in idx, not just the country name, i.e. I want to be able to distinguish points one step further than what can be done with legends alone.
I see... I'm no expert with mpldatacursor but as matplotlib scatter creates a collection, with a single label (for use in legends), you'd need to do a separate plot per point (added an example above) so each has its own label (or keep a separate list in the same order as your plotted data, which strikes me as potentially problematic).
The problem is each call to scatter creates a different matplotlib artist.
Your answer is fine, except you can't use a call to legend anymore. There is a workaround that lets you do both. I posted it as an answer.
Yeah, not ideal for efficiency but I can't see another solution. You could create a range of patch objects which can each have a label and add them to a collection manually.
