
I want to draw a scatter plot using pylab; however, some of my data are NaN, like this:

a = [1, 2, 3]
b = [1, 2, None]

pylab.scatter(a,b) doesn't work.

Is there some way to draw the points that have real values while not displaying the NaN values?


2 Answers


Things will work perfectly if you use NaN instead of None. None is not the same thing: a NaN is a float.

As an example:

import numpy as np
import matplotlib.pyplot as plt

plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
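If the data arrive as a Python list containing None, as in the question, one way to get NaNs is to let NumPy do the conversion: with dtype=float, np.array stores None as NaN. A minimal sketch reusing the question's a and b; the name b_float is just for illustration:

```python
import numpy as np

a = [1, 2, 3]
b = [1, 2, None]

# With dtype=float, NumPy stores None as NaN instead of building an object array
b_float = np.array(b, dtype=float)

# b_float now holds 1.0, 2.0 and nan, so it can be passed to
# plt.scatter(a, b_float) and the NaN point is simply not drawn.
```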


Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library with very good missing-value handling.

As an example:

import matplotlib.pyplot as plt
import pandas

x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
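For the numpy.genfromtxt route mentioned above: when the data are read as floats, empty fields are filled with NaN by default, so the columns can go straight to plt.scatter. A minimal sketch; the inline CSV text is made-up sample data standing in for a real file:

```python
import io
import numpy as np

# Hypothetical sample data: the y value of the third row is missing
text = "1,1\n2,2\n3,\n"

# genfromtxt reads floats by default and fills the empty field with nan
data = np.genfromtxt(io.StringIO(text), delimiter=",")
x, y = data[:, 0], data[:, 1]
```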

pandas uses NaNs to represent missing data, while masked arrays use a separate mask array. This means that masked arrays can preserve the original data while temporarily flagging it as "missing" or "bad". However, they use more memory and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
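To make the difference concrete: a masked array keeps the original value and records the flag in a separate boolean mask, whereas the NaN approach overwrites the value in a copy. A minimal sketch with made-up values, using the same y > 0.7 cutoff as the line-plot example:

```python
import numpy as np

y = np.array([0.2, 0.9, 0.5])

# Masked array: 0.9 is hidden but still stored in .data; .mask flags it
y_masked = np.ma.masked_where(y > 0.7, y)

# NaN approach: the copy loses the original value at that position
y_nan = y.copy()
y_nan[y > 0.7] = np.nan

# The boolean mask is stored alongside the data, one flag per element
```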

As another example, using both masked arrays and NaNs, this time with a line plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)

y1 = np.ma.masked_where(y > 0.7, y)

y2 = y.copy()
y2[y > 0.7] = np.nan

fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')

axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")

fig.tight_layout()

plt.show()



2 Comments

Things will not work perfectly if you use NaNs with semilogy. The plot will look fine, but it emits this warning: RuntimeWarning: invalid value encountered in less_equal (raised at the line mask = a <= 0.0).
Note that by default the x limits contract if there are NaNs at either end of the plot range, as can also be seen in the first plot here. In the second plot, this doesn't happen only because of sharex=True. If you have a single plot but still want to preserve the whole original x range, use plt.xlim([start, end]) or ax.set_xlim([start, end]).
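A minimal sketch of that fix for a single scatter plot, assuming the question's data with NaN in place of None; the Agg backend is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a normal session
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3]
y = [1, 2, np.nan]

fig, ax = plt.subplots()
ax.scatter(x, y)  # the point at x=3 is not drawn because its y is NaN

# Pin the x range so it still covers the full original data
ax.set_xlim([1, 3])
```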

Because you are drawing in 2D space, each point needs both an X and a Y value. If one of the values is None, that point cannot exist in 2D space, so it cannot be plotted. You should therefore remove both the None and its corresponding value from the other list.

There are many ways to accomplish this. Here is one:

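import pylab

a = [1, 2, 3]
b = [1, None, 2]

# Walk both lists together; drop any pair in which either value is None.
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        # Rebuild both lists without position i; don't advance i,
        # because the next element has shifted into position i.
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1

# Now a = [1, 3] and b = [1, 2]

pylab.scatter(a, b)
pylab.show()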
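Another of the many possible ways, sketched here with the same a and b: pair the lists with zip and keep a pair only when neither value is None. Testing with is not None (rather than truthiness) keeps zeros. The names a_clean and b_clean are just for illustration:

```python
a = [1, 2, 3]
b = [1, None, 2]

# Keep a pair only if both coordinates are present; a 0 would be kept
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
a_clean = [x for x, _ in pairs]
b_clean = [y for _, y in pairs]
```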

1 Comment

Be careful with if not a[i]: if either array has zeros, you'll remove them. Zero is a perfectly valid value!
