
I use matplotlib in a signal-processing application, and I noticed that it chokes on large data sets. This is something I really need to improve to make it a usable application.

What I'm looking for is a way to let matplotlib decimate my data. Is there a setting, property or other simple way to enable that? Any suggestions on how to implement this are welcome.

Some code:

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100,000 points makes it unusably slow
plt.plot(np.random.random_sample(n))
plt.show()

Some background information

I used to work on a large C++ application where we needed to plot large datasets and to solve this problem we used to take advantage of the structure of the data as follows:

In most cases, if we want a line plot, the data is ordered and often even equidistant. If it is equidistant, you can calculate the start and end index into the data array directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
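The two lookups described above can be sketched as follows. This is a minimal illustration of the idea, not the original C++ code; the array `x` and the zoom limits `x0`/`x1` are assumed values for the example:

```python
import numpy as np

# Hypothetical setup: ordered sample positions and a zoom rectangle.
x = np.arange(0.0, 1000.0, 0.5)   # equidistant positions
x0, x1 = 100.25, 200.75           # assumed zoom limits in data coordinates

# Ordered but not equidistant: binary search, O(log n).
i0 = np.searchsorted(x, x0, side="left")
i1 = np.searchsorted(x, x1, side="right")

# Equidistant: the same indices follow directly from the spacing, O(1).
dx = x[1] - x[0]
k0 = int(np.ceil((x0 - x[0]) / dx))
k1 = int(np.floor((x1 - x[0]) / dx)) + 1

assert (i0, i1) == (k0, k1)
visible = x[i0:i1]   # only this slice needs to be drawn
```

Either way, the cost of finding the visible slice is independent of how far you have zoomed out.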

Next, the zoomed slice is decimated. Because the data is ordered, we can simply iterate over the block of points that falls inside one pixel, and for each block calculate the mean, maximum and minimum. Instead of one pixel, we then draw a bar in the plot.

For example: if the x axis is ordered, a vertical line will be drawn for each block, possibly with the mean in a different color.

To avoid aliasing, the plot is oversampled by a factor of two.
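The per-block min/max/mean reduction above can be sketched like this. The function name and parameters are my own for illustration; `n_pixels` stands for the horizontal resolution of the plot area, and the factor-of-two oversampling is the default:

```python
import numpy as np

def decimate_minmax(y, n_pixels, oversample=2):
    """Reduce y to one (min, max, mean) triple per pixel block (sketch)."""
    n_blocks = min(n_pixels * oversample, len(y))
    # Block boundaries; spacing >= 1 guarantees no empty blocks.
    edges = np.linspace(0, len(y), n_blocks + 1).astype(int)
    pairs = list(zip(edges[:-1], edges[1:]))
    mins  = np.array([y[a:b].min()  for a, b in pairs])
    maxs  = np.array([y[a:b].max()  for a, b in pairs])
    means = np.array([y[a:b].mean() for a, b in pairs])
    return mins, maxs, means

y = np.random.random_sample(1_000_000)
lo, hi, mid = decimate_minmax(y, n_pixels=800)
# Draw one vertical bar from lo to hi per block, e.g. with
# plt.fill_between(range(len(lo)), lo, hi), and overlay the mean.
```

Because every block keeps its extremes, narrow spikes survive the decimation, which a plain every-Nth-point subsampling would miss.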

In case it is a scatter plot, the data can be made ordered by sorting, because the plotting sequence is not important.

The nice thing about this simple recipe is that the more you zoom in, the faster it becomes. In my experience, as long as the data fits in memory the plots stay very responsive. For instance, 20 plots of time-history data with 10 million points each should be no problem.

  • Could you implement such a decimation algorithm outside of matplotlib rendering, just updating the data to be displayed upon zooming event? Commented Dec 12, 2013 at 14:54
  • Possibly of interest here: How can I subsample an array according to its density? Commented Dec 14, 2018 at 11:19

2 Answers


It seems like you just need to decimate the data before you plot it:

import numpy as np
import matplotlib.pyplot as plt

n = 100000  # more than 100,000 points makes it unusably slow
X = np.random.random_sample(n)
i = 10 * np.arange(n // 10)  # keep every 10th index
plt.plot(X[i])
plt.show()



Plain decimation is not the best approach: if you decimate sparse data, for example, it might all appear as zeros.

The decimation has to be smart, such that each horizontal screen pixel is plotted with the min and the max of the data between decimation points. Then, as you zoom in, you see more and more detail.

With zooming this cannot be done easily outside matplotlib, so it is better handled internally.
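That said, a crude version can be wired up externally with matplotlib's real `'xlim_changed'` axes callback, re-decimating whenever the view changes. This is my own sketch of the idea (simple strided subsampling rather than min/max, for brevity), not part of the original answer:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example data.
x = np.arange(1_000_000)
y = np.random.random_sample(x.size)

fig, ax = plt.subplots()
line, = ax.plot(x[::1000], y[::1000])  # coarse initial view

def on_xlim_changed(ax, n_pixels=2000):
    # Find the visible slice, then thin it to roughly n_pixels points.
    x0, x1 = ax.get_xlim()
    i0 = max(np.searchsorted(x, x0), 0)
    i1 = min(np.searchsorted(x, x1), x.size)
    step = max((i1 - i0) // n_pixels, 1)
    line.set_data(x[i0:i1:step], y[i0:i1:step])

ax.callbacks.connect('xlim_changed', on_xlim_changed)
plt.show()
```

For a min/max-preserving variant, the per-pixel reduction from the question's recipe could replace the strided slice inside the callback.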

