I have several subplots that I've created using matplotlib. Once I plot the data, I need to go back and draw lines between data points in a for loop. My data file is large, and this takes Python a very long time.

Is there a way to speed this up? Here is my code:

import numpy as np
import matplotlib.pyplot as plt

def generateHistogram(x, y, ax):
    x = np.log10(x)
    xerror = []

    numData = len(x)

    plt.rcParams['lines.solid_capstyle'] = 'butt'

    for dataIndex in range(0, numData-1):
        xerror.append(np.divide(np.subtract(x[dataIndex+1], x[dataIndex]), 2.0))

    for errorIndex in range(0, len(x)):
        if (errorIndex == 0):
            ax.semilogx((np.power(10, (x[errorIndex]-xerror[errorIndex])), np.power(10, x[errorIndex])),
                        (y[errorIndex], y[errorIndex]), linewidth=2, color='k')

        if (errorIndex == len(xerror)):
            ax.semilogx((np.power(10, x[errorIndex]), np.power(10, (x[errorIndex]+xerror[errorIndex-1]))),
                        (y[errorIndex], y[errorIndex]), linewidth=2, color='k')

        if (errorIndex < len(xerror)):
            ax.semilogx((np.power(10, x[errorIndex]), np.power(10, (x[errorIndex]+xerror[errorIndex]))),
                         (y[errorIndex], y[errorIndex]), linewidth=2, color='k')
            ax.semilogx((np.power(10, (x[errorIndex+1]-xerror[errorIndex])), np.power(10, x[errorIndex+1])),
                        (y[errorIndex+1], y[errorIndex+1]), linewidth=2, color='k')

            verticalLineXPos = np.power(10, (x[errorIndex]+xerror[errorIndex]))
            ax.semilogx((verticalLineXPos, verticalLineXPos), (y[errorIndex], y[errorIndex+1]),
                        linewidth=2, color='k')

    return xerror

This basically draws lines on each of the subplots (where the x-axis is on a semilogx scale) at the positions I need. Do you have any suggestions for improving the performance?

  • Can you provide a minimal example of what sort of data structure x and y are in this case? If these are 1D arrays, your first loop is xerror = np.diff(x) / 2. Can you post a picture of what you are trying to do? You might also want to look into a LineCollection artist. Commented Jan 24, 2016 at 0:14
  • The x, y, and xerror data structures are each just a list of floats. The loops are used to calculate the length and position of the lines I need to draw. The details of this are not important to the question. I am asking how to efficiently plot multiple lines on a plot using matplotlib. Does LineCollection actually give an increase in performance over plotting each one? Is there a way to get better speedups? Commented Jan 24, 2016 at 7:17
  • This might be better placed over on codereview.stackexchange.com Commented Jan 24, 2016 at 8:16
  • Right, but I suspect much of the problem is that you have a loop at all. I think what you need here is ax.vlines matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.vlines Commented Jan 24, 2016 at 19:10
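To make the np.diff and LineCollection suggestions from the comments concrete, here is a minimal sketch. It assumes x is a 1D sequence of positive values and y has the same length; the helper name step_segments is hypothetical and not part of matplotlib. It builds every horizontal and vertical segment with array operations and adds them to the axes as a single artist.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def step_segments(x, y):
    """Return a (2n-1, 2, 2) array of segments for a log-spaced step plot.

    Hypothetical helper: assumes x > 0 and len(x) == len(y) >= 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    logx = np.log10(x)
    half = np.diff(logx) / 2.0  # replaces the first loop in the question

    # Bin edges in log space: one outer edge on each side, midpoints between.
    log_edges = np.concatenate(
        ([logx[0] - half[0]], logx[:-1] + half, [logx[-1] + half[-1]])
    )
    edges = 10.0 ** log_edges

    # One horizontal segment per data point, spanning its bin at height y[i].
    horizontal = np.stack(
        [np.column_stack([edges[:-1], y]), np.column_stack([edges[1:], y])],
        axis=1,
    )
    # One vertical segment per interior edge, joining y[i] to y[i+1].
    vertical = np.stack(
        [np.column_stack([edges[1:-1], y[:-1]]),
         np.column_stack([edges[1:-1], y[1:]])],
        axis=1,
    )
    return np.concatenate([horizontal, vertical])


# Example data (made up for illustration).
x = np.logspace(0, 3, 50)
y = np.random.rand(50)

fig, ax = plt.subplots()
ax.set_xscale("log")
collection = LineCollection(step_segments(x, y), colors="k", linewidths=2)
ax.add_collection(collection)
ax.autoscale_view()  # rescale once, after everything has been added
```

Because all segments live in one LineCollection, the axes handles a single artist rather than several Line2D objects per data point, which is where the loop in the question spends its time.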

2 Answers

I found this reference, which describes a clever way to optimize this: http://exnumerus.blogspot.com/2011/02/how-to-quickly-plot-multiple-line.html

It provides a 99% speed-up in performance. It worked nicely for me.
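As far as I understand it, the idea is to put all segments into one pair of coordinate sequences, with a break marker after each segment, and then make a single plot call. A minimal sketch of that idea follows. It uses NaN as the break marker in NumPy arrays, which is my assumption of an equivalent to separating segments with None in plain lists, and the function name plot_segments_in_one_call is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_segments_in_one_call(ax, x_starts, x_ends, y_starts, y_ends, **kwargs):
    """Draw many independent segments with a single ax.plot call.

    Hypothetical helper: each segment takes three slots (start, end, NaN).
    The NaN leaves a gap, so consecutive segments are not joined.
    """
    n = len(x_starts)
    xs = np.full(3 * n, np.nan)
    ys = np.full(3 * n, np.nan)
    xs[0::3], xs[1::3] = x_starts, x_ends
    ys[0::3], ys[1::3] = y_starts, y_ends
    (line,) = ax.plot(xs, ys, **kwargs)
    return line


# Example data (made up for illustration): 1000 short horizontal segments.
rng = np.random.default_rng(0)
n = 1000
x0 = rng.uniform(1, 100, n)
y0 = rng.uniform(0, 1, n)

fig, ax = plt.subplots()
line = plot_segments_in_one_call(ax, x0, 2 * x0, y0, y0, color="k", linewidth=2)
ax.set_xscale("log")
```

The trade-off is that every segment shares one Line2D, so they all get the same color, width and style.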


1 Comment

Separating line segments by None works great. This tip should be in the matplotlib documentation.

When you need to pass in different kwargs to different lines

grover's answer is ideal when you don't need to pass any additional kwargs to individual lines.

However, if you do (as I did) then you can't use the one-call approach outlined in his link.

After examining where plot spends its time, particularly when plotting many lines on the same axis in a loop, it turns out that much of it is wasted on autoscaling the view after each internal Line2D object is added.

To get around this, we can reuse the internals of the plot function (see its source in the matplotlib GitHub repo), but call autoscale_view only once, when we're done.

This gives a speed-up of about 2x. That is nowhere near the speed-up from passing all plot arguments at once, but it is still useful when you need to pass different kwargs to each line (such as specific line colors).

Example code below.

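import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

Nlines = 1000
Npoints = 100

fig = plt.figure()
ax = fig.add_subplot(111)

# Random example data: one x array, one y array and one color value per line.
xdata = (np.random.rand(Npoints) for _ in range(Nlines))
ydata = (np.random.rand(Npoints) for _ in range(Nlines))
colorvals = (np.random.rand() for _ in range(Nlines))

# Add each Line2D directly instead of calling ax.plot inside the loop.
for x, y, col in zip(xdata, ydata, colorvals):
    ax.add_line(Line2D(x, y, color=plt.cm.jet(col)))

# Rescale the view once, after all lines have been added.
ax.autoscale_view()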
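If the only per-line difference is color, a LineCollection (mentioned in the comments on the question) may be another option, since it accepts one color per segment. This is a sketch under that assumption rather than a benchmarked replacement for the loop above; the helper name add_colored_lines is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection


def add_colored_lines(ax, xdata, ydata, colorvals, cmap=plt.cm.jet):
    """Add many polylines as one LineCollection, one color per line.

    Hypothetical helper: xdata and ydata have shape (n_lines, n_points),
    colorvals has shape (n_lines,) with values in [0, 1].
    """
    segments = np.stack([xdata, ydata], axis=-1)  # (n_lines, n_points, 2)
    collection = LineCollection(segments, colors=cmap(colorvals))
    ax.add_collection(collection)
    ax.autoscale_view()  # rescale once, after the collection is added
    return collection


# Example data (made up for illustration), same sizes as above.
rng = np.random.default_rng(0)
n_lines, n_points = 1000, 100

fig, ax = plt.subplots()
collection = add_colored_lines(
    ax,
    rng.random((n_lines, n_points)),
    rng.random((n_lines, n_points)),
    rng.random(n_lines),
)
```

Kwargs that a collection cannot vary per segment would still need the Line2D loop.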
