I am trying to select a range of data in a matplotlib plot and got stuck with the return values of SpanSelector:

from io import StringIO
import matplotlib.pyplot as plt
from matplotlib.widgets import SpanSelector
import numpy as np
import pandas as pd

# create data
csv_file = StringIO("""URMS[V];IRMS[A];P[W];FPLL[Hz];URange[V];IRange[A];S[VA];Q[var];LAMBDA[];UTHD[%];Timestamp
234.63;0.1802;0.0002E+03;49.995;300;5;0.0423E+03;0.0423E+03;0.004;1.20;09:01:16.000
234.56;0.1803;0.0003E+03;49.996;300;5;0.0423E+03;0.0423E+03;0.004;1.15;09:01:16.100
234.70;0.1807;0.0002E+03;49.997;300;5;0.0424E+03;0.0424E+03;0.004;1.15;09:01:16.200
234.50;0.1807;0.0002E+03;49.998;300;5;0.0424E+03;0.0424E+03;0.004;1.18;09:01:16.300
234.84;0.1805;0.0001E+03;49.998;300;5;0.0424E+03;0.0424E+03;0.004;1.18;09:01:16.400
234.57;0.1796;0.0003E+03;49.999;300;5;0.0421E+03;0.0421E+03;0.004;1.20;09:01:16.500
234.67;0.1809;0.0002E+03;49.999;300;5;0.0424E+03;0.0424E+03;0.004;1.25;09:01:16.600""")

# read CSV file
data = pd.read_csv(csv_file, delimiter=';')

# convert timestamp to datetime object
data['Timestamp'] = pd.to_datetime(data['Timestamp'])

# create plot
fig, ax = plt.subplots()
ax.plot(data['Timestamp'], data['P[W]'], label='Leistung')

def onselect(xmin, xmax):
    if data is not None:
        sel_start = pd.to_datetime(xmin)
        sel_end = pd.to_datetime(xmax)
        # filter data from selected range
        mask = (data['Timestamp'] >= sel_start) & (data['Timestamp'] <= sel_end)
        selected_subset = data.loc[mask]
        # calculate mean
        mean_value = selected_subset['P[W]'].mean()
        print(f"Mittelwert im selektierten Bereich: {mean_value:.2f} P[W]")

plt.xlabel('Zeit')
plt.ylabel('P[W]')
plt.title('Zeitlicher Verlauf Wirkleistung')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
span = SpanSelector(ax, onselect, 'horizontal', useblit=True, rectprops=dict(alpha=0.5, facecolor='red'), span_stays=True)

plt.show()

Converting SpanSelector's xmin and xmax to pandas datetime objects (the first two lines of onselect in the example above) does not work: sel_start and sel_end both end up with the same value. Clearly a sign that I am doing something totally wrong...

Any hints on how to work around this are welcome.

And for what it's worth: python==3.9.2, matplotlib==3.3.4, and pandas==1.2.3

1 Answer

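Interesting problem. Matplotlib represents dates internally as floating-point numbers (days since an epoch), and that is what SpanSelector passes to onselect as xmin and xmax. As far as I can tell, pd.to_datetime interprets a bare number as nanoseconds since 1970-01-01, so two values that differ by only a tiny fraction of a day end up as the same timestamp.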
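To make the number format concrete, here is a minimal sketch of what I believe is happening. The date 2021-03-15 is an arbitrary assumption for illustration (your timestamps carry no date, so pandas fills in the current day), and the variable names are mine.

```python
import matplotlib.dates as mdates
import pandas as pd

# Two instants 0.6 s apart; the date itself is an arbitrary assumption.
t0 = pd.Timestamp("2021-03-15 09:01:16.000")
t1 = pd.Timestamp("2021-03-15 09:01:16.600")

# Matplotlib's internal representation: float days since its epoch.
x0 = mdates.date2num(t0.to_pydatetime())
x1 = mdates.date2num(t1.to_pydatetime())
print(x0, x1, x1 - x0)  # the difference is only a few millionths of a day

# pandas reads a bare float as nanoseconds since 1970-01-01,
# so both values land in 1970 and are no longer distinguishable.
wrong0 = pd.to_datetime(x0)
wrong1 = pd.to_datetime(x1)
print(wrong0, wrong1)

# num2date interprets the floats as matplotlib date numbers instead.
right0 = mdates.num2date(x0)
right1 = mdates.num2date(x1)
print(right0, right1)
```

The last two values are timezone-aware datetime objects that are 0.6 s apart again.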

I was able to fix your problem by adding a few lines. The matplotlib.dates module provides the function num2date, which converts these numbers back into datetime objects that pandas can work with.

import matplotlib.dates as mdates
(...)

        sel_start = pd.to_datetime(mdates.num2date(xmin))
        sel_end = pd.to_datetime(mdates.num2date(xmax))

However, num2date returns a timezone-aware datetime (UTC by default), which cannot be compared with the timezone-naive Timestamp column. That is why I had to add utc=True in the following line.

# convert timestamp to datetime object
data['Timestamp'] = pd.to_datetime(data['Timestamp'], utc=True)
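If you would rather keep the Timestamp column timezone-naive, an alternative that should also work is to drop the timezone from the converted span limits instead. This is only a sketch under that assumption; the sample timestamps and the date are made up for illustration.

```python
import matplotlib.dates as mdates
import pandas as pd

# Timezone-naive timestamps, as produced by pd.to_datetime without utc=True.
timestamps = pd.Series(pd.to_datetime([
    "2021-03-15 09:01:16.000",
    "2021-03-15 09:01:16.100",
    "2021-03-15 09:01:16.200",
]))

# Stand-in for the xmin value that SpanSelector would pass to onselect.
xmin = mdates.date2num(pd.Timestamp("2021-03-15 09:01:16.050").to_pydatetime())

# num2date yields a timezone-aware value; tz_localize(None) strips the timezone.
aware = pd.to_datetime(mdates.num2date(xmin))
naive = aware.tz_localize(None)

# The naive limit can now be compared with the naive column.
mask = timestamps >= naive
print(mask.tolist())
```

Whichever variant you choose, both sides of the comparison have to be either timezone-aware or timezone-naive.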

I tested it and with these modifications onselect returned reasonable results.
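Putting the pieces together, the selection logic can be sketched as a small function that is testable without opening a plot window. The name mean_in_span, the sample values, and the date are my own assumptions for illustration; in your script the body would go into onselect.

```python
import matplotlib.dates as mdates
import pandas as pd


def mean_in_span(data, xmin, xmax, column="P[W]"):
    """Mean of `column` for rows whose Timestamp lies within [xmin, xmax].

    xmin and xmax are matplotlib date numbers, as passed by SpanSelector.
    """
    sel_start = pd.to_datetime(mdates.num2date(xmin))
    sel_end = pd.to_datetime(mdates.num2date(xmax))
    mask = (data["Timestamp"] >= sel_start) & (data["Timestamp"] <= sel_end)
    return data.loc[mask, column].mean()


# Small made-up data set with a timezone-aware (UTC) Timestamp column.
data = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        [
            "2021-03-15 09:01:16.000",
            "2021-03-15 09:01:16.100",
            "2021-03-15 09:01:16.200",
            "2021-03-15 09:01:16.300",
            "2021-03-15 09:01:16.400",
        ],
        utc=True,
    ),
    "P[W]": [0.2, 0.3, 0.2, 0.2, 0.1],
})

# Simulate a span dragged from 16.150 s to 16.450 s.
xmin = mdates.date2num(pd.Timestamp("2021-03-15 09:01:16.150").to_pydatetime())
xmax = mdates.date2num(pd.Timestamp("2021-03-15 09:01:16.450").to_pydatetime())

result = mean_in_span(data, xmin, xmax)
print(f"Mean in selected range: {result:.4f} W")

# An empty selection yields NaN rather than raising an error.
empty = mean_in_span(data, xmin, xmin)
```

Note that a span containing no samples returns NaN, so onselect may want to check for that before printing.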

1 Comment

Thank you very much for this solution, I am deeply impressed!
