1

EDIT: I found the issue. Some lines were missing, which led to these unwanted segments on the graph. I was able to "erase" these segments by filling missing dates with NaN.
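For reference, a minimal sketch of what "filling missing dates with NaN" could look like. It assumes daily data with unique dates; the sample values and the names `full_range` and `filled` are made up for illustration and are not taken from my actual dataset.

```python
import pandas as pd

# Hypothetical sample: daily data where the rows for Jan 4-6 are missing entirely
dates = pd.to_datetime(
    ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-07", "2025-01-08"]
)
df = pd.DataFrame({"Date": dates, "GPP_DT_uStar": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Build the complete daily range and reindex onto it;
# dates that are absent from the data become all-NaN rows
full_range = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D")
filled = (
    df.set_index("Date")
    .reindex(full_range)
    .rename_axis("Date")
    .reset_index()
)

print(filled)
```

Plotting `filled["Date"]` against `filled["GPP_DT_uStar"]` should then leave the missing period empty, because matplotlib breaks the line at NaN values.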

I have a date column and other columns with some NaN values. I'd like to plot only the non-NaN values, but line segments are drawn between two non-NaN values (see image).

plt.figure()
plt.plot(merge_nal_cont["Date"], merge_nal_cont["GPP_DT_uStar"], label="daytime")
plt.plot(merge_nal_cont["Date"], merge_nal_cont["GPP_uStar_f"], label="nighttime")
plt.grid()
plt.legend()
plt.title("GPP à Nalohou")
plt.show()

[Image: segments drawn where the values are NaN]

1
  • 2
    Are you sure there are NaN values for the missing dates? Isn't it more like, there are no values for a range of dates in-between? This is not the same thing. Because if I create some sample data with evenly distributed dates and NaN values for some entries in "GPP_DT_uStar" and "GPP_uStar_f", matplotlib actually leaves these gaps empty rather than connecting their boundaries. In other words: cannot reproduce. Commented Mar 12 at 13:51

2 Answers

1

I guess the question's explanation is incomplete or phrased in a bit of a misleading way, because what I experience when trying to reproduce the problem is different from what is described and shown in the question.

Let's create some sample data, with a gap of 10 days represented by NaN values in the columns "GPP_DT_uStar" and "GPP_uStar_f", while the values in "Date" are consecutive, with a difference of 1 day for each successive row:

dates = pd.date_range(start="2025-01-01", end="2025-02-28", freq="D")
day = np.random.randint(0, 20, size=len(dates)).astype(float)
night = np.random.randint(0, 20, size=len(dates)).astype(float)
day[20:30] = night[20:30] = np.nan

df_with_nans = pd.DataFrame({"Date": dates, "GPP_DT_uStar": day, "GPP_uStar_f": night})

Thus, df_with_nans will look as follows:

         Date  GPP_DT_uStar  GPP_uStar_f
...  # More lines with actual numbers
18 2025-01-19           X.0          X.0
19 2025-01-20           X.0          X.0
20 2025-01-21           NaN          NaN
...  # 8 more lines with NaNs
29 2025-01-30           NaN          NaN
30 2025-01-31           X.0          X.0
31 2025-02-01           X.0          X.0
...  # More lines with actual numbers

Let's also create a version of this dataframe with an actual gap, i.e. rows with NaNs removed:

df_with_gaps = df_with_nans.dropna()

Thus, df_with_gaps will look as follows (note the missing rows between 2025-01-20 and 2025-01-31):

         Date  GPP_DT_uStar  GPP_uStar_f
...  # More lines with actual numbers
18 2025-01-19           X.0          X.0
19 2025-01-20           X.0          X.0
30 2025-01-31           X.0          X.0
31 2025-02-01           X.0          X.0
...  # More lines with actual numbers

If we plot these dataframes (see the plot at the bottom of the answer), we will find that

  • df_with_nans has gaps in the lines where we would expect them (namely, at the dates with NaN values);
  • df_with_gaps doesn't have these gaps but connects consecutive dates, whether they are 1 day or more than 1 day apart.

So, in other words, it is actually the opposite of what is written in the question: the version containing NaNs plots correctly, while the other version doesn't. This shouldn't be all too surprising: if no NaNs are given, how should matplotlib know which dates are to be considered consecutive and which are not?

We can fix this by (re-)introducing discontinuities in df_with_gaps ourselves: At dates that are more than a given difference apart, we can introduce all-NaN rows in our dataframe and trigger corresponding gaps in the lines (like we saw with df_with_nans):

def df_with_nans_at_gaps(df, diff_threshold):
    df = df.copy().reset_index(drop=True)
    gaps = df[df["Date"].diff() > diff_threshold].index
    # Insert NaNs as rows where gaps have been detected
    for gap in gaps:
        df.loc[gap - 0.5] = np.nan
    return df.sort_index().reset_index(drop=True)

All in all, this could look as follows, assuming a gap should be present when consecutive dates are more than 1 day apart:

from datetime import timedelta
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create some sample data
dates = pd.date_range(start="2025-01-01", end="2025-02-28", freq="D")
day = np.random.randint(0, 20, size=len(dates)).astype(float)
night = np.random.randint(0, 20, size=len(dates)).astype(float)
day[20:30] = night[20:30] = np.nan

# "df_with_nans": has consecutive dates (1 day difference) with NaNs for some day/night entries
df_with_nans = pd.DataFrame({"Date": dates, "GPP_DT_uStar": day, "GPP_uStar_f": night})
# "df_with_gaps": has non-consecutive dates (gaps with 1+ day difference)
df_with_gaps = df_with_nans.dropna()

def df_with_nans_at_gaps(df, diff_threshold):
    df = df.copy().reset_index(drop=True)
    gaps = df[df["Date"].diff() > diff_threshold].index
    # Insert NaNs as rows where gaps have been detected
    for gap in gaps:
        df.loc[gap - 0.5] = np.nan
    return df.sort_index().reset_index(drop=True)

# "proposed": has non-consecutive dates with NaNs at discontinuities
proposed = df_with_nans_at_gaps(df_with_gaps, diff_threshold=timedelta(days=1))

plt.figure(figsize=(6.4 * 3, 4.8))
plt.subplot(131)  # Plot dataframe with NaN values → gaps are not connected
plt.plot(df_with_nans["Date"], df_with_nans["GPP_DT_uStar"], label="daytime")
plt.plot(df_with_nans["Date"], df_with_nans["GPP_uStar_f"], label="nighttime")
plt.grid(); plt.xticks(rotation=90); plt.legend()
plt.title("with NaNs")
plt.subplot(132)  # Plot dataframe with missing rows → gaps are connected
plt.plot(df_with_gaps["Date"], df_with_gaps["GPP_DT_uStar"], label="daytime")
plt.plot(df_with_gaps["Date"], df_with_gaps["GPP_uStar_f"], label="nighttime")
plt.grid(); plt.xticks(rotation=90); plt.legend()
plt.title("with gaps")
plt.subplot(133)  # Plot dataframe with missing rows and newly introduced gaps
plt.plot(proposed["Date"], proposed["GPP_DT_uStar"], label="daytime")
plt.plot(proposed["Date"], proposed["GPP_uStar_f"], label="nighttime")
plt.grid(); plt.xticks(rotation=90); plt.legend()
plt.title("with gaps, fixed")
plt.show()

The resulting dataframe proposed will look as follows:

         Date  GPP_DT_uStar  GPP_uStar_f
...  # More lines with actual numbers
18 2025-01-19           X.0          X.0
19 2025-01-20           X.0          X.0
20        NaT           NaN          NaN
21 2025-01-31           X.0          X.0
22 2025-02-01           X.0          X.0
...  # More lines with actual numbers

The resulting plot will look as follows, where "with NaNs" is the result of plotting df_with_nans, "with gaps" is the result of plotting df_with_gaps, and "with gaps, fixed" is the result of plotting proposed:

[Image: plot with all 3 versions of the dataframe]

Note that the line I use for enforcing discontinuities, df.loc[gap - 0.5] = np.nan, produces a deprecation warning with the most recent versions of pandas. I have to admit that I am not sure what the recommended way of adding all-NaN rows to a dataframe is, but maybe someone else can help out here.
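One possible alternative, offered as a sketch rather than as the recommended pandas idiom: build the blank rows separately and combine them with `pd.concat` instead of enlarging the dataframe through `df.loc[...]`. The function name `insert_nan_rows_at_gaps` and the `date_col` parameter are hypothetical. The blank rows are created by reindexing an empty slice of the dataframe so that they keep the original column dtypes (NaT for dates, NaN for floats). I have not checked this against every pandas version.

```python
import numpy as np
import pandas as pd

def insert_nan_rows_at_gaps(df, diff_threshold, date_col="Date"):
    """Return a copy of df with one all-NaN row inserted before each row
    whose date is more than diff_threshold after the previous row's date."""
    df = df.reset_index(drop=True)
    is_gap = (df[date_col].diff() > diff_threshold).to_numpy()
    gap_positions = df.index[is_gap]
    if len(gap_positions) == 0:
        return df
    # One blank row per gap, typed like the original columns.
    # The fractional index values only control where each row sorts.
    blanks = df.iloc[:0].reindex(gap_positions - 0.5)
    return pd.concat([df, blanks]).sort_index().reset_index(drop=True)

# Usage with made-up data: a jump from Jan 2 to Jan 10
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-10", "2025-01-11"]),
    "GPP_DT_uStar": [1.0, 2.0, 3.0, 4.0],
})
fixed = insert_nan_rows_at_gaps(sample, pd.Timedelta(days=1))
print(fixed)
```

The result has the same shape as proposed above (a NaT/NaN row at each discontinuity), so it can be passed to plt.plot in the same way.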

-1

You can delete the NaN values before plotting with dropna in pandas. For example:

clean_data = merge_nal_cont.dropna(subset=["Date", "GPP_DT_uStar", "GPP_uStar_f"])

and then continue with the plot. You should perform a data inspection first so that you know which columns and rows have NaN values.
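A rough sketch of such an inspection, using a small made-up stand-in for merge_nal_cont (the values are invented; only the column names come from the question). It counts the NaNs per column, lists the affected rows, and then drops them:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merge_nal_cont
merge_nal_cont = pd.DataFrame({
    "Date": pd.date_range("2025-01-01", periods=5, freq="D"),
    "GPP_DT_uStar": [1.0, np.nan, 3.0, 4.0, np.nan],
    "GPP_uStar_f": [1.5, 2.5, np.nan, 4.5, 5.5],
})
cols = ["Date", "GPP_DT_uStar", "GPP_uStar_f"]

# Number of NaN values in each column
nan_per_column = merge_nal_cont[cols].isna().sum()
print(nan_per_column)

# Rows that contain at least one NaN in the selected columns
rows_with_nan = merge_nal_cont[merge_nal_cont[cols].isna().any(axis=1)]
print(rows_with_nan)

# Drop those rows before plotting
clean_data = merge_nal_cont.dropna(subset=cols)
print(clean_data)
```

Keep in mind that after dropping rows, matplotlib connects the remaining points across the removed dates.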

2 Comments

The line of code that you shared produces a syntax error. Did you accidentally cut it off too early?
I am sorry, I accidentally forgot to end the command with a closing parenthesis ")".
