How to replace multiple NaN rows in pandas dataframe with a single NaN

Question

I have a time series, where one of the columns contains mostly NaN. The dataframe is quite large, so handling all these NaNs is becoming a burden. If I simply drop all of them, plots are getting completely messed up with interpolation between remaining data points.

Here's a short example of what I have:

        v           x
0.0000  0.000000    NaN
0.0002  0.062791    NaN
0.0004  0.125333    NaN
0.0006  0.187381    95.0
0.0008  0.248690    NaN
0.0010  0.309017    NaN
0.0012  0.368125    NaN
0.0014  0.425779    88.0
0.0016  0.481754    85.0
0.0018  0.535827    91.0
0.0020  0.587785    NaN
0.0022  0.637424    NaN
0.0024  0.684547    NaN
0.0026  0.728969    99.0
...

and what I want to achieve:

        v           x
0.0004  0.125333    NaN
0.0006  0.187381    95.0
0.0012  0.368125    NaN
0.0014  0.425779    88.0
0.0016  0.481754    85.0
0.0018  0.535827    91.0
0.0024  0.684547    NaN
0.0026  0.728969    99.0
...

Iterating through rows is not an option, as it will be way to slow, but I can't figure out any reasonable approach. Any ideas?

A dataset example (rather short) to work with:

import pandas as pd
import numpy as np
f=50
Tmax = 1
fs= 5000

df = pd.DataFrame(index=np.arange(0, Tmax, 1/fs), data={'x':np.random.randint(0,100, size=int(fs*Tmax))})
df['v'] = np.sin(2*np.pi*f*df.index)

# Most of "x" is NaN
df.loc[df['x']<75, 'x'] = np.NaN

jezrael · Accepted Answer · 2020-02-12 08:30:27Z

3

Use boolean indexing with compare by Series.shifted and Series.notna and chain by | for bitwise OR:

df = df[df.x.shift(-1).notna() | df.x.notna()]
print (df)
               v     x
0.0004  0.125333   NaN
0.0006  0.187381  95.0
0.0012  0.368125   NaN
0.0014  0.425779  88.0
0.0016  0.481754  85.0
0.0018  0.535827  91.0
0.0024  0.684547   NaN
0.0026  0.728969  99.0

edited Feb 12, 2020 at 8:30

answered Feb 12, 2020 at 8:20

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Asetti sri harsha · Accepted Answer · 2020-02-12 08:24:06Z

2

Create a new variable temp by using shift method on x.

df['temp'] = df.x.shift(-1)

Then filter the rows where either x or temp is not null.

df[(~df.x.isnull())|(~df.temp.isnull())]

Since this approach uses inbuilt functions & filtering rather than loops, it should be faster.

answered Feb 12, 2020 at 8:24

Asetti sri harsha

9997 silver badges11 bronze badges

1 Comment

yatu Over a year ago

You could chain the methods there to avoid creating a new col

Collectives™ on Stack Overflow

How to replace multiple NaN rows in pandas dataframe with a single NaN

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related