0

I replicated a Pandas series with the following code:

data = np.array([1, 2, 3, 4, 5, np.nan, np.nan, np.nan, 9,10,11,12,13,14])
  
ser = pd.Series(data)
print(ser)

I would like to select only the columns before the NaN values so that I only get the values 1,2,3,4,5. How should I do that?

2 Answers 2

1

Test missing values with Series.isna and add Series.cummax for repeat Trues after first match and last invert mask by ~, filter in boolean indexing:

a = ser[~ser.isna().cummax()]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Alternative solution with cumulative sum:

a = ser[ser.isna().cumsum().eq(0)]
print(a)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Details:

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cumsum(),
                            invert = ser.isna().cumsum().eq(0)))
       0  testna  cummax  invert
0    1.0   False       0    True
1    2.0   False       0    True
2    3.0   False       0    True
3    4.0   False       0    True
4    5.0   False       0    True
5    NaN    True       1   False
6    NaN    True       2   False
7    NaN    True       3   False
8    9.0   False       3   False
9   10.0   False       3   False
10  11.0   False       3   False
11  12.0   False       3   False
12  13.0   False       3   False
13  14.0   False       3   False

print(ser.to_frame().assign(testna = ser.isna(), 
                            cummax = ser.isna().cummax(),
                            test0 = ~ser.isna().cummax()))

       0  testna  cummax   test0
0    1.0   False   False    True
1    2.0   False   False    True
2    3.0   False   False    True
3    4.0   False   False    True
4    5.0   False   False    True
5    NaN    True    True   False
6    NaN    True    True   False
7    NaN    True    True   False
8    9.0   False    True   False
9   10.0   False    True   False
10  11.0   False    True   False
11  12.0   False    True   False
12  13.0   False    True   False
13  14.0   False    True   False
Sign up to request clarification or add additional context in comments.

Comments

1

Use a boolean mask to slice the series.

You have two options, check is the value is not NA with notna and extend the False values after the first True with Series.cummin.

ser[ser.notna().cummin()]

output:

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

Or, test if the values are NA with isna, then extend the True values after the first True with Series.cummax, then invert the mask with ~:

ser[~ser.isna().cummax()]
visual representation of how it works:
    data  notna  notna+cummin   isna  isna+cummax  ~(isna+cummax)
0    1.0   True          True  False        False            True
1    2.0   True          True  False        False            True
2    3.0   True          True  False        False            True
3    4.0   True          True  False        False            True
4    5.0   True          True  False        False            True
5    NaN  False         False   True         True           False
6    NaN  False         False   True         True           False
7    NaN  False         False   True         True           False
8    9.0   True         False  False         True           False
9   10.0   True         False  False         True           False
10  11.0   True         False  False         True           False
11  12.0   True         False  False         True           False
12  13.0   True         False  False         True           False
13  14.0   True         False  False         True           False

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.