
I have a very large DataFrame. I wrote a for loop, but it takes forever to run. Is there a faster alternative?

index  ids   year
0      1890  2001
1      2678  NaN
2      4780  NaN
3      9844  1999

The goal is to get an array of the ids of people who have NaN in the 'year' column. What I did was turn the NaN values into 0 and then write this for loop:

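df = df.fillna(0)  # NaN in 'year' replaced with 0, as described above

df_nan = []
# Check every cell; when a cell is 0, record that row's id (column 0 is 'ids')
for i in range(0, len(df.index)):
    for j in range(0, len(df.columns)):
        if int(df.values[i, j]) == 0:
            df_nan.append(df.values[i, 0])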

The for loop works (I tested it on a smaller DataFrame), but I can't use it on the main one because it takes too long.
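Since the NaN values were already replaced with 0, the loop's cell-by-cell check can be written as one comparison on the 'year' column, which pandas evaluates in compiled code instead of in a Python loop. A minimal sketch, assuming the column names from the sample table; `df_filled` and `ids_from_zero` are illustrative names only:

```python
import numpy as np
import pandas as pd

# Sample data from the table in the question
df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844],
                   'year': [2001, np.nan, np.nan, 1999]})

# The conversion described in the question: NaN -> 0
df_filled = df.fillna(0)

# A single vectorized comparison replaces the nested loop:
# select 'ids' on every row where 'year' equals 0
ids_from_zero = df_filled.loc[df_filled['year'] == 0, 'ids'].to_numpy()
print(ids_from_zero)  # [2678 4780]
```

One caveat with this route: a genuine year of 0, if the data ever contained one, would be indistinguishable from a former NaN, so testing for NaN directly (without the fillna step) is the safer choice.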

1 Answer


You can use boolean filtering instead of a loop: select the rows where 'year' is null, then take their 'ids'.

import numpy as np
import pandas as pd

df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844], 'year': [2001, np.nan, np.nan, 1999]})
nan_rows = df[df['year'].isnull()]  # keep only rows where 'year' is NaN
ids = nan_rows['ids'].values
print(ids)  # outputs: [2678 4780]
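To check that the filter can replace the loop from the question, both can be run on the sample data and compared. A sketch, assuming two hypothetical helper names, `loop_ids` and `mask_ids`: `loop_ids` follows the question's loop (after filling NaN with 0) but reads `df.values` once rather than on every iteration, and `mask_ids` combines the mask and the column selection in one `.loc` call using `isna()`, which is an alias of `isnull()`:

```python
import numpy as np
import pandas as pd


def loop_ids(df):
    """Hypothetical helper: the question's nested loop, NaN filled with 0 first."""
    values = df.fillna(0).values  # built once instead of once per cell
    result = []
    for i in range(values.shape[0]):
        for j in range(values.shape[1]):
            if int(values[i, j]) == 0:
                result.append(values[i, 0])  # column 0 is 'ids'
    return result


def mask_ids(df):
    """Hypothetical helper: boolean-mask version, no NaN -> 0 conversion needed."""
    return df.loc[df['year'].isna(), 'ids'].tolist()


df = pd.DataFrame({'ids': [1890, 2678, 4780, 9844],
                   'year': [2001, np.nan, np.nan, 1999]})

print(mask_ids(df))                  # [2678, 4780]
print(loop_ids(df) == mask_ids(df))  # True
```

The two are not identical in general: the loop tests every column, so an id equal to 0 would also be collected, and a row is appended once per zero cell, whereas `mask_ids` looks only at 'year'. Also, because `df.values` mixes integer and float columns here, the loop returns floats (2678.0) while the mask returns the original integer ids.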