
I am editing my previous question as it was flawed. I have a data frame named df. Some of its columns contain negative values, zeros, and NaN. I want to replace these values and store a corresponding flag value in another data frame at the same index.

import numpy as np
import pandas as pd

df = pd.read_excel('Check.xlsx')
df_ph_temp = df.iloc[:,2:5]
df_flags = pd.DataFrame(index=df.index, columns=df.columns)
flag_ph_temp = df_flags.iloc[:,2:5]
for rowIndex, row in df_ph_temp.iterrows() :
    for colIndex, value in row.items() :
        if value == 0 :
            df_ph_temp.loc[rowIndex, colIndex] = df_ph_temp.loc[rowIndex - 1, colIndex]
            flag_ph_temp.loc[rowIndex, colIndex] = 1            
        elif value < 0 :
            df_ph_temp.loc[rowIndex, colIndex] = 0
            flag_ph_temp.loc[rowIndex, colIndex] = 1
        elif value > 200 :
            df_ph_temp.loc[rowIndex, colIndex] = 130
            flag_ph_temp.loc[rowIndex, colIndex] = 2
        elif value == np.nan : # Not working... Why?
            df_ph_temp.loc[rowIndex, colIndex] = df_ph_temp.loc[rowIndex - 1, colIndex]
            flag_ph_temp.loc[rowIndex, colIndex] = 1            
        else :
            continue

I am not getting any errors, but I am also not getting the desired output. The part of the program that replaces NaN values and stores the corresponding flag values in the flags data frame is not working. I think this is because the data contains more than two rows with NaN values. Is there a way to fix this? I tried

df_ph_temp[colIndex].fillna(method ='ffill', inplace = True)

before the if condition, but I was still not able to achieve the desired results.

I am unable to figure it out. Kindly help.

1 Answer

With pandas, you should avoid loops. Use boolean-mask filtering and slicing to fill your flag column. To detect null values, call .isnull() on a pandas DataFrame or Series (for example, a selected column) rather than comparing a single value as you did. Then use .fillna() if you want to replace null values with something else.
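To illustrate the point about null detection, here is a minimal sketch on made-up values (not the asker's data). It shows that an equality test against np.nan is always False, while pd.isna() on a scalar and .isnull() on a Series both detect missing values.

```python
import numpy as np
import pandas as pd

value = np.nan

# NaN compares unequal to everything, including itself,
# so this branch of an if/elif chain can never be taken.
eq_result = (value == np.nan)

# pd.isna() is the scalar-safe way to test a single value.
isna_result = pd.isna(value)

# On a Series, .isnull() tests every element at once.
# None is stored as NaN in a float Series, so both are detected.
s = pd.Series([1.5, np.nan, None, 0.0])
mask = s.isnull()

print(eq_result, isna_result, mask.tolist())
```

The resulting boolean mask can be passed directly to .loc to select or overwrite the affected rows.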

Based on your code, the solution may look as follows (I am not sure it will work; it would be helpful if you shared some input data and the expected output).

First, create an empty column as you did:

data['Flags'] = None

Then fill this column based on conditions on the "Temperature phase" column. Using fillna(0) to replace all null values with 0 lets you test only whether values are <= 0; this replacement is not applied to the final dataframe:

data.loc[data['Temperature phase'].fillna(0) <= 0, "Flags"] = 1
data.loc[data['Temperature phase'] > 200, "Flags"] = 2

Now replace the Temperature phase values.

For values equal to 0 or null, you seem to have chosen to replace them with the previous value in the dataframe. You may be able to achieve this part as follows:

data.loc[data['Temperature phase'].isnull(), 'Temperature phase'] = data['Temperature phase'].shift().loc[data.loc[data['Temperature phase'].isnull()].index]

This command first uses .shift() to shift all values in the Temperature phase column down by one row, then selects the rows where Temperature phase is null and replaces each of them with the shifted value at the same index. Note that it only handles null values, not zeros, and a null whose previous row is also null stays null.
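The question mentions several NaN rows, so a run of consecutive nulls is worth checking. The sketch below uses a made-up Series to compare the shift-based replacement with a forward fill. Swapping in .mask() plus .ffill() is an alternative suggested here, not part of the original answer; it also treats zeros as gaps, as the question's loop does.

```python
import numpy as np
import pandas as pd

# Made-up sample with two consecutive NaNs and one zero.
temp = pd.Series([20.0, np.nan, np.nan, 0.0, 30.0], name="Temperature phase")

# Shift-based replacement: each null takes the value exactly one row back,
# so the second of two consecutive nulls receives another null.
is_null = temp.isnull()
shifted = temp.copy()
shifted[is_null] = temp.shift()[is_null]

# Alternative: turn zeros into NaN, then carry the last valid value forward.
filled = temp.mask(temp == 0).ffill()

print(shifted.tolist())
print(filled.tolist())
```

A gap in the very first row has no previous value, so it remains NaN with either approach.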

Finally, replace the other Temperature phase values:

data.loc[data['Temperature phase'] < 0, "Temperature phase"] = 0
data.loc[data['Temperature phase'] > 200, "Temperature phase"] = 130

You don't need a separate flag dataframe, index, and so on, as the flag is filled directly in the final dataframe.
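If a separate flags data frame covering several columns is still wanted, as in the question, the same masks can be applied to whole data frames. The following is only a sketch under assumptions: the column names T1 and T2 and the sample values are hypothetical, unflagged cells are set to 0 (the question leaves them empty), and the thresholds are those of the question's loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.iloc[:, 2:5] from the question.
df_ph_temp = pd.DataFrame({
    "T1": [20.0, 0.0, -5.0, 250.0],
    "T2": [np.nan, 30.0, np.nan, 40.0],
})

# Build the flags from the ORIGINAL values, before anything is replaced.
flags = pd.DataFrame(0, index=df_ph_temp.index, columns=df_ph_temp.columns)
flags[df_ph_temp.fillna(0) <= 0] = 1   # zero, negative, or NaN
flags[df_ph_temp > 200] = 2            # above the upper limit

# Remember which cells were zero or NaN in the original data.
is_gap = df_ph_temp.isnull() | (df_ph_temp == 0)

# Negative values become 0, values above 200 become 130.
cleaned = df_ph_temp.mask(df_ph_temp < 0, 0).mask(df_ph_temp > 200, 130)

# Original zeros and NaNs take the previous (already cleaned) value.
cleaned = cleaned.mask(is_gap).ffill()

print(cleaned)
print(flags)
```

Computing the flags first matters: once the values are replaced, the information about which cells were invalid is gone.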


7 Comments

Thank you for your response. I will check this.
You're welcome. Don't hesitate to accept the answer if it fits your needs, so the question can be closed.
Your answer was useful; however, I have edited my question. Your answer helped me understand mask filtering. Thank you once again.
You really need to avoid loops and use pandas mask filtering instead. Looping over a dataframe leads to very poor performance. Also, your condition value == np.nan can never be true: NaN compares unequal to everything, including itself, and your data may also contain None, which is a distinct object from np.nan. Use the pandas .isnull() method to catch both None and NaN values.
Yes, I removed the loops and was able to achieve the desired results. Just a follow-up question: how can we combine multiple conditions to store the flag values in the respective column? I did the following but I am getting a ValueError: df_flags.loc[(df[col].fillna(0) > 0 and df[col].fillna(0) <= 10), col] = 0 df_flags.loc[(df[col].fillna(0) > 25 and df[col].fillna(0) <= 50), col] = 0.5 df_flags.loc[(df[col].fillna(0) > 50 and df[col].fillna(0) <= 150), col] = 1
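The thread does not show a reply to this follow-up. As a hedged sketch: Python's `and` asks for a single truth value of a whole Series, which is what raises the ValueError, whereas the element-wise operator `&` with each comparison in parentheses combines the masks row by row. The column name "temp" and the sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and an empty float flags frame.
df = pd.DataFrame({"temp": [5.0, 30.0, 100.0, np.nan, 20.0]})
df_flags = pd.DataFrame(index=df.index, columns=df.columns, dtype="float64")
col = "temp"
v = df[col].fillna(0)

# `and` needs one boolean for the whole Series, which is ambiguous.
and_error = None
try:
    (v > 0) and (v <= 10)
except ValueError as exc:
    and_error = str(exc)

# `&` works element-wise; the parentheses are required because
# & binds more tightly than the comparison operators.
df_flags.loc[(v > 0) & (v <= 10), col] = 0
df_flags.loc[(v > 25) & (v <= 50), col] = 0.5
df_flags.loc[(v > 50) & (v <= 150), col] = 1

print(and_error)
print(df_flags)
```

Rows that match none of the ranges (here the NaN row and the value 20) keep their initial NaN flag.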