0

I have a dataframe that looks like:

df_test = pd.DataFrame({
    0: ['Property1', 1.2, 1.5, 2.6], 
    1: ["Std", 0.1,0.01,0.02],
    2: ["Rep", 3,3,3],
    3: ["Property2", 3.1,18.2,10.66],
    4: ["Property3", 22,33,44],
    5: ["Rep", 3,3,3],
    6: ["Property4", 6,12,14.23],
    7: ["Property5", 3.1,18.2,10.66],
    8: ["Std", 1,0.2,0.66],
})
         0     1    2          3          4    5          6          7     8
0  Property1   Std  Rep  Property2  Property3  Rep  Property4  Property5   Std
1        1.2   0.1    3        3.1         22    3          6        3.1     1
2        1.5  0.01    3       18.2         33    3         12       18.2   0.2
3        2.6  0.02    3      10.66         44    3      14.23      10.66  0.66
I want to insert columns if the value in df[1,i] is not ["Std", "Rep"] and df[1,i+1] is not "Std".

If I'm iterating over the columns (starting from the end) it would look like:

while i < j:
    if df.loc[1,i] != "Std" and df.loc[1,i] != "Rep":
        if df.loc[1,i+1] != "Std":
            df.insert(i+1,"Std",np.nan)
            df.columns = pd.RangeIndex(df.columns.size)
            df.loc[1,i+1]="Std"
    i+=1
    j = len(df.columns)-1

I'm trying to do with using select but I don't know how to set conditions that depend on two consecutive columns like I did in the while loop.

Expected result:

          0     1    2          3    4          5    6    7          8    9   \
0  Property1   Std  Rep  Property2  Std  Property3  Std  Rep  Property4  Std   
1        1.2   0.1    3        3.1  nan         22  nan    3          6  nan   
2        1.5  0.01    3       18.2  nan         33  nan    3         12  nan   
3        2.6  0.02    3      10.66  nan         44  nan    3      14.23  nan   

          10    11  
0  Property5   Std  
1        3.1     1  
2       18.2   0.2  
3      10.66  0.66  

Is there a way to vectorize this loop?

1 Answer 1

1

Let us try

s = df.iloc[0,:]
s = s.str.startswith('Property')&s.shift(-1).ne('Std')
out = pd.concat([df,pd.DataFrame([['Std',np.nan,np.nan,np.nan]]*s.sum(),
                                 index = s.index[s],
                                 columns=df.index).T],axis=1).sort_index(level=0,axis=1)
out
           0     1    2          3    3  ...    5          6    6          7     8
0  Property1   Std  Rep  Property2  Std  ...  Rep  Property4  Std  Property5   Std
1        1.2   0.1    3        3.1  NaN  ...    3          6  NaN        3.1     1
2        1.5  0.01    3       18.2  NaN  ...    3         12  NaN       18.2   0.2
3        2.6  0.02    3      10.66  NaN  ...    3      14.23  NaN      10.66  0.66
[4 rows x 12 columns]
Sign up to request clarification or add additional context in comments.

1 Comment

could you maybe explain what this is for: ...*s.sum() I tested the code with and without this s.sum() and got the same (good) result

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.