I have two dataframes, df and dh. Data from row x of df is copied into row p of dh (the relationship between x and p is arbitrary). Each row in both frames holds floating-point values of the same type, arranged horizontally in columns A, B, C, D, E, F. I originally did this with:

dh.iloc[p,A] = df.iloc[x,A]
dh.iloc[p,B] = df.iloc[x,B]
dh.iloc[p,C] = df.iloc[x,C]
dh.iloc[p,D] = df.iloc[x,D]
dh.iloc[p,E] = df.iloc[x,E]
dh.iloc[p,F] = df.iloc[x,F]

It occurs to me that this is 6 discrete accesses to 6 discrete locations, and might be faster if I could write into all 6 at the same time. Is there any way to do this in a single statement that would execute faster?

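import pandas as pd

# df (source) and dh (destination) are existing DataFrames.
# A..F and Zerstng are integer column positions.
x = 0
p = 0
GDZ = 0  # 1 while inside a 'ZZ' ... 'BB' block, 0 otherwise
while x < 1000:
    ZroTst = df.iloc[x, Zerstng]
    if GDZ == 0:
        # Outside a block: wait for the 'ZZ' marker that opens one
        if ZroTst == 'ZZ':
            GDZ = 1
    elif ZroTst == 'BB':
        # 'BB' closes the block
        GDZ = 0
    else:
        # Inside a block: copy the row to the next row of dh
        p += 1
        dh.iloc[p, A] = df.iloc[x, A]
        dh.iloc[p, B] = df.iloc[x, B]
        dh.iloc[p, C] = df.iloc[x, C]
        dh.iloc[p, D] = df.iloc[x, D]
        dh.iloc[p, E] = df.iloc[x, E]
        dh.iloc[p, F] = df.iloc[x, F]
    x += 1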

1 Answer

  1. Do not worry about speed until you have written the program, found it too slow, and profiled and timed it to figure out what is making it slow.

  2. It looks like you can do dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]].

  3. You mention that p and x each refer to a single row. It's somewhat unusual to operate on one row at a time in pandas; it is much more common to operate on many rows at once. Is this inside some loop over p, x pairs? If so, there is probably a nicer way to write it.
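
As a minimal sketch of point 2, here is a single-statement row copy on made-up data. The frames, their column labels, and the values of x and p are hypothetical stand-ins, and cols stands in for the positions A..F. The right-hand side is converted with .to_numpy() so that bare values are handed over and written by position; that conversion is my assumption about what is wanted when the two frames do not share column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two frames deliberately use different column labels.
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=list("abcdef"))
dh = pd.DataFrame(np.zeros((3, 6)), columns=list("uvwxyz"))

cols = [0, 1, 2, 3, 4, 5]  # stand-ins for the integer positions A..F
x, p = 2, 0                # hypothetical source and destination rows

# Copy six cells in one statement. .to_numpy() strips the labels from the
# source row, so the values land in dh purely by position.
dh.iloc[p, cols] = df.iloc[x, cols].to_numpy()

print(dh)
```

Whether this is actually faster than six separate assignments is something to measure on the real data, per point 1.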


3 Comments

dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]] appears to be acceptable from a syntax perspective but doesn't seem to work as hoped.
"Doesn't seem to work" is never actionable information. Can you post an SSCCE (sscce.org) showing runnable code that does what you want so that we are completely concrete here?
I have a dataframe ‘df’ 6 columns wide, 300k+ long. The data has random occurrences of “bad data” - writing to dataframe ‘dh’ with a different index facilitates the removal of gaps. I currently use 6 individual “write” statements as depicted above that write data to a row in ‘dh’ but it is very slow to execute. The suggested statement: dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]] doesn’t actually write ‘horizontally’ to a row in ‘dh’ as expected – instead populating only one column in ‘dh’ with NaN.
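
For what it's worth, the gap removal described here can probably be done without a row-by-row loop. The sketch below is only an illustration on invented data: the column names "marker", "A" and "B" are hypothetical, and it assumes the rule from the question's loop (rows after a 'ZZ' marker are kept until a 'BB' marker is seen). Unlike the loop, which writes starting at row 1 of a pre-existing dh, this builds a new dh starting at row 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a marker column plus two float columns.
df = pd.DataFrame({
    "marker": ["", "ZZ", "", "", "BB", "", "ZZ", "", "BB"],
    "A": np.arange(9, dtype=float),
    "B": np.arange(9, dtype=float) * 10,
})

marker = df["marker"]

# State after each row: 'ZZ' turns copying on (1), 'BB' turns it off (0),
# and any other value keeps the previous state via forward fill.
state_after = marker.map({"ZZ": 1.0, "BB": 0.0}).ffill().fillna(0.0)

# The state a row sees is the state left behind by the previous row.
state_before = state_after.shift(1, fill_value=0.0)

# Keep rows that are inside a block and are not the closing 'BB' marker.
keep = (state_before == 1.0) & (marker != "BB")

# Select all kept rows at once and renumber them to close the gaps.
dh = df.loc[keep, ["A", "B"]].reset_index(drop=True)

print(dh)
```

All columns of the kept rows are copied in one selection, so there are no per-cell writes at all.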
