Faster way to write to dataframe across multiple columns?

Question

I have two dataframes, df and dh. Data from row x of df is written to row p of dh (the relationship between x and p is arbitrary). Every row in both frames holds values of the same floating-point type in columns A, B, C, D, E and F. I originally did this with:

dh.iloc[p,A] = df.iloc[x,A]
dh.iloc[p,B] = df.iloc[x,B]
dh.iloc[p,C] = df.iloc[x,C]
dh.iloc[p,D] = df.iloc[x,D]
dh.iloc[p,E] = df.iloc[x,E]
dh.iloc[p,F] = df.iloc[x,F]

It occurs to me that this is 6 discrete accesses to 6 discrete locations, and might be faster if I could write into all 6 at the same time. Is there any way to do this in a single statement that would execute faster?

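import pandas as pd

# df (source) and dh (destination) are existing DataFrames.
# A..F and Zerstng are integer column positions, as .iloc requires.
x = 0    # current row in df
p = 0    # current row in dh
GDZ = 0  # 1 after a 'ZZ' row has been seen, back to 0 after a 'BB' row
while x < 1000:
    ZroTst = df.iloc[x, Zerstng]
    if GDZ == 0:
        # Outside a block: wait for the 'ZZ' marker.
        if ZroTst == 'ZZ':
            GDZ = 1
    elif ZroTst == 'BB':
        # 'BB' marker ends the block.
        GDZ = 0
    else:
        # Inside a block: copy the six values to the next row of dh.
        p += 1
        dh.iloc[p, A] = df.iloc[x, A]
        dh.iloc[p, B] = df.iloc[x, B]
        dh.iloc[p, C] = df.iloc[x, C]
        dh.iloc[p, D] = df.iloc[x, D]
        dh.iloc[p, E] = df.iloc[x, E]
        dh.iloc[p, F] = df.iloc[x, F]
    x += 1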

Mike Graham · Accepted Answer · 2016-02-13 17:59:57Z

Do not worry about speed until you have written the program, found it too slow, and profiled and timed it to figure out what is making it slow.
It looks like you can do dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]].
You mention that p and x each identify a row. It's somewhat unusual to operate on one row at a time in pandas; it is much more common to operate on many rows at once. Is this inside some loop over p, x pairs? If so, there is probably a nicer way to write it.
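As a rough sketch of what a loop-free version might look like, assuming the selection rule is the one in the question's loop (start copying after a 'ZZ' row, stop at a 'BB' row): the column name "marker" and the sample values below are hypothetical stand-ins for the Zerstng column and the real data, and only two value columns are shown for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "marker" stands in for the Zerstng column.
df = pd.DataFrame({
    "marker": ["", "ZZ", "", "", "BB", "", "ZZ", ""],
    "A": np.arange(8, dtype=float),
    "B": np.arange(8, dtype=float) * 10,
})

# State left behind by each row: 1.0 after 'ZZ', 0.0 after 'BB',
# carried forward across ordinary rows (0.0 before any marker is seen).
state_after = df["marker"].map({"ZZ": 1.0, "BB": 0.0}).ffill().fillna(0.0)

# State in effect when each row is examined, i.e. what the previous row left.
state_before = state_after.shift(1, fill_value=0.0)

# A row is copied when the state is 1 and the row itself is not a 'BB' marker.
keep = (state_before == 1.0) & (df["marker"] != "BB")

# Copy all selected rows at once and renumber them without gaps.
dh = df.loc[keep, ["A", "B"]].reset_index(drop=True)
print(dh)
```

One difference from the loop in the question: this fills dh starting at row 0, whereas the loop increments p before the first write and so starts at row 1. Whether this is actually faster on the real data is something to measure rather than assume.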

3 Comments

DyTech Over a year ago

dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]] appears to be acceptable from a syntax perspective but doesn't seem to work as hoped.

Mike Graham Over a year ago

"Doesn't seem to work" is never actionable information. Can you post an SSCCE (sscce.org) showing runnable code that does what you want, so that we are completely concrete here?

DyTech Over a year ago

I have a dataframe ‘df’ 6 columns wide, 300k+ long. The data has random occurrences of “bad data” - writing to dataframe ‘dh’ with a different index facilitates the removal of gaps. I currently use 6 individual “write” statements as depicted above that write data to a row in ‘dh’ but it is very slow to execute. The suggested statement: dh.iloc[p, [A, B, C, D, E, F]] = df.iloc[x, [A, B, C, D, E, F]] doesn’t actually write ‘horizontally’ to a row in ‘dh’ as expected – instead populating only one column in ‘dh’ with NaN.
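One possible cause of the NaN result, which cannot be confirmed from the thread alone: the right-hand side is a Series labeled with df's column names, and pandas may try to align those labels with dh during assignment (this behavior has varied between versions). A minimal sketch that sidesteps label alignment by assigning a plain NumPy array follows; the frames, column names, and positions are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose column labels deliberately differ.
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=list("abcdef"))
dh = pd.DataFrame(np.zeros((3, 6)), columns=list("uvwxyz"))

cols = [0, 1, 2, 3, 4, 5]  # integer positions standing in for A..F
x, p = 1, 2                # source row in df, destination row in dh

# .to_numpy() drops the Series labels, so the six values are written
# purely by position into row p of dh.
dh.iloc[p, cols] = df.iloc[x, cols].to_numpy()
print(dh)
```

On older pandas versions without Series.to_numpy(), the .values attribute plays the same role.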