Modifying columns from existing data frame into new data frame

Question

I am trying to create a new data frame that compresses pre-existing columns from another data frame.

I am looking to turn something like this:

id | x1  | x2  | x3  | x4
-------------------------- ...
a  | x1a | x2a | x3a | x4a
b  | x1b | x2b | x3b | x4b
c  | x1c | x2c | x3c | x4c

Into this:

id |     z1       |      z2
-------------------------------- ...
a  | f1(x1a, x2a) | f2(x3a, x4a) 
b  | f1(x1b, x2b) | f2(x3b, x4b) 
c  | f1(x1c, x2c) | f2(x3c, x4c)

My current approach has been to append to the new data frame row by row, like so:

for row in rows:
   new_row_map = get_new_row_map(df_in, row)
   df_out = df_out.append(new_row_map, ignore_index=True) 
return df_out

I have been running this code for a couple hours now and it seems to be very inefficient. I was wondering if anyone had a quicker/more efficient approach here. Thanks!

Where is x4a, x4b, x4c?

– Corralien, Commented Jul 11, 2022 at 21:35
Sorry, second inputs to f2 function

– Hnorth, Commented Jul 11, 2022 at 21:39
You have 2 different functions?

– Corralien, Commented Jul 11, 2022 at 21:46
Yeah they are different functions

– Hnorth, Commented Jul 11, 2022 at 21:49

luke · Accepted Answer · 2022-07-11 21:57:17Z

You're right, appending row by row to a data frame is very inefficient: each append returns a new data frame, so the existing rows are copied again on every iteration. This is why pandas and numpy are built around vectorized operations for altering and accessing their data. Data types in numpy and pandas are stored with less metadata than they would be in a base Python type, and vectorized operations run the calculation over every element in compiled code rather than iterating through each row in Python. See Chapter 4 of Python for Data Analysis for a more thorough explanation (it's free online).

Rather than appending row by row, you need to apply a vectorized function to the whole data frame (meaning it alters the entire data frame at once instead of iterating over the rows). For instance:

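import numpy as np

# examples of what f1 and f2 could be
# (each takes the whole data frame and returns one value per row)
def f1(df):
    result = (df["x1"] * df["x2"] + 4) + np.cos(df["x2"])
    return result

def f2(df):
    return df["x3"] - df["x4"] * 9.8

# define the functions first, then compute each new column in one call
df["z1"] = f1(df)
df["z2"] = f2(df)

# you could cut out the original columns like so (keeping id)
df = df[["id", "z1", "z2"]]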
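
To make the snippet above concrete, here is a minimal self-contained sketch. The numeric sample values and the formulas inside f1 and f2 are assumptions for illustration only, since the question does not say what the real functions compute. The names df_in and df_out are also just illustrative. It builds the result as a separate data frame in one step and keeps the id column:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric input; the real data is not shown in the question.
df_in = pd.DataFrame({
    "id": ["a", "b", "c"],
    "x1": [1.0, 2.0, 3.0],
    "x2": [0.0, 1.0, 2.0],
    "x3": [10.0, 20.0, 30.0],
    "x4": [1.0, 2.0, 3.0],
})

def f1(x1, x2):
    # Placeholder formula; operates on whole columns (Series) at once.
    return x1 * x2 + 4 + np.cos(x2)

def f2(x3, x4):
    # Placeholder formula; also column-wise.
    return x3 - x4 * 9.8

# Build the new frame once instead of appending row by row.
df_out = pd.DataFrame({
    "id": df_in["id"],
    "z1": f1(df_in["x1"], df_in["x2"]),
    "z2": f2(df_in["x3"], df_in["x4"]),
})
print(df_out)
```

This only works if the real f1 and f2 can be written with column-wise arithmetic or numpy functions; if they cannot, a row-wise apply is the usual fallback.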

See this post about vectorizing a function, and this article

Corralien · Accepted Answer · 2022-07-11 22:02:34Z

You can use:

def f1(row):
    # do stuff here, just return a string for demo
    return f"f({', '.join(row)})"
    
def f2(row):
    # do stuff here, just return a string for demo
    return f"f({', '.join(row)})"

df['z1'] = df[['x1', 'x2']].apply(f1, axis=1)
df['z2'] = df[['x3', 'x4']].apply(f2, axis=1)

Output:

  id   x1   x2   x3   x4           z1           z2
0  a  x1a  x2a  x3a  x4a  f(x1a, x2a)  f(x3a, x4a)
1  b  x1b  x2b  x3b  x4b  f(x1b, x2b)  f(x3b, x4b)
2  c  x1c  x2c  x3c  x4c  f(x1c, x2c)  f(x3c, x4c)
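
Since the question asks for a new data frame rather than extra columns on the old one, the same apply calls can feed a separate frame. A minimal sketch, assuming the string sample data from the question. The names df_in and df_out are illustrative, and f1 and f2 are demo placeholders that only format their inputs:

```python
import pandas as pd

# Sample data taken from the table in the question.
df_in = pd.DataFrame({
    "id": ["a", "b", "c"],
    "x1": ["x1a", "x1b", "x1c"],
    "x2": ["x2a", "x2b", "x2c"],
    "x3": ["x3a", "x3b", "x3c"],
    "x4": ["x4a", "x4b", "x4c"],
})

def f1(row):
    # Placeholder: the real f1 is unknown, so just format the inputs.
    return f"f1({', '.join(row)})"

def f2(row):
    # Placeholder for the second, different function.
    return f"f2({', '.join(row)})"

# Each apply call returns one Series; assemble them into the new frame once.
df_out = pd.DataFrame({
    "id": df_in["id"],
    "z1": df_in[["x1", "x2"]].apply(f1, axis=1),
    "z2": df_in[["x3", "x4"]].apply(f2, axis=1),
})
print(df_out)
```

Note that apply with axis=1 still calls the Python function once per row, so it is usually slower than a column-wise expression when one is available. It does avoid the repeated copying caused by appending inside a loop.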

edited Jul 11, 2022 at 22:02

answered Jul 11, 2022 at 21:55

Corralien
