Performing calculations on a numpy array and adding them to a pandas dataframe

Question

Let's say I have an array such as this:

import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7], [20, 25, 30, 35, 40, 45, 50], [2, 4, 6, 8, 10, 12, 14]])

and a dataframe such as this:

  num letter
0   1      a
1   2      b
2   3      c

What I would like to do is calculate the difference between the first and last number of each sequence (row) in the array, and then add these differences to a new column in the df.

Currently I am able to calculate the desired difference in each sequence in this manner:

for i in a:
    print(i[-1] - i[0])

Giving me the following results:

6
30
12
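The same per-row differences can be computed without a Python loop by indexing the first and last columns of the 2-D array directly. This is a sketch using standard NumPy slicing on the array from the question:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])

# a[:, -1] is the last element of every row and a[:, 0] the first,
# so the subtraction yields one difference per row in a single step.
diffs = a[:, -1] - a[:, 0]
print(diffs)  # [ 6 30 12]
```

Because `diffs` has exactly one entry per row of `a`, it can be assigned directly as a column when the DataFrame has the same number of rows.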

What I expected to be able to do was replace the print with an assignment to df['new_col'], like so:

for i in a:
    df['new_col'] = (i[-1] - i[0])

And for my df to then look like this:

  num letter  new_col
0   1      a        6
1   2      b       30
2   3      c       12

However, I end up getting this:

  num letter  new_col
0   1      a       12
1   2      b       12
2   3      c       12
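Inside the loop, `i[-1] - i[0]` is a single scalar, and assigning a scalar to a column broadcasts it to every row. Each iteration therefore overwrites the whole column, and the value from the last pass (12) is what remains. A minimal reproduction, assuming the three-row `df` shown at the top of the question, contrasted with assigning all the differences at once:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})

# Each assignment writes one scalar into every row of the column,
# so only the value computed from the last row of `a` survives.
for i in a:
    df["new_col"] = i[-1] - i[0]
print(df["new_col"].tolist())  # [12, 12, 12]

# Assigning a sequence with one value per row fills the column element-wise.
df["new_col"] = [i[-1] - i[0] for i in a]
print(df["new_col"].tolist())  # [6, 30, 12]
```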

I would also really appreciate it if anyone could tell me the numpy equivalents of .diff() and .shift(). I tried calling them on the array the same way you would on a pandas DataFrame, but just got error messages. This would be useful if I want to calculate the difference not just between the first and last numbers but between positions somewhere in between.
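For the `.diff()` part, NumPy has `np.diff`, which returns differences between neighbouring elements along an axis; unlike `pd.DataFrame(a).diff(axis=1)`, the result is one element shorter per row instead of starting with a NaN column. There is no `np.shift`: `np.roll` exists but wraps values around to the other end, so a pandas-style shift is usually written with slicing plus NaN padding. The `shift_rows` helper below is hypothetical, written only to illustrate that idea:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])

# Counterpart of .diff() along each row: a[:, k + 1] - a[:, k].
step_diffs = np.diff(a, axis=1)
print(step_diffs[1])  # [5 5 5 5 5 5]

def shift_rows(arr, n):
    """Hypothetical helper: shift each row right by n (n > 0) or left (n < 0),
    filling vacated positions with NaN, similar to pandas' shift(n, axis=1)."""
    out = np.full(arr.shape, np.nan)  # float array, since NaN is a float
    if n > 0:
        out[:, n:] = arr[:, :-n]
    elif n < 0:
        out[:, :n] = arr[:, -n:]
    else:
        out[:] = arr
    return out

shifted = shift_rows(a, 1)  # first column becomes NaN, the rest move right

# Difference between any two positions, e.g. the 2nd and 5th element of each row.
print(a[:, 4] - a[:, 1])  # [ 3 15  6]
```

Plain column indexing like the last line is often all that is needed for "somewhere in between" differences; the shift helper only matters when you want NaN-padded alignment like pandas gives.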

Any help would be really appreciated, cheers.

Hi guys, sorry about this: in my haste I asked the question with a slight inaccuracy, which is causing me an issue. My df is actually longer than the array. Let's say it is 4 rows long:

  num letter
0   1      a
1   2      b
2   3      c
3   4      d

When I run the code from your answer I get the error message ValueError: Length of values does not match length of index. It works perfectly when my df and array have the same number of rows, but not otherwise. I just want NaNs to appear where the array has no number to give. Many thanks
– top bantz, Commented Apr 1, 2019 at 13:00

Zulfiqaar · Accepted Answer · 2019-04-01 15:39:36Z

2

Currently each pass of the loop assigns a single number to the whole column, so every iteration overwrites the previous one and only the last difference (12) is left.

Use a list comprehension instead:

a = np.array([[1, 2, 3, 4, 5, 6, 7], [20, 25, 30, 35, 40, 45, 50], [2, 4, 6, 8, 10, 12, 14]])

b = [i[-1] - i[0] for i in a]

If the DataFrame has more rows than the array, pad the list with NaNs so the lengths match before assigning:

b = b + [np.nan] * (len(df) - len(b))
df['new_col'] = b
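An alternative way to do the padding while staying in NumPy is to preallocate a float array of NaNs as long as the DataFrame and copy the differences into its head. This is a sketch assuming a four-row `df` like the one in the follow-up comment, and assuming the DataFrame has at least as many rows as `a`:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3, 4], "letter": ["a", "b", "c", "d"]})

diffs = a[:, -1] - a[:, 0]

# NaN is a float, so the padded array (and the resulting column) is float.
padded = np.full(len(df), np.nan)
padded[:len(diffs)] = diffs  # rows without a matching sequence stay NaN
df["new_col"] = padded
print(df)
```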

edited Apr 1, 2019 at 15:39

answered Apr 1, 2019 at 12:04

Zulfiqaar



top bantz Over a year ago

Thanks for the answer mate

top bantz Over a year ago

Hi mate, sorry to be a pain, but I actually asked something slightly wrong in my question. I've added a comment under the original question explaining it; I just wondered if you knew how to solve it? Thanks

gold_cy · Accepted Answer · 2019-04-01 12:16:55Z

1

You might be better off doing this with a DataFrame, especially if your array grows in size.

df1 = pd.DataFrame(a.T)

df['new_col'] = df1.iloc[-1] - df1.iloc[0]

print(df)

   num letter  new_col
0    1      a        6
1    2      b       30
2    3      c       12
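A side effect of this approach that bears on the follow-up comment: `df1.iloc[-1] - df1.iloc[0]` is a Series indexed 0, 1, 2, and assigning a Series to a column aligns on index labels rather than on position. With the four-row DataFrame, label 3 has no matching value, so it becomes NaN and no length error is raised. A sketch, assuming `df` has the default integer index 0..n-1 (with a different index the labels would not line up):

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3, 4], "letter": ["a", "b", "c", "d"]})

# Columns of df1 are the rows of `a`, so its first and last rows hold
# each sequence's first and last values.
df1 = pd.DataFrame(a.T)
new_col = df1.iloc[-1] - df1.iloc[0]  # Series with index 0, 1, 2

# Assignment aligns on index labels; label 3 has no value, so it becomes NaN.
df["new_col"] = new_col
print(df)
```

A plain list or NumPy array, by contrast, is assigned by position and must match the DataFrame's length, which is why the list-comprehension version raises the ValueError.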

answered Apr 1, 2019 at 12:16

gold_cy

