
Let's say I have an array such as this:

a = np.array([[1, 2, 3, 4, 5, 6, 7], [20, 25, 30, 35, 40, 45, 50], [2, 4, 6, 8, 10, 12, 14]])

and a dataframe such as this:

  num letter
0   1      a
1   2      b
2   3      c

What I would like to do is calculate the difference between the first and last number of each sequence (row) in the array and then add these differences to a new column in the df.

Currently I am able to calculate the desired difference in each sequence in this manner:

for i in a:
    print(i[-1] - i[0])

This gives me the following results:

6
30
12

I expected to be able to replace the print with an assignment to df['new_col'], like so:

for i in a:
    df['new_col'] = (i[-1] - i[0])

And for my df to then look like this:

  num letter new_col
0   1      a      6
1   2      b      30
2   3      c      12

However, I end up getting this:

  num letter  new_col
0   1      a       12
1   2      b       12
2   3      c       12
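The all-12 column can be reproduced in a few lines. This is a minimal sketch assuming the array and the three-row df shown above: assigning a scalar to a column broadcasts it to every row, so each pass of the loop overwrites the whole column and only the last row's difference (14 - 2 = 12) survives.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})

for i in a:
    # The right-hand side is a single scalar, which pandas broadcasts to
    # every row, replacing whatever the previous iteration wrote.
    df["new_col"] = i[-1] - i[0]

print(df)
```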

I would also really appreciate it if anyone could tell me what the NumPy equivalents of .diff() and .shift() are. I tried calling them the same way I would on a pandas DataFrame, but just got error messages. This would be useful if I want to calculate the difference not only between the first and last numbers but also between positions somewhere in between.
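A sketch of the closest NumPy counterparts, assuming the array `a` above and that each row is one sequence (so we work along axis=1). `np.diff` is a real NumPy function that returns consecutive differences, but the result is one element shorter rather than starting with NaN as pandas' .diff() does. NumPy has no direct .shift(); `np.roll` wraps values around instead of dropping them, so the `shift` function below is a hypothetical helper written for illustration, not a NumPy API.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])

# Consecutive differences along each row; shape (3, 6), not (3, 7).
step = np.diff(a, axis=1)


def shift(x, periods=1, axis=-1, fill_value=np.nan):
    """Hypothetical helper mimicking pandas .shift() on a NumPy array.

    Values shifted past the end are dropped (unlike np.roll) and the
    vacated positions are filled with fill_value.
    """
    x = np.asarray(x, dtype=float)  # float so NaN can be stored
    out = np.full_like(x, fill_value)
    n = x.shape[axis]
    if abs(periods) >= n:
        return out
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    if periods >= 0:
        src[axis] = slice(0, n - periods)
        dst[axis] = slice(periods, n)
    else:
        src[axis] = slice(-periods, n)
        dst[axis] = slice(0, n + periods)
    out[tuple(dst)] = x[tuple(src)]
    return out


# Length-preserving difference, similar to DataFrame.diff(axis=1):
# the first column is NaN, the rest match np.diff.
diff_like_pandas = a - shift(a, 1, axis=1)

# Difference between any two positions, e.g. the 5th and 2nd elements.
between = a[:, 4] - a[:, 1]
```

Summing the consecutive differences of a row gives back last minus first, so `step.sum(axis=1)` equals the first-to-last difference asked about above.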

Any help would be really appreciated, cheers.

  • Hi guys, sorry about this. In my haste I asked the question slightly inaccurately, which is causing me an issue: my df is actually longer than the array. Let's say my df is actually 4 rows long:

  num letter
0   1      a
1   2      b
2   3      c
3   4      d

When I run the code in your answer I get the error message ValueError: Length of values does not match length of index. It works perfectly when my df and array have the same number of rows, but not otherwise. I just want NaNs to appear where the array has no number to give. Many thanks. Commented Apr 1, 2019 at 13:00

2 Answers


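Currently each pass through the loop assigns a single scalar to the whole column, so every iteration overwrites the previous one and only the difference from the last row survives.

Use a list comprehension instead: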

a = np.array([[1, 2, 3, 4, 5, 6, 7], [20, 25, 30, 35, 40, 45, 50], [2, 4, 6, 8, 10, 12, 14]])

b = [i[-1] - i[0] for i in a]
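Because `a` is a 2-D array, the same values can also be computed without a Python loop by slicing the last and first columns. A minimal sketch, assuming `a` as above; the name `diffs` is just for illustration, and the result is a NumPy array rather than a list.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])

# a[:, -1] is the last element of every row and a[:, 0] the first,
# so one subtraction handles all rows at once.
diffs = a[:, -1] - a[:, 0]
print(diffs.tolist())  # [6, 30, 12]
```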

If the lengths mismatch, you need to extend the list with NaNs:

# pad with NaN up to len(df); use np.nan, since the np.NaN alias was removed in NumPy 2.0
b = b + [np.nan]*(len(df) - len(b))
df['new_col'] = b
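An alternative to padding the list by hand is to let pandas align on the index. This is a sketch under two assumptions: `df` has at least as many rows as `a`, and row i of `a` belongs to row i of `df`. The four-row frame is the one described in the comment under the question.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3, 4], "letter": ["a", "b", "c", "d"]})

diffs = a[:, -1] - a[:, 0]

# Label the differences with the first len(diffs) index values of df.
# On assignment pandas aligns by index, so rows with no match get NaN.
df["new_col"] = pd.Series(diffs, index=df.index[:len(diffs)])
print(df)
```

If `df` could be shorter than `a`, the Series constructor raises because the index is shorter than the data, so that case would need the extra rows of `a` to be dropped first.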

2 Comments

Thanks for the answer mate
Hi mate, sorry to be a pain, but I actually asked something slightly wrong in my question, I added a comment under my original question explaining, I just wondered if you knew how to solve it? Thanks

You might be better off doing this in a DataFrame, particularly if your array grows in size.

df1 = pd.DataFrame(a.T)

df['new_col'] = df1.iloc[-1] - df1.iloc[0]

print(df)

   num letter  new_col
0    1      a        6
1    2      b       30
2    3      c       12
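Because the subtraction above produces a Series indexed 0, 1, 2, assigning it to `df` aligns on the index, so this approach should also cover the longer df from the comment under the question without any manual padding. A minimal sketch, assuming the four-row frame from that comment:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [20, 25, 30, 35, 40, 45, 50],
              [2, 4, 6, 8, 10, 12, 14]])
df = pd.DataFrame({"num": [1, 2, 3, 4], "letter": ["a", "b", "c", "d"]})

df1 = pd.DataFrame(a.T)  # each original row of a becomes a column

# Last row minus first row of df1 gives a Series indexed 0..2;
# row 3 of df has no matching label and therefore gets NaN.
df["new_col"] = df1.iloc[-1] - df1.iloc[0]
print(df)
```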
