How to handle Series and Array with pandas and numpy together?

Question

I am new to Python and confused by all the data types, such as Series, Array, List, etc. This is probably a very open-ended question; I am hoping to get a feel for the general practice when coding in Python for data analysis.

Many sources suggest that numpy and pandas are the two modules I need for data analysis. However, I find it hard and odd that they operate on and produce two different data types, i.e. Series and Array. Is it normal/natural that one needs to convert one data type to the other before any kind of data manipulation? I would like to know what you would do. Many thanks.

for example:

 import pandas as pd
 import numpy as np

 # create some data
 df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'c'])
 x = np.random.randn(10, 1)

 # data manipulation
 A = df['a']

 # Question 1:
 # If I want to perform an element-by-element addition between x and A,
 # how should I do it?  A simple x + A doesn't work, but it seems strange
 # to me if I have to convert the data type every time.

 # Question 2:
 # I'd like to combine the two columns together;
 # concatenate and hstack both don't work.

What do you want to end up with: numpy arrays, or pd.Series and pd.DataFrames?
– Anton Protopopov, Commented Feb 2, 2016 at 8:06
I presume I want a DataFrame at the end, since I start with one (I import the data using pandas). Basically, I find the two modules incompatible with each other, which is annoying, and I wonder whether I am going in the right direction, since almost every operation seems to require an extra step or function.
– Lafayette, Commented Feb 2, 2016 at 8:50

Anton Protopopov · Accepted Answer · 2016-02-02 09:20:28Z


For addition, your array and Series should have the same shape. Here they do not:

In [98]: A.shape
Out[98]: (10,)

In [99]: x.shape
Out[99]: (10, 1)

You could call reshape(-1) to flatten the (10, 1) column vector into a 1-D array:

In [100]: x.reshape(-1).shape
Out[100]: (10,)

Then you can add it to the pd.Series A:

In [61]: A + x.reshape(-1)
Out[61]:
0   -1.186957
1   -0.165563
2    0.882490
3    4.544357
4    2.698414
5    0.396110
6   -0.199209
7    3.282942
8    2.448213
9   -0.543727
Name: a, dtype: float64
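
As a small sketch of the same idea (the seeded random data is illustrative, not the asker's actual values), the following shows three ways to flatten the (10, 1) array. `ravel()` and `x[:, 0]` are alternatives not mentioned in the answer. When a Series is added to a 1-D array of the same length, values are paired by position and the result keeps the index and name of `A`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
x = rng.standard_normal((10, 1))    # 2-D column vector, shape (10, 1)
A = df["a"]                         # 1-D Series, shape (10,)

flat_reshape = x.reshape(-1)        # shape (10,)
flat_ravel = x.ravel()              # shape (10,), same values
flat_column = x[:, 0]               # the only column, shape (10,)

total = A + flat_reshape            # element-wise, matched by position
print(type(total).__name__, total.shape, total.name)
```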

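For your second question, the shapes have to match the other way around: turn the Series A into a (10, 1) column so that it can be stacked next to x. You could do that with reshape on its underlying array: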

In [97]: np.hstack([A.values.reshape(A.size,1), x])
Out[97]:
array([[ 0.3158111 , -1.50276813],
       [-1.09532212,  0.92975954],
       [-0.77048623,  1.65297592],
       [ 2.14690242,  2.39745455],
       [ 1.63367806,  1.06473634],
       [ 0.09134512,  0.3047644 ],
       [ 0.02019805, -0.21940726],
       [ 0.87008192,  2.41286007],
       [ 1.25315724,  1.19505578],
       [-0.60156045,  0.05783343]])
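
A possible alternative, sketched here with toy values rather than the data above, is `np.column_stack`, which turns 1-D inputs into columns itself so no manual reshape is needed. `Series.to_numpy()` is assumed to be available (it is in recent pandas versions; `.values`, as used above, also works).

```python
import numpy as np
import pandas as pd

A = pd.Series([1.0, 2.0, 3.0], name="a")    # toy stand-in for df['a']
x = np.array([[10.0], [20.0], [30.0]])      # shape (3, 1)

# Manual route: give A a second axis, then stack horizontally.
a_col = A.to_numpy()[:, np.newaxis]         # shape (3, 1)
stacked_h = np.hstack([a_col, x])

# column_stack converts the 1-D array to a column and keeps x as it is.
stacked_c = np.column_stack([A.to_numpy(), x])

print(stacked_h.shape, stacked_c.shape)
```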

If you want a pd.DataFrame as the result, you could use pd.concat:

In [108]: pd.concat([A, pd.Series(x.reshape(-1))], axis=1)
Out[108]:
          a         0
0  0.315811 -1.502768
1 -1.095322  0.929760
2 -0.770486  1.652976
3  2.146902  2.397455
4  1.633678  1.064736
5  0.091345  0.304764
6  0.020198 -0.219407
7  0.870082  2.412860
8  1.253157  1.195056
9 -0.601560  0.057833
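
One thing to keep in mind is that `pd.concat` with `axis=1` aligns rows by index label, not by position. That makes no difference for the data above, because both Series use the default 0..9 index. The sketch below uses a made-up non-default index to show the difference, and the column name `"x"` is only an illustrative choice.

```python
import numpy as np
import pandas as pd

A = pd.Series([1.0, 2.0, 3.0], index=[10, 11, 12], name="a")  # toy data
x = np.array([[0.1], [0.2], [0.3]])

# Reuse A's index and give the new column a name.
x_series = pd.Series(x.reshape(-1), index=A.index, name="x")
combined = pd.concat([A, x_series], axis=1)

# Without the shared index, the labels 0..2 and 10..12 do not match,
# so the outer join produces six rows padded with NaN.
misaligned = pd.concat([A, pd.Series(x.reshape(-1))], axis=1)

print(combined)
print(misaligned.shape)
```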

EDIT

From the NumPy docs for reshape, which explain what reshape(-1) does:

newshape : int or tuple of ints
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
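
A short sketch of that inference rule, using a 12-element array chosen only for illustration:

```python
import numpy as np

x = np.arange(12).reshape(12, 1)   # shape (12, 1)

flat = x.reshape(-1)               # -1 is inferred as 12 -> shape (12,)
three_rows = x.reshape(3, -1)      # -1 is inferred as 4  -> shape (3, 4)
one_column = flat.reshape(-1, 1)   # back to a column     -> shape (12, 1)

# An incompatible shape is rejected: 12 is not divisible by 5.
try:
    x.reshape(5, -1)
    failed = False
except ValueError:
    failed = True

print(flat.shape, three_rows.shape, one_column.shape, failed)
```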

edited Feb 2, 2016 at 9:20

answered Feb 2, 2016 at 8:07

Anton Protopopov


3 Comments

Lafayette Over a year ago

what does .reshape(-1) do/mean? Thanks

Anton Protopopov Over a year ago

Edited answer for that

Anton Protopopov Over a year ago

@Lafayette note that reshape(-1) works for an array of any shape, while reshape(10) only works for an array with exactly 10 elements.

Stop harming Monica · Accepted Answer · 2016-02-02 09:19:50Z

Is it normal/natural that one needs to convert either one of the data type to another one before any kind of data manipulation?

Sometimes you need to, sometimes you don't. When in doubt, do it.

That said, remember the Zen of Python:

Explicit is better than implicit.
In the face of ambiguity, refuse the temptation to guess.

Even if some APIs will do their best to convert types for you (numpy and pandas are quite good at that), explicit type casting can make your code more readable and easier to debug.
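
As a minimal sketch of what explicit conversion can look like (toy values; `to_numpy()` is assumed to be available, which holds for recent pandas versions, and `.values` is the older spelling):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=["r0", "r1", "r2"], name="a")

arr = s.to_numpy()                               # Series -> ndarray (index is dropped)
back = pd.Series(arr, index=s.index, name=s.name)  # ndarray -> Series, labels restored
lst = s.tolist()                                 # Series -> plain Python list

print(type(arr).__name__, type(back).__name__, type(lst).__name__)
```

Spelling out the conversion makes it visible where the index labels are discarded and where they are put back.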

Question 1: If I want to perform an element-by-element addition between x and A, how should I do it? A simple x + A doesn't work, but it seems strange to me if I have to convert the data type every time.

You do not have to convert data types in this case but you need compatible shapes.

>>> print(A.shape)
(10,)
>>> print(x.shape)
(10, 1)
>>> print(A + x.reshape(10))
0   -0.207131
1   -2.117012
2    0.925545
3   -2.187705
4    1.226458
5    2.144904
6   -0.956781
7    1.956246
8    0.060132
9    1.332417
Name: a, dtype: float64
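
To illustrate why the shapes matter, here is what plain NumPy does with a (3,) and a (3, 1) array (toy values). Broadcasting produces a 3x3 table of all pairwise sums instead of an element-wise result. This sketch deliberately uses only NumPy arrays; how pandas handles a Series plus a 2-D array may depend on the pandas version and is not shown here.

```python
import numpy as np

a = np.arange(3.0)                  # shape (3,):   [0., 1., 2.]
x = np.arange(3.0).reshape(3, 1)    # shape (3, 1)

outer = a + x                       # broadcast to shape (3, 3)
elementwise = a + x.reshape(-1)     # shapes match: (3,)

print(outer.shape)                  # (3, 3)
print(elementwise)                  # [0. 2. 4.]
```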

Question 2: I'd like to combine the two columns together; concatenate and hstack both don't work.

It is not clear what the desired output is, but I think it is again a matter of shapes, not types. Here is one option, the pandas way:

>>> print(pd.concat([A, pd.Series(x.reshape(10))], axis=1))
          a         0
0 -0.158667 -0.048463
1 -0.847246 -1.269765
2 -0.128232  1.053778
3 -1.316113 -0.871593
4  1.057044  0.169414
5  3.188343 -1.043439
6 -0.032524 -0.924257
7  1.412443  0.543803
8 -0.730386  0.790519
9  0.289796  1.042621
