Assigning multi-dimensional Numpy Array to a Pandas Series

Question

Background

I have a numpy.ndarray with shape == (95, 15). I already have the desired Series index labels, with len(my_index) == 95. I want to create a Series in which each index label is associated with one row of the 95x15 numpy.ndarray.

Variable Names

pfit: 95x15 numpy.ndarray
my_index: list of 95 str

Steps Taken

The following fails with the corresponding error:

my_series = pd.Series(index=my_index, dtype="object", data=pfit)
Traceback (most recent call last):

  File "C:\Users\gford1\AppData\Local\Temp\1/ipykernel_22244/2329315457.py", line 1, in <module>
    my_series = pd.Series(index=my_index, dtype="object", data=pfit)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\series.py", line 439, in __init__
    data = sanitize_array(data, index, dtype, copy)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 577, in sanitize_array
    subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 628, in _sanitize_ndim
    raise ValueError("Data must be 1-dimensional")

ValueError: Data must be 1-dimensional

I therefore have to iterate through my_index and add the pfit arrays, one-by-one:

my_series = pd.Series(index=my_index, dtype="object")
i = 0
for idx in my_series.index:
    my_series[idx] = pfit[i]
    i+=1

The second approach works, but I believe there is a better / faster way that I am unaware of.

I don't think you understand what a Series is, or what your iteration does. Look at my_series.values. Has that "preserved" the array?
– hpaulj, Commented Dec 23, 2021 at 17:16
Not to be argumentative, but I do understand what a Series is. I don't want to needlessly convert a C-style Numpy array to a Python linked list (i.e. list(pfit)), to then only again retrieve it as a Numpy array when I need it.
– trozzel, Commented Dec 23, 2021 at 17:23
The only way you can preserve a 2d array in a Series is to put it, whole, into one cell. You can't put it, one row at a time, into the Series, and still retrieve it as a 2d array.
– hpaulj, Commented Dec 23, 2021 at 17:41

hpaulj · Accepted Answer · 2021-12-23 19:23:27Z

In [283]: pfit=np.arange(12).reshape(3,4)
In [284]: pfit
Out[284]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [285]: my_index=[1,2,3]

Your construct:

In [286]: my_series = pd.Series(index=my_index, dtype="object")
     ...: i = 0
     ...: for idx in my_series.index:
     ...:     my_series[idx] = pfit[i]
     ...:     i+=1
     ...: 
In [287]: my_series
Out[287]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [288]: my_series.values
Out[288]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)

My suggestion produces the same thing:

In [289]: list(pfit)
Out[289]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
In [290]: S = pd.Series(index=my_index, data=list(pfit))
In [291]: S
Out[291]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [292]: S.values
Out[292]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)

Recreating the 2d array:

In [293]: np.stack(S.values)
Out[293]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
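
The steps above can be collected into one short script. This is only a sketch of the same idea, using string labels (my_index here is a made-up stand-in for the real 95 names) and Series.to_numpy() in place of .values; nothing else is assumed beyond what the transcript shows.

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
my_index = ["a", "b", "c"]  # hypothetical labels standing in for the real ones

# list(pfit) gives one 1d row per label; the Series holds them as Python objects
S = pd.Series(list(pfit), index=my_index)

# Look up a single row by its label; the result is a 1d ndarray
row_b = S["b"]

# Rebuild a 2d array from the stored rows; np.stack allocates a new array
restored = np.stack(S.to_numpy())

print(S.dtype)
print(row_b)
print(restored.shape)
```

The list(pfit) call only builds a short Python list of row objects; it does not convert the numeric data itself.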

DataFrame:

In [294]: df = pd.DataFrame(index=my_index, data=pfit)
In [295]: df
Out[295]: 
   0  1   2   3
1  0  1   2   3
2  4  5   6   7
3  8  9  10  11
In [296]: df.values
Out[296]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
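
If the goal is to look up one row by its label, the DataFrame form supports that directly. The following is a minimal sketch with hypothetical string labels; it makes no claim about whether pandas copies the data, which may depend on the pandas version.

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
my_index = ["a", "b", "c"]  # hypothetical labels standing in for the real ones

df = pd.DataFrame(pfit, index=my_index)

# Select one row by label and get it back as a 1d ndarray
row_b = df.loc["b"].to_numpy()

# The whole 2d block is available without any stacking step
block = df.to_numpy()

print(row_b)
print(block.shape)
```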

===

If I change an element of pfit, the change appears in S and df.

In [305]: pfit[1,1] = 100
In [306]: pfit
Out[306]: 
array([[  0,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9,  10,  11]])
In [307]: S
Out[307]: 
1      [0, 1, 2, 3]
2    [4, 100, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [308]: df
Out[308]: 
   0    1   2   3
1  0    1   2   3
2  4  100   6   7
3  8    9  10  11

In the df case the data array is used directly, without copy or change, as the _values attribute (or whatever it's called internally).

In the Series case, Out[289] is a list of 1d views of pfit. Hence a change to the 2nd row appears in both pfit and S.

But recreating a 2d array from the Series, as done with [293], makes a copy, a new 2d array.

We can see this difference by looking at the arrays that IPython displayed earlier, though understanding what's going on here also requires an understanding of numpy views and object references.

In [309]: Out[292]
Out[309]: 
array([array([0, 1, 2, 3]), array([  4, 100,   6,   7]),
       array([ 8,  9, 10, 11])], dtype=object)
In [310]: Out[293]
Out[310]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
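
The view versus copy distinction can also be checked directly with np.shares_memory. This sketch covers only the NumPy side (the row list and the stacked array), not the internals of how pandas stores them.

```python
import numpy as np

pfit = np.arange(12).reshape(3, 4)

rows = list(pfit)          # each element is a 1d view into pfit
stacked = np.stack(rows)   # a newly allocated 2d array

# The row views share memory with pfit; the stacked array does not
rows_share = np.shares_memory(rows[1], pfit)
stack_shares = np.shares_memory(stacked, pfit)

# A later change to pfit is visible through the view but not in the copy
pfit[1, 1] = 100
seen_in_view = rows[1][1]
seen_in_copy = stacked[1, 1]

print(rows_share, stack_shares)
print(seen_in_view, seen_in_copy)
```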
