
Background

I have a numpy.ndarray of shape==(95,15). I already have the desired Series.Index names, of len(my_index)==95. I want to create a Series in which every index is associated with one of the rows of my 95x15 numpy.ndarray.

Variable Names

  • pfit: 95x15 numpy.ndarray
  • my_index: 95x1 list(str)

Steps Taken

  1. The following fails with corresponding error:
my_series = pd.Series(index=my_index, dtype="object", data=pfit)
Traceback (most recent call last):

  File "C:\Users\gford1\AppData\Local\Temp\1/ipykernel_22244/2329315457.py", line 1, in <module>
    my_series = pd.Series(index=my_index, dtype="object", data=pfit)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\series.py", line 439, in __init__
    data = sanitize_array(data, index, dtype, copy)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 577, in sanitize_array
    subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 628, in _sanitize_ndim
    raise ValueError("Data must be 1-dimensional")

ValueError: Data must be 1-dimensional
  2. I therefore have to iterate through my_index and add the pfit rows one by one:
my_series = pd.Series(index=my_index, dtype="object")
i = 0
for idx in my_series.index:
    my_series[idx] = pfit[i]
    i+=1

#2 works, but I believe that there is a better / faster way that I am unaware of.

  • Does data=list(pfit) help? Commented Dec 23, 2021 at 16:13
  • I want to preserve the numpy array Commented Dec 23, 2021 at 16:44
  • I don't think you understand what a Series is, or what your iteration does. Look at my_series.values. Has that "preserved" the array? Commented Dec 23, 2021 at 17:16
  • Not to be argumentative, but I do understand what a Series is. I don't want to needlessly convert a C-style Numpy array to a Python linked list (i.e. list(pfit)), to then only again retrieve it as a Numpy array when I need it. Commented Dec 23, 2021 at 17:23
  • The only way you can preserve a 2d array in a Series is to put it, whole, into one cell. You can't put it, one row at a time, into the Series, and still retrieve it as a 2d array. Commented Dec 23, 2021 at 17:41

1 Answer

In [283]: pfit=np.arange(12).reshape(3,4)
In [284]: pfit
Out[284]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [285]: my_index=[1,2,3]

Your construct:

In [286]: my_series = pd.Series(index=my_index, dtype="object")
     ...: i = 0
     ...: for idx in my_series.index:
     ...:     my_series[idx] = pfit[i]
     ...:     i+=1
     ...: 
In [287]: my_series
Out[287]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [288]: my_series.values
Out[288]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)

My suggestion produces the same thing:

In [289]: list(pfit)
Out[289]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
In [290]: S = pd.Series(index=my_index, data=list(pfit))
In [291]: S
Out[291]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [292]: S.values
Out[292]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)

recreating the 2d array:

In [293]: np.stack(S.values)
Out[293]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
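
As a rough sketch at the sizes given in the question (95 rows, 15 columns), the same construction and round trip could look like the following. The random data and the row_0 ... row_94 labels are placeholders made up for illustration, not the asker's real values.

```python
import numpy as np
import pandas as pd

# Placeholder data: stands in for the real 95x15 pfit and its 95 labels.
rng = np.random.default_rng(0)
pfit = rng.random((95, 15))
my_index = [f"row_{i}" for i in range(len(pfit))]

# list(pfit) splits the 2d array into 95 1d row arrays,
# so the Series holds one row array per label (dtype object).
S = pd.Series(data=list(pfit), index=my_index)

# Each label gives back one 15-element row.
row = S["row_3"]

# np.stack joins the rows into a new 2d array.
restored = np.stack(S.to_numpy())
```

This replaces the explicit loop with a single constructor call; the stored rows are the same kind of 1d arrays the loop would have assigned.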

Dataframe:

In [294]: df = pd.DataFrame(index=my_index, data=pfit)
In [295]: df
Out[295]: 
   0  1   2   3
1  0  1   2   3
2  4  5   6   7
3  8  9  10  11
In [296]: df.values
Out[296]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
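
If the goal is simply to keep the 2d block intact while looking rows up by label, the DataFrame route might be sketched as below. The labels row_a, row_b, row_c are hypothetical stand-ins for my_index.

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
my_index = ["row_a", "row_b", "row_c"]  # hypothetical labels

df = pd.DataFrame(data=pfit, index=my_index)

# One row, selected by label, as a 1d numpy array.
row_b = df.loc["row_b"].to_numpy()

# The whole block back as a 2d numpy array, no np.stack needed.
block = df.to_numpy()
```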

===

If I change an element of pfit, the change appears in S and df.

In [305]: pfit[1,1] = 100
In [306]: pfit
Out[306]: 
array([[  0,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9,  10,  11]])
In [307]: S
Out[307]: 
1      [0, 1, 2, 3]
2    [4, 100, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [308]: df
Out[308]: 
   0    1   2   3
1  0    1   2   3
2  4  100   6   7
3  8    9  10  11

In the df case the data array is used directly, without copy or change, as the _values attribute (or whatever it's called internally). That is what the pandas version used here does; other versions may copy the array on construction.

In the Series case, Out[289] is a list of 1d views of pfit. Hence a change to the second row shows up in both pfit and S.

But recreating a 2d array from the Series, as done with [293] makes a copy, a new 2d array.

We can see this difference by looking at the arrays that IPython displayed earlier - though understanding what's going on here also requires an understanding of numpy views and object references.

In [309]: Out[292]
Out[309]: 
array([array([0, 1, 2, 3]), array([  4, 100,   6,   7]),
       array([ 8,  9, 10, 11])], dtype=object)
In [310]: Out[293]
Out[310]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
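
One way to check the view-versus-copy behavior directly is np.shares_memory. This is only a sketch, rebuilding the small 3x4 example, and it assumes the Series constructor stores the row views as given (which matches the session above):

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
S = pd.Series(index=[1, 2, 3], data=list(pfit))

# Each Series element is a 1d view onto a row of pfit.
row_is_view = np.shares_memory(S.iloc[1], pfit)

# np.stack builds a fresh 2d array, independent of pfit.
stacked = np.stack(S.values)
stack_is_view = np.shares_memory(stacked, pfit)

# A later write to pfit shows up in S but not in the stacked copy.
pfit[1, 1] = 100
print(row_is_view, stack_is_view)   # expected: True False
print(S.iloc[1][1], stacked[1, 1])  # expected: 100 5
```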