numpy concatenate not appending new array to empty multidimensional array

Question

I bet I am doing something very simple wrong. I want to start with an empty 2D numpy array and append arrays to it (with dimensions 1 row by 4 columns).

open_cost_mat_train = np.matrix([])

for i in xrange(10):
    open_cost_mat = np.array([i,0,0,0])
    open_cost_mat_train = np.vstack([open_cost_mat_train,open_cost_mat])

My error trace is:

  File "/Users/me/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

What am I doing wrong? I have tried append, concatenate, defining the empty 2D array as [[]], as [], array([]) and many others.

It is better to construct a list of arrays and apply vstack just once. Repeated concatenation is slow.
– hpaulj, Commented Jul 26, 2016 at 21:02
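
The suggestion in this comment could look roughly like the sketch below. It is not code from the thread: the helper name build_cost_matrix and its n_rows parameter are hypothetical, and the row contents simply mirror the loop in the question.

```python
import numpy as np

def build_cost_matrix(n_rows=10):
    """Collect rows in a Python list and stack them once at the end."""
    rows = []
    for i in range(n_rows):
        # Same row layout as in the question: [i, 0, 0, 0]
        rows.append(np.array([i, 0, 0, 0]))
    # One vstack call copies the data once instead of once per iteration
    return np.vstack(rows)

open_cost_mat_train = build_cost_matrix()
print(open_cost_mat_train.shape)  # (10, 4)
```

Because the list is stacked only once, no empty starting array is needed, so the shape mismatch from the question does not arise. Note that np.vstack raises an error for an empty list, so this sketch assumes at least one row.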

akuiper · Accepted Answer · 2016-07-26 17:53:59Z

3

You need to reshape your initial matrix so that its number of columns matches that of the appended arrays:

open_cost_mat_train = np.matrix([]).reshape((0,4))

With that change, the loop in the question runs without error and gives:

open_cost_mat_train

# matrix([[ 0.,  0.,  0.,  0.],
#         [ 1.,  0.,  0.,  0.],
#         [ 2.,  0.,  0.,  0.],
#         [ 3.,  0.,  0.,  0.],
#         [ 4.,  0.,  0.,  0.],
#         [ 5.,  0.,  0.,  0.],
#         [ 6.,  0.,  0.,  0.],
#         [ 7.,  0.,  0.,  0.],
#         [ 8.,  0.,  0.,  0.],
#         [ 9.,  0.,  0.,  0.]])
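
To see why the reshape matters, it may help to inspect the shapes involved. The following is a small sketch with illustrative variable names; np.empty((0, 4)) is shown as a plain ndarray alternative that this answer itself does not use.

```python
import numpy as np

empty_matrix = np.matrix([])
print(empty_matrix.shape)  # (1, 0): one row, zero columns

row = np.array([0, 0, 0, 0])

# Stacking a (1, 0) matrix on a 4-column row fails: column counts differ
try:
    np.vstack([empty_matrix, row])
    stack_failed = False
except ValueError as exc:
    stack_failed = True
    print("vstack failed:", exc)

# An array with zero rows and four columns stacks cleanly
start = np.empty((0, 4))
stacked = np.vstack([start, row])
print(stacked.shape)  # (1, 4)
```

The empty starting array must already have four columns (and zero rows) for vstack to accept 4-element rows.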

answered Jul 26, 2016 at 17:53

akuiper


Tonechas · Accepted Answer · 2016-07-27 09:16:56Z

2

If open_cost_mat_train is large, I would encourage you to replace the for loop with a vectorized algorithm. I will use the following functions to show how vectorizing loops improves efficiency:

def fvstack():
    import numpy as np
    np.random.seed(100)
    # Start from an empty 0x4 matrix and stack one row per iteration
    ocmt = np.matrix([]).reshape((0, 4))
    for i in range(10):  # range works on both Python 2 and Python 3
        x = np.random.random()
        ocm = np.array([x, x + 1, 10*x, x/10])
        ocmt = np.vstack([ocmt, ocm])
    return ocmt

def fshape():
    import numpy as np
    from numpy.matlib import empty
    np.random.seed(100)
    # Allocate the matrix with its final shape up front
    ocmt = empty((10, 4))
    for i in range(ocmt.shape[0]):
        ocmt[i, 0] = np.random.random()
    # Fill the remaining columns with whole-column (vectorized) operations
    ocmt[:, 1] = ocmt[:, 0] + 1
    ocmt[:, 2] = 10*ocmt[:, 0]
    ocmt[:, 3] = ocmt[:, 0]/10
    return ocmt

I've assumed that the values that populate the first column of ocmt (shorthand for open_cost_mat_train) are obtained from a for loop, and that the remaining columns are functions of the first column, as stated in your comments to my original answer. Since real cost data are not available, in the example below the values in the first column are random numbers, and the second, third and fourth columns are the functions x + 1, 10*x and x/10, respectively, where x is the corresponding value in the first column.

In [594]: fvstack()
Out[594]: 
matrix([[  5.43404942e-01,   1.54340494e+00,   5.43404942e+00,   5.43404942e-02],
        [  2.78369385e-01,   1.27836939e+00,   2.78369385e+00,   2.78369385e-02],
        [  4.24517591e-01,   1.42451759e+00,   4.24517591e+00,   4.24517591e-02],
        [  8.44776132e-01,   1.84477613e+00,   8.44776132e+00,   8.44776132e-02],
        [  4.71885619e-03,   1.00471886e+00,   4.71885619e-02,   4.71885619e-04],
        [  1.21569121e-01,   1.12156912e+00,   1.21569121e+00,   1.21569121e-02],
        [  6.70749085e-01,   1.67074908e+00,   6.70749085e+00,   6.70749085e-02],
        [  8.25852755e-01,   1.82585276e+00,   8.25852755e+00,   8.25852755e-02],
        [  1.36706590e-01,   1.13670659e+00,   1.36706590e+00,   1.36706590e-02],
        [  5.75093329e-01,   1.57509333e+00,   5.75093329e+00,   5.75093329e-02]])

In [595]: np.allclose(fvstack(), fshape())
Out[595]: True

In order for the calls to fvstack() and fshape() to produce the same results, the random number generator is seeded in both functions through np.random.seed(100). Notice that the equality test uses numpy.allclose instead of fvstack() == fshape() to avoid the round-off errors associated with floating-point arithmetic.
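
As a minimal illustration of why a tolerance-based comparison is preferred (the arrays a and b are made up for this sketch and are unrelated to the cost matrix):

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point
a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(a == b)             # [False  True]: exact element-wise comparison
print(np.allclose(a, b))  # True: comparison within default tolerances
```

Also note that == returns an array of booleans, while np.allclose returns a single True or False.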

As for efficiency, the following interactive session shows that initializing ocmt with its final shape is roughly 1.7 times faster in this run than repeatedly stacking rows (about 0.88 s versus 1.49 s for 10000 calls):

In [596]: import timeit

In [597]: timeit.timeit('fvstack()', setup="from __main__ import fvstack", number=10000)
Out[597]: 1.4884241055042366

In [598]: timeit.timeit('fshape()', setup="from __main__ import fshape", number=10000)
Out[598]: 0.8819408006311278
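
If the first-column values can be collected into an array up front, the remaining Python loop in fshape() could be removed as well. The following is a sketch under that assumption and not part of the measurements above: fvectorized and n_rows are hypothetical names, and the function returns a plain ndarray rather than a matrix.

```python
import numpy as np

def fvectorized(n_rows=10):
    np.random.seed(100)
    # Draw the whole first column in one call instead of one value per loop
    x = np.random.random(n_rows)
    # Build the columns x, x + 1, 10*x and x/10 and join them side by side
    return np.column_stack([x, x + 1, 10 * x, x / 10])

ocmt = fvectorized()
print(ocmt.shape)  # (10, 4)
```

When the costs come from a loop over real data, as described in the comments, this only applies once those values have been gathered into an array. It can be timed with timeit in the same way as the two functions above.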

edited Jul 27, 2016 at 9:16

answered Jul 26, 2016 at 19:29

Tonechas


3 Comments

Dhruv Ghulati Over a year ago

I gave the arange(n) as an example, but in reality the matrix will be obtaining values from a for loop which obtains data of real "costs" in a cost-sensitive classifier.

Dhruv Ghulati Over a year ago

What happens if the zero columns are some function of the first column? Will this method still speed things up?

Tonechas Over a year ago

Yes, it will. I have edited my answer again to show you how vectorized code improves speed in your application.
