numpy - how to add a value to every element in the first column of an array?

Question

I have an array like this:

array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

And I want to turn it into this: (adding a prefix '2R' onto each value in the first column)

array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

I looked into nditer, but I want to support earlier versions of numpy. I have also read that one should avoid explicit iteration.

I was thinking I could reference just the first column and apply a function to it, but I get "index error: invalid index" when I try array[:,0].
– Greg, Commented May 6, 2014 at 14:42
With a np.rec.array you would be able to access those columns using array["pos"]. But I don't know how to add anything in the "string addition broadcasting" manner you are looking for.
– eickenberg, Commented May 6, 2014 at 14:45
Hmm, I can access the first column with array['pos'], but I'm not sure how to modify the values from there (assuming that's the right direction).
– Greg, Commented May 6, 2014 at 14:49
Possible duplicate of "Element-wise string concatenation in numpy".
– Saullo G. P. Castro, Commented May 6, 2014 at 14:56
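
The indexing issue raised in the comments above can be illustrated with a small sketch. It assumes a two-row array with the same dtype as in the question (the values are shortened for readability, and bytes literals are used so it also runs on Python 3). A structured array like this is one-dimensional, so the "columns" are fields selected by name rather than by a second index:

```python
import numpy as np

# A small structured array with the same dtype as in the question
# (shortened, made-up values for illustration).
arr = np.array(
    [(b"6506", 4.67e-25, 0.5), (b"6601", 2.25e-27, 0.6)],
    dtype=[("pos", "S100"), ("par1", "<f8"), ("par2", "<f8")],
)

print(arr.ndim, arr.shape)  # one dimension: each element is a whole record

pos = arr["pos"]   # select the "column" by field name
first = arr[0]     # select a whole record by position

try:
    arr[:, 0]      # 2-D style indexing does not apply to a 1-D array
except IndexError as exc:
    print("IndexError:", exc)
```

The exact wording of the IndexError message differs between numpy versions.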

falsetru · Accepted Answer · 2022-10-28 06:59:13Z

6

Using numpy.core.defchararray.add:

>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
...             ('6601', 2.2452745388799898e-27, 0.99999999995270605),
...             ('21801', 1.9849650921836601e-31, 0.99999999997999001),
...             ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
...             ('45164198', 1.09470356446595e-24, 0.99999999997635303),
...             ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
...            dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add('2R:', xs['pos'])
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

UPDATE: You can use numpy.char.add instead of numpy.core.defchararray.add (commented by @joel-buursma):

>>> import numpy
>>> numpy.char == numpy.core.defchararray
True

edited Oct 28, 2022 at 6:59

answered May 6, 2014 at 14:53

falsetru



4 Comments

eickenberg Over a year ago

Impressive solution. If we were to append (instead of prepend) '2R:', would this function handle its task significantly differently according to whether 'S100' leaves enough space or not? (Imagine for example that the number has 98 digits.)

falsetru Over a year ago

@eickenberg, according to an experiment (pastebin.com/dVRyJmQH), add (for both prepending and appending) truncates the trailing part of the result according to the size specified.

Joel Buursma Over a year ago

You can now use np.char instead of numpy.core.defchararray. (numpy.org/doc/stable/reference/generated/numpy.char.add.html)

falsetru Over a year ago

@JoelBuursma, Thank you for the information. I updated the answer according to your comment.
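
The truncation behaviour discussed in these comments can be reproduced with a minimal sketch. It assumes a deliberately narrow, hypothetical 'S6' field (instead of 'S100') so the effect is easy to see, and it uses bytes literals so that it also runs on Python 3:

```python
import numpy as np

# Hypothetical narrow field ('S6') chosen only to make truncation visible.
xs = np.array(
    [(b"6506", 1.0), (b"45164", 2.0)],
    dtype=[("pos", "S6"), ("par1", "<f8")],
)

# The result of add is wide enough for prefix plus field (3 + 6 bytes).
combined = np.char.add(b"2R:", xs["pos"])
print(combined.dtype)

# Assigning back into the 6-byte field silently drops the trailing bytes.
xs["pos"] = combined
print(xs["pos"])
```

If losing trailing characters is a concern, one option is to check the longest resulting string against the field width before assigning.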

JoshAdel · Accepted Answer · 2014-05-06 14:59:31Z

2

A simple (albeit perhaps not optimal) solution is just:

a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('21801', 1.9849650921836601e-31, 0.99999999997999001),
       ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])


a['pos'] = [''.join(('2R:',x)) for x in a['pos']]

In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

While I like @falsetru's answer for using core numpy routines, surprisingly, list comprehension seems a bit faster:

In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop

In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop

Definitely benchmark your own use case on your own hardware to see which makes more sense for your application, though. One thing I've learned is that in certain situations basic Python constructs can outperform numpy built-ins, depending on the task at hand.
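
For readers without IPython, the comparison above could be rerun with the standard timeit module. This is only a sketch under some assumptions: the helper names with_join and with_char_add are made up for illustration, the array is filled with synthetic positions, bytes literals are used for Python 3, and each helper returns a new result instead of writing back, so repeated runs see the same input:

```python
import timeit
import numpy as np

dtype = [("pos", "S100"), ("par1", "<f8"), ("par2", "<f8")]
a = np.zeros(20000, dtype=dtype)
# Synthetic positions: b"0", b"1", ...
a["pos"] = [str(i).encode("ascii") for i in range(len(a))]

def with_join(arr):
    """Prefix every 'pos' value using a list comprehension."""
    return [b"".join((b"2R:", x)) for x in arr["pos"]]

def with_char_add(arr):
    """Prefix every 'pos' value using numpy.char.add."""
    return np.char.add(b"2R:", arr["pos"])

runs = 10
for fn in (with_join, with_char_add):
    seconds = timeit.timeit(lambda: fn(a), number=runs)
    print(fn.__name__, seconds / runs, "s per call")
```

The absolute numbers depend on the machine and on the numpy and Python versions, so they are not expected to match the timings quoted in this thread.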

edited May 6, 2014 at 14:59

answered May 6, 2014 at 14:52

JoshAdel


2 Comments

Greg Over a year ago

Interesting results in timing it. Thanks!

Gaurav Singhal Over a year ago

Both solutions throw errors for me. join raises "TypeError: sequence item 1: expected str instance, numpy.bytes_ found", and add raises "TypeError: must be str, not numpy.bytes_".

Gaurav Singhal · Accepted Answer · 2018-09-18 13:30:41Z

1

Another slightly faster solution is a list comprehension with the + operator. I do not understand why it is faster, but it is elegant and basic.

a['pos'] = ["2R:" + x for x in a['pos']]

Timings:

%timeit a['pos'] = ["2R:" + x for x in a['pos']]
8.07 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
9.53 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a['pos'] = add('2R:', a['pos'])
14.2 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

PS: I created the array a using a slightly different definition:

a = np.empty(20000, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])

because if I use an Sxxx type for pos, concatenation raises a TypeError for me.
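
The type error is most likely a Python 3 effect: values in an 'S' field are bytes, and a str prefix such as "2R:" cannot be concatenated with bytes. A minimal sketch of two ways around it follows, assuming small two-row arrays made up for illustration: either keep the 'S' field and use a bytes prefix, or use a 'U' (unicode) field with a str prefix:

```python
import numpy as np

# Bytes field ('S100'): the prefix must be bytes as well.
a = np.array(
    [(b"6506", 1.0), (b"6601", 2.0)],
    dtype=[("pos", "S100"), ("par1", "<f8")],
)
a["pos"] = [b"2R:" + x for x in a["pos"]]

# Unicode field ('U100'): a plain str prefix works.
u = np.array(
    [("6506", 1.0), ("6601", 2.0)],
    dtype=[("pos", "U100"), ("par1", "<f8")],
)
u["pos"] = ["2R:" + x for x in u["pos"]]

print(a["pos"])
print(u["pos"])
```

Note that a narrow field such as 'U5' leaves room for only five characters, so prefixed values longer than that would be truncated on assignment.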

edited Sep 18, 2018 at 13:30

answered Sep 18, 2018 at 11:07

Gaurav Singhal


1 Comment

Archie Over a year ago

Note that you end up with a list in this case, you need to convert a to a numpy array again if needed: a = np.asarray(a).
