
I have an array like this:

array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

And I want to turn it into this (adding the prefix '2R:' to each value in the first column):

array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
       ('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
       ('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
       ('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
       ('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
       ('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)], 
      dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

I looked into nditer, but I want to support earlier versions of NumPy. I've also read that one should avoid explicit iteration.

  • I was thinking I could reference just the first column and apply a function to it, but I get IndexError: invalid index when I try array[:,0]. Commented May 6, 2014 at 14:42
  • With a np.rec.array you would be able to access those columns using array["pos"]. But I don't know how to add anything in the "string addition broadcasting" manner you are looking for. Commented May 6, 2014 at 14:45
  • Hmm, I can access the first column with array['pos'], but I'm not sure how to modify the values from there (assuming that's the right direction). Commented May 6, 2014 at 14:49
  • possible duplicate of Element-wise string concatenation in numpy Commented May 6, 2014 at 14:56
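The IndexError in the first comment comes from the array being 1-D: each element is a whole record, and the "columns" are fields selected by name, not by a second index. A minimal sketch using two rows of the question's data (bytes literals are used because 'S100' stores bytes on Python 3):

```python
import numpy as np

# Two rows with the same structured dtype as in the question
a = np.array([(b'6506', 4.6725971801473496e-25, 0.99999999995088695),
              (b'6601', 2.2452745388799898e-27, 0.99999999995270605)],
             dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Each element is one record, so the array is 1-D and a[:, 0] raises IndexError
print(a.shape)        # (2,)
print(a.dtype.names)  # ('pos', 'par1', 'par2')

# Columns are selected by field name instead
pos = a['pos']

# The field is a view into `a`, so writing through it updates `a` in place
pos[0] = b'changed'
print(a['pos'][0])    # b'changed'
```

Because a['pos'] is a view, assigning a transformed column back with a['pos'] = ... is enough; no copy of the whole record array is needed.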

3 Answers


Using numpy.core.defchararray.add:

>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
...             ('6601', 2.2452745388799898e-27, 0.99999999995270605),
...             ('21801', 1.9849650921836601e-31, 0.99999999997999001),
...             ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
...             ('45164198', 1.09470356446595e-24, 0.99999999997635303),
...             ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
...            dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add(b'2R:', xs['pos'])  # bytes prefix matches the 'S100' field on Python 3
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

UPDATE: You can use numpy.char.add instead of numpy.core.defchararray.add (as @joel-buursma commented):

>>> import numpy
>>> numpy.char == numpy.core.defchararray
True
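For reference, here is a Python 3 sketch of the same idea written with np.char.add. It assumes the 'pos' field stays 'S100' (bytes), so the prefix is passed as bytes; mixing a str prefix with a bytes field is the kind of mismatch that produces a TypeError on some NumPy versions.

```python
import numpy as np

xs = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
               ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
              dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# 'S100' holds bytes on Python 3, so use a bytes prefix to keep the types matched
xs['pos'] = np.char.add(b'2R:', xs['pos'])

print(xs['pos'].tolist())  # [b'2R:6506', b'2R:45164519']
```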

4 Comments

Impressive solution. If we were to append (instead of prepend) '2R:', would this function handle its task significantly differently according to whether 'S100' leaves enough space or not? (Imagine for example that the number has 98 digits.)
@eickenberg, according to an experiment (pastebin.com/dVRyJmQH), add truncates the trailing part to fit the specified size, whether prepending or appending.
You can now use np.char instead of numpy.core.defchararray. (numpy.org/doc/stable/reference/generated/numpy.char.add.html)
@JoelBuursma, Thank you for the information. I updated the answer according to your comment.
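A small sketch of the truncation question above, using a hypothetical 'S8' field (chosen only to make the effect visible). In this sketch the array returned by np.char.add is wide enough for every entry; the trailing bytes are lost when that wider result is written back into the fixed-width field.

```python
import numpy as np

# Hypothetical narrow field for illustration: 'S8' stores at most 8 bytes per entry
xs = np.zeros(2, dtype=[('pos', 'S8'), ('par1', '<f8')])
xs['pos'] = [b'45164194', b'6506']

prefixed = np.char.add(b'2R:', xs['pos'])  # the returned array is widened to fit
appended = np.char.add(xs['pos'], b':2R')
print(prefixed.tolist())  # [b'2R:45164194', b'2R:6506']

# Writing a wider result back into the 'S8' field silently drops trailing bytes
xs['pos'] = prefixed
print(xs['pos'].tolist())  # [b'2R:45164', b'2R:6506']

xs['pos'] = [b'45164194', b'6506']
xs['pos'] = appended
print(xs['pos'].tolist())  # [b'45164194', b'6506:2R']
```

So for the question's 'S100' field, a prefix only causes loss when an entry plus the prefix exceeds 100 bytes.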

A simple (albeit perhaps not optimal) solution is just:

import numpy as np

a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
              ('6601', 2.2452745388799898e-27, 0.99999999995270605),
              ('21801', 1.9849650921836601e-31, 0.99999999997999001),
              ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
              ('45164198', 1.09470356446595e-24, 0.99999999997635303),
              ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
             dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])

# b'' literals keep the prefix the same type as the 'S100' field (bytes on Python 3)
a['pos'] = [b''.join((b'2R:', x)) for x in a['pos']]

In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
       ('2R:6601', 2.24527453887999e-27, 0.999999999952706),
       ('2R:21801', 1.98496509218366e-31, 0.99999999997999),
       ('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
       ('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
       ('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
      dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

While I like @falsetru's answer for using core NumPy routines, surprisingly, the list comprehension seems a bit faster:

In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop

In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop

Definitely benchmark your own use case and hardware to see which makes more sense for your actual application, though. One thing I've learned is that in certain situations basic Python constructs can outperform NumPy built-ins, depending on the task at hand.
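A sketch of such a benchmark with the standard timeit module, comparing the approaches discussed on this page. The data here is synthetic (20000 random integer positions, an assumption rather than the question's file), bytes prefixes are used so it runs on Python 3 with an 'S100' field, and each call works on a fresh copy so the prefix is not applied repeatedly; the copy cost is included equally in every timing.

```python
import timeit
import numpy as np

# Synthetic data (assumption): 20000 random positions stored as bytes
rng = np.random.default_rng(0)
base = np.zeros(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
base['pos'] = rng.integers(0, 10**8, size=base.size).astype('S100')

def with_char_add(a):
    a['pos'] = np.char.add(b'2R:', a['pos'])

def with_join(a):
    a['pos'] = [b''.join((b'2R:', x)) for x in a['pos']]

def with_plus(a):
    a['pos'] = [b'2R:' + x for x in a['pos']]

for fn in (with_char_add, with_join, with_plus):
    # Fresh copy per call so every run starts from unprefixed data
    t = timeit.timeit(lambda: fn(base.copy()), number=20)
    print(f"{fn.__name__}: {t / 20 * 1e3:.2f} ms per call")
```

The relative order can differ across NumPy versions and machines, which is the point of running it yourself.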

2 Comments

Interesting results in timing it. Thanks!
Both solutions throw errors for me: join raises TypeError: sequence item 1: expected str instance, numpy.bytes_ found, and add raises TypeError: must be str, not numpy.bytes_.
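Those errors are consistent with running the 2014 answers on Python 3, where an 'S100' field holds bytes while '2R:' is a str. A sketch of two possible workarounds, assuming the position strings are plain ASCII: keep the bytes field and use a bytes prefix, or copy the data into a new array whose 'pos' field stores text ('U100').

```python
import numpy as np

a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
              ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
             dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Option 1: keep the bytes field and use a bytes prefix
a_bytes = a.copy()
a_bytes['pos'] = [b'2R:' + x for x in a_bytes['pos']]

# Option 2: copy into a new array whose 'pos' field stores text, so str prefixes work
text_dtype = [('pos', 'U100'), ('par1', '<f8'), ('par2', '<f8')]
a_text = np.empty(a.shape, dtype=text_dtype)
for name in a.dtype.names:
    a_text[name] = a[name]  # bytes -> str conversion; assumes ASCII content
a_text['pos'] = ['2R:' + x for x in a_text['pos']]

print(a_bytes['pos'].tolist())  # [b'2R:6506', b'2R:45164519']
print(a_text['pos'].tolist())   # ['2R:6506', '2R:45164519']
```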

Another, slightly faster solution is to use a list comprehension with the + operator. I do not understand why it is faster, but it is definitely elegant and simple.

a['pos'] = ["2R:" + x for x in a['pos']]

Timings:

%timeit a['pos'] = ["2R:" + x for x in a['pos']]
8.07 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
9.53 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a['pos'] = add('2R:', a['pos'])
14.2 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

PS: I created the array a using a slightly different definition:

a = np.empty(20000, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])

because if I use an Sxxx type for pos, the concatenation raises a TypeError for me.
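One caveat with a narrow text field such as 'U5': the benchmark array starts out with empty strings, so '2R:' fits, but real positions plus the prefix would be cut to 5 characters. A sketch (using two of the question's positions) of sizing the field from the data instead, via np.char.str_len:

```python
import numpy as np

prefix = '2R:'
a = np.zeros(2, dtype=[('pos', 'U8'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = ['6506', '45164519']

# Writing into a 'U5' field keeps only the first 5 characters of each new value
narrow = np.zeros(2, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])
narrow['pos'] = [prefix + x for x in a['pos']]
print(narrow['pos'].tolist())  # ['2R:65', '2R:45']

# Size the field from the data instead: prefix length plus the longest entry
width = len(prefix) + int(np.char.str_len(a['pos']).max())
wide = np.zeros(a.shape, dtype=[('pos', f'U{width}'), ('par1', '<f8'), ('par2', '<f8')])
for name in ('par1', 'par2'):
    wide[name] = a[name]
wide['pos'] = [prefix + x for x in a['pos']]
print(wide['pos'].tolist())  # ['2R:6506', '2R:45164519']
```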

1 Comment

Note that you end up with a list in this case; you need to convert a back to a NumPy array if needed: a = np.asarray(a).
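A quick way to check this for yourself: assigning a list to a field (a['pos'] = [...]) converts the list and writes it into the existing structured array, so in the sketch below a stays an ndarray; only rebinding the name itself (a = [...]) would leave a plain list.

```python
import numpy as np

a = np.zeros(3, dtype=[('pos', 'U11'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = ['6506', '6601', '21801']

# Field assignment writes into the existing array; `a` is not replaced by the list
a['pos'] = ['2R:' + x for x in a['pos']]
print(type(a).__name__, a.dtype.names)  # ndarray ('pos', 'par1', 'par2')
```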
