2

How do I convert the following NumPy array from object dtype to float?

array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
       '1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
       '124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
       '108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
       '100,626'], dtype=object)

I tried arr.astype(float), but that fails because of the commas in each string.

4 Answers

2

Yet another way

np.frompyfunc(lambda x: x.replace(',',''),1,1)(arr).astype(float)

frompyfunc returns an object dtype array, which is fine in this case. I've often found it to be 2x faster than a list comprehension, but here it times about the same as @coldspeed's:

np.array([v.replace(',', '') for v in arr], dtype=np.float32)

That may be because we are starting with an object dtype array. Direct iteration on an object dtype is a bit slower than iteration on a list, but faster than iteration on a regular numpy array. Like a list, the elements of the array are pointers to strings, and don't require the 'unboxing' that a string dtype array would.

The frompyfunc approach is also 2 to 3x faster than the np.char version.
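
To check these timing claims on your own machine, here is a minimal sketch using timeit. The sample array, the repeat count, and the helper names (with_frompyfunc, with_comprehension, with_np_char) are assumptions made for illustration; absolute numbers will vary with array size and NumPy version, so no timings are quoted here.

```python
import timeit

import numpy as np

# Hypothetical sample data: an object-dtype array of comma-grouped numbers.
arr = np.array(['4,364,541', '962,100', '100,626'] * 1000, dtype=object)


def with_frompyfunc(a):
    # frompyfunc wraps the Python function and returns an object-dtype array.
    strip = np.frompyfunc(lambda s: s.replace(',', ''), 1, 1)
    return strip(a).astype(float)


def with_comprehension(a):
    # Plain Python iteration over the object array.
    return np.array([s.replace(',', '') for s in a], dtype=float)


def with_np_char(a):
    # np.char functions need a string dtype, so convert from object first.
    return np.char.replace(a.astype(str), ',', '').astype(float)


for fn in (with_frompyfunc, with_comprehension, with_np_char):
    seconds = timeit.timeit(lambda: fn(arr), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

All three functions should return the same float array, so the comparison is only about speed.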

2

A simple way to do it is to remove every comma:

np.array([v.replace(',', '') for v in arr], dtype=np.float32)

If you have pandas, to_numeric is a good option. It gracefully handles any invalid values that may creep in post replacement.

pd.to_numeric([v.replace(',', '') for v in arr], errors='coerce', downcast='float')

Both methods return a float array as output.
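
As a small illustration of what "gracefully handles invalid values" means, the sketch below feeds to_numeric an entry that is not a number. The input array is hypothetical; with errors='coerce' the bad entry becomes NaN, whereas passing the same cleaned list to np.array(..., dtype=float) would raise a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the middle entry is not a number at all.
arr = np.array(['4,364,541', 'n/a', '100,626'], dtype=object)

# Strip the thousands separators, then let pandas parse what it can.
cleaned = [v.replace(',', '') for v in arr]
result = pd.to_numeric(cleaned, errors='coerce')

print(result)  # the unparseable entry shows up as nan
```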

1

Given:

>>> ar
array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
       '1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
       '124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
       '108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
       '100,626'], dtype=object)

You can use filter to drop every non-digit character and then create floats:

>>> np.array(list(map(float, (''.join(filter(lambda c: c.isdigit(), s)) for s in ar))))
array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
       1098200., 1065106.,  962100.,  920200.,  124204.,  122320.,
        119742.,  116627.,  115900.,  108400.,  108400.,  108000.,
        103795.,  102900.,  101845.,  100900.,  100626.])
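
One thing to keep in mind with the digit filter: it also discards decimal points and minus signs. That does not matter for the integer-like strings in the question, but it would change other values. A minimal sketch, where digits_only and strip_commas are hypothetical helper names:

```python
def digits_only(s):
    # Keep only the characters for which str.isdigit() is true.
    return ''.join(filter(str.isdigit, s))


def strip_commas(s):
    # Remove only the thousands separators.
    return s.replace(',', '')


print(digits_only('4,364,541'))          # 4364541
print(digits_only('-1,234.5'))           # 12345 (sign and decimal point lost)
print(float(strip_commas('-1,234.5')))   # -1234.5
```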

1

You can also use numpy.core.defchararray.replace(). Note that np.float was removed in NumPy 1.24, so use the builtin float:

>>> numpy.core.defchararray.replace(arr, ',','').astype(float)

array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
       1098200., 1065106.,  962100.,  920200.,  124204.,  122320.,
        119742.,  116627.,  115900.,  108400.,  108400.,  108000.,
        103795.,  102900.,  101845.,  100900.,  100626.])

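Or np.char.replace, as noted in the comments by Cold. These functions are built for arrays of type numpy.str_ or numpy.bytes_ (formerly numpy.unicode_ and numpy.string_, aliases that were removed in NumPy 2.0).

If the array is object dtype, convert it to a string dtype first:

np.char.replace(arr.astype(str), ',', '').astype(float)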

2 Comments

A shorter alias: np.char.replace will also do the same thing.
That won't work if arr is object dtype. You first have to convert it to a string dtype. The char functions essentially iterate over the elements of a string dtype array and apply the corresponding string method. My guess is that the speed will be similar to iterating over an object dtype array.
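
The conversion described in this comment can be sketched as follows. The three-element array is a hypothetical stand-in for the data in the question, and as_text is just an illustrative name.

```python
import numpy as np

# Hypothetical stand-in for the object-dtype array in the question.
arr = np.array(['4,364,541', '962,100', '100,626'], dtype=object)

# Convert from object dtype to a fixed-width unicode string dtype first.
as_text = arr.astype(str)
print(as_text.dtype.kind)  # 'U' marks a unicode string dtype

# np.char.replace can now apply str.replace element by element.
result = np.char.replace(as_text, ',', '').astype(float)
print(result)
```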
