Importing with numpy array with multiple delimiters

Question

I would like to know how to import a file that uses multiple delimiters.

I have the following line:

"1,000";"2,000";"3,000"

How can I import the data with numpy?

I have the following code:

data=numpy.loadtxt(filepath,delimiter=';')

How can I pass a second delimiter for the double quotes (")?

Kind Regards

How so? I currently have the following error: could not convert string to float: '(s)'
– user8458838, Commented Dec 17, 2018 at 18:41
It imported the data as a str352. How can I convert it to float?
– user8458838, Commented Dec 17, 2018 at 19:02
I have no idea how setting dtype=str is supposed to help. You probably should not do that.
– tel, Commented Dec 17, 2018 at 19:03

Warren Weckesser · Accepted Answer · 2018-12-17 20:53:49Z

pandas.read_csv can read such a file. It allows you to control the delimiter and the decimal point character.

Here's my file delim.dat:

"1,000";"2,000";"3,000"
"5,000";"6,000";"7,000"
"8,000";"9,000";"9,100"
"9,250";"9,500";"9,990"

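Use the arguments sep=';' and decimal=',' in pandas.read_csv. The double quotes need no special handling because " is the default quotechar, and header=None tells pandas that the file has no header row: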

In [11]: import pandas as pd

In [12]: df = pd.read_csv('delim.dat', sep=';', decimal=',', header=None)

In [13]: df
Out[13]: 
      0    1     2
0  1.00  2.0  3.00
1  5.00  6.0  7.00
2  8.00  9.0  9.10
3  9.25  9.5  9.99
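Since the question asks for a NumPy array, here is a minimal sketch of getting one from the DataFrame. It assumes every column is numeric, and it uses an in-memory StringIO as a stand-in for delim.dat so that it runs without a file on disk; the names text and arr are only illustrative.

```python
from io import StringIO

import pandas as pd

# Stand-in for delim.dat (first two rows only), so no file is needed.
text = '"1,000";"2,000";"3,000"\n"5,000";"6,000";"7,000"\n'

# Same arguments as in the session above.
df = pd.read_csv(StringIO(text), sep=';', decimal=',', header=None)

# Convert the all-numeric DataFrame to a 2-D float array.
arr = df.to_numpy(dtype=float)
print(arr)
```

DataFrame.to_numpy needs pandas 0.24 or later; on older versions df.values should give the same array.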

You can also use numpy.genfromtxt, but you'll have to use the converters argument to convert each field from bytes to floating point. For example,

In [54]: def myconvert(s):
    ...:     return float(s.strip(b'"').replace(b',', b'.'))
    ...: 
    ...: 

In [55]: a = np.genfromtxt('delim.dat', delimiter=';', converters={k: myconvert for k in range(3)})

In [56]: a
Out[56]: 
array([[1.  , 2.  , 3.  ],
       [5.  , 6.  , 7.  ],
       [8.  , 9.  , 9.1 ],
       [9.25, 9.5 , 9.99]])
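One caveat: whether the converter receives bytes or str depends on the NumPy version and on the encoding argument. As far as I know, NumPy 2.0 and later pass str by default, in which case the bytes-only myconvert above would fail. Here is a sketch of a converter that accepts both; to_float is a hypothetical name, and the StringIO again stands in for delim.dat.

```python
import io

import numpy as np


def to_float(field):
    # Converters may be handed bytes or str depending on the NumPy
    # version / encoding argument, so normalise to str first.
    if isinstance(field, bytes):
        field = field.decode("ascii")
    # Drop surrounding whitespace and quotes, then use '.' as decimal point.
    return float(field.strip().strip('"').replace(",", "."))


# Stand-in for delim.dat (first two rows only).
text = '"1,000";"2,000";"3,000"\n"5,000";"6,000";"7,000"\n'

a = np.genfromtxt(io.StringIO(text), delimiter=";",
                  converters={k: to_float for k in range(3)})
print(a)
```

The converter is applied per column, so the dictionary needs one entry for each column index in the file.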

tel · Accepted Answer · 2018-12-17 19:02:01Z

You have some unusual delimiters, not to mention commas used as the decimal separator in the numeric literals. NumPy isn't set up to handle either of these by default, so you'll have to preprocess your input a little before you pass it off to NumPy. You can do so using some regular expressions:

import re
from io import StringIO

import numpy as np

# fake file-like object for demonstration
f = StringIO('''"1,000";"2,000";"3,000"''')

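# replace each double quote (plus any ;" that follows it) with a space
s = re.sub('";?"?', ' ', f.read())
# switch the decimal separator from ',' to '.'
s = re.sub(',', '.', s)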

arr = np.fromstring(s, sep=' ')
print(arr)

Output:

[1. 2. 3.]
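Note that np.fromstring returns a flat 1-D array, so with a multi-line file the row structure is lost. A possible variation that keeps the rows is sketched below: strip the quotes, swap the decimal separator, and let np.loadtxt split on ';'. It assumes that no field contains a ';' inside the quotes and that ',' is only ever the decimal separator; raw and cleaned are illustrative names.

```python
from io import StringIO

import numpy as np

# Fake two-line file contents for demonstration.
raw = '"1,000";"2,000";"3,000"\n"5,000";"6,000";"7,000"\n'

# Remove the quotes and use '.' as the decimal separator.
cleaned = raw.replace('"', '').replace(',', '.')

# loadtxt keeps one row per input line.
arr = np.loadtxt(StringIO(cleaned), delimiter=';')
print(arr)
```

For a real file, the same cleaning can be applied to the text returned by reading the file before handing it to np.loadtxt.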

edited Dec 17, 2018 at 19:02

answered Dec 17, 2018 at 18:51


3 Comments

user8458838 Over a year ago

The comma ',' is a language "barrier". In my country ',' is used as the decimal separator.

tel Over a year ago

Gotcha. I changed the answer to match.

user8458838 Over a year ago

Thanks for the comment, Chris. However, I would like to do this in the numpy/scipy/pandas domain.
