I would like to know how to import a file that uses more than one delimiter.

I have the following line:

"1,000";"2,000";"3,000"

How can I import the data with numpy?

I have the following code:

data=numpy.loadtxt(filepath,delimiter=';')

How can I pass a second delimiter for the double quotes (")?

Kind Regards

  • You'll have to parse the quoted strings after loading. Commented Dec 17, 2018 at 18:28
  • How so? I currently have the following error: could not convert string to float: '(s)' Commented Dec 17, 2018 at 18:41
  • add dtype=str to your loadtxt Commented Dec 17, 2018 at 18:45
  • It imported the data as a str352. How can I convert it to float? Commented Dec 17, 2018 at 19:02
  • I have no idea how setting dtype=str is supposed to help. You probably should not do that. Commented Dec 17, 2018 at 19:03

2 Answers

pandas.read_csv can read such a file. It allows you to control the delimiter and the decimal point character.

Here's my file delim.dat:

"1,000";"2,000";"3,000"
"5,000";"6,000";"7,000"
"8,000";"9,000";"9,100"
"9,250";"9,500";"9,990"

Use the arguments sep=';' (delimiter=';' is an accepted alias) and decimal=',' in pandas.read_csv:

In [11]: import pandas as pd

In [12]: df = pd.read_csv('delim.dat', sep=';', decimal=',', header=None)

In [13]: df
Out[13]: 
      0    1     2
0  1.00  2.0  3.00
1  5.00  6.0  7.00
2  8.00  9.0  9.10
3  9.25  9.5  9.99
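
Since the question asks for a NumPy array, here is a minimal sketch of the extra step, assuming the same file layout as delim.dat. The in-memory `raw` string is a stand-in for the real file and is not part of the original answer:

```python
import io

import pandas as pd

# Stand-in for delim.dat so the sketch runs without a file on disk.
raw = '"1,000";"2,000";"3,000"\n"5,000";"6,000";"7,000"\n'

# read_csv strips the double quotes (its default quotechar) and
# interprets ',' as the decimal separator.
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',', header=None)

# Convert the DataFrame to a plain 2-D float array.
arr = df.to_numpy(dtype=float)
print(arr)
```

With a real file, the `io.StringIO(raw)` argument would simply be replaced by the file path.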

You can also use numpy.genfromtxt, but you'll have to use the converters argument to convert each field from bytes to floating point. For example,

In [53]: import numpy as np

In [54]: def myconvert(s):
    ...:     return float(s.strip(b'"').replace(b',', b'.'))
    ...: 
    ...: 

In [55]: a = np.genfromtxt('delim.dat', delimiter=';', converters={k: myconvert for k in range(3)})

In [56]: a
Out[56]: 
array([[1.  , 2.  , 3.  ],
       [5.  , 6.  , 7.  ],
       [8.  , 9.  , 9.1 ],
       [9.25, 9.5 , 9.99]])
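
Whether a converter receives bytes or str depends on the NumPy version and on the encoding argument, so the bytes-based converter above may fail on some installations. A hedged variant that passes encoding explicitly and works on str is sketched below. The name `to_float` and the in-memory `raw` data are illustrative assumptions, not part of the original answer:

```python
import io

import numpy as np


def to_float(field):
    # Strip the surrounding double quotes, then swap the decimal comma
    # for a point so that float() can parse the value.
    return float(field.strip('"').replace(',', '.'))


# Stand-in for delim.dat so the sketch runs without a file on disk.
raw = '"1,000";"2,000";"3,000"\n"8,000";"9,000";"9,100"\n'

# An explicit encoding makes genfromtxt hand str (not bytes) to converters.
a = np.genfromtxt(
    io.StringIO(raw),
    delimiter=';',
    encoding='utf-8',
    converters={k: to_float for k in range(3)},
)
print(a)
```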

You have some weird delimiters, not to mention the use of commas in the middle of the numeric literals. Numpy isn't set up to handle any of these things by default, so you'll have to parse your input by hand a little bit before you pass it off to Numpy. You can do so using some regular expressions:

import re
from io import StringIO

import numpy as np

# fake file-like object for demonstration
f = StringIO('''"1,000";"2,000";"3,000"''')

s = re.sub('";?"?', ' ', f.read())
s = re.sub(',', '.', s)

arr = np.fromstring(s, sep=' ')
print(arr)

Output:

[1. 2. 3.]
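
The regex version flattens everything into a 1-D array, which loses the row structure once the file has several lines. As an alternative way of parsing by hand (a different technique from the regex above), the standard-library csv module can handle the quotes and the semicolon, leaving only the decimal comma to fix. This is a sketch under the assumption that every field is a quoted number:

```python
import csv
import io

import numpy as np

# Fake file-like object for demonstration, with two rows this time.
f = io.StringIO('"1,000";"2,000";"3,000"\n"5,000";"6,000";"7,000"\n')

# csv.reader splits on ';' and removes the surrounding double quotes.
reader = csv.reader(f, delimiter=';', quotechar='"')

# Replace the decimal comma and convert each field to float.
rows = [[float(field.replace(',', '.')) for field in row] for row in reader]

arr = np.array(rows)
print(arr.shape)
```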

3 Comments

The comma ',' is a language "barrier". In my country ',' is used as the decimal separator.
Gotcha. I changed the answer to match.
Thanks for the comment, Chris. However, I would like to do this within the numpy/scipy/pandas domain.
