I'm trying to import data with Python/numpy.loadtxt. For most of the data this isn't a problem, e.g. when a row looks like this:

0.000000      0.000000      0.000000      0.000000    -0.1725804E-13

In this case I can use whitespace as the delimiter. Unfortunately, the program that produces the data doesn't use delimiters, just a fixed column width (and I can't change that), so sometimes two values run together. Example:

-0.1240503E-03-0.6231297E-04  0.000000      0.000000    -0.1126164E-02

Can I tell numpy.loadtxt in some easy way that every column is 14 characters wide? I'd prefer not to have to manually modify the files the other program produces.
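To make the problem concrete, here is a small plain-Python sketch (no NumPy) using the two rows above. It assumes every field is exactly 14 characters wide, as in the second row; split_fixed is a hypothetical helper for illustration, not a library function.

```python
# Plain-Python sketch: why whitespace splitting fails and how fixed-width slicing works.
# split_fixed is a hypothetical helper, not part of NumPy.

ROW_SPACED = "0.000000      0.000000      0.000000      0.000000    -0.1725804E-13"
ROW_FIXED = "-0.1240503E-03-0.6231297E-04  0.000000      0.000000    -0.1126164E-02"


def split_fixed(line, width=14):
    """Cut a line into consecutive `width`-character fields and parse each as a float."""
    line = line.rstrip("\r\n")
    # float() ignores the blanks that pad short values inside a field.
    return [float(line[i:i + width]) for i in range(0, len(line), width)]


# Whitespace splitting finds 5 tokens in the first row but only 4 in the second,
# because the first two numbers of ROW_FIXED have no blank between them.
print(len(ROW_SPACED.split()), len(ROW_FIXED.split()))

# Slicing every 14 characters recovers all 5 values of the fixed-width row.
print(split_fixed(ROW_FIXED))
```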

EDIT:

I thought I'd share my very simple solution, based on dxwx's suggestion. For the example I provided, the solution would be

import numpy
a = numpy.genfromtxt('/path/to/file.txt', delimiter=14)
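A self-contained way to check this, assuming NumPy is installed: the sketch below feeds the fixed-width example row to genfromtxt through io.StringIO in place of the real file. The in-memory text is only a stand-in for '/path/to/file.txt', not part of the original solution.

```python
import io

import numpy as np

# Stand-in for /path/to/file.txt: the fixed-width example row from the question.
sample = io.StringIO(
    "-0.1240503E-03-0.6231297E-04  0.000000      0.000000    -0.1126164E-02\n"
)

# An integer delimiter is interpreted as a field width: every 14 characters form one column.
a = np.genfromtxt(sample, delimiter=14)
print(a)
```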

There was an additional whitespace character before the first column in my real data, and I didn't want to use the last column or the last row. So it looks like this now:

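a = numpy.genfromtxt('/path/to/file.txt',
                     delimiter=(1, 14, 14, 14, 14, 14, 14),  # 1 leading blank, then six 14-char columns
                     usecols=range(1, 6),                     # skip the blank (0) and the last column (6)
                     skip_footer=1)                           # skip the last row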
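A runnable sketch of the same call, assuming a file laid out as described (one leading blank, six 14-character columns, and a last row to drop). The data and the fixed_width_line helper are hypothetical; the helper writes each value with the format spec '14.6E', which for the values used here gives fields exactly 14 characters wide.

```python
import io

import numpy as np


def fixed_width_line(values, width=14):
    # Hypothetical helper: one leading blank, then each value right-aligned in `width` chars.
    return " " + "".join(f"{v:{width}.6E}" for v in values) + "\n"


# Hypothetical file contents: two data rows plus a trailing row that should be skipped.
# Every row has six columns; the sixth one is unwanted.
text = (
    fixed_width_line([-1.240503e-04, -6.231297e-05, 0.0, 0.0, -1.126164e-03, 99.0])
    + fixed_width_line([1.0, 2.0, 3.0, 4.0, 5.0, 99.0])
    + fixed_width_line([7.0, 7.0, 7.0, 7.0, 7.0, 7.0])
)

a = np.genfromtxt(
    io.StringIO(text),
    delimiter=(1, 14, 14, 14, 14, 14, 14),  # 1-char blank, then six 14-char fields
    usecols=range(1, 6),                    # keep fields 1..5: drop the blank and the 6th column
    skip_footer=1,                          # ignore the last line
)
print(a.shape)
print(a)
```

Because of usecols, the leading blank slice (index 0) and the sixth column (index 6) are never kept, and skip_footer=1 drops the last line, so the result here has two rows of five values.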

Thanks everyone for the fast response.

2 Answers

Have a look at NumPy's genfromtxt: its documentation says the delimiter can be given as an integer field width (or a sequence of widths) instead of a separator string.

I would use numpy.fromregex instead. You can then define a simple regular expression that captures up to 14 characters per field.

Here each field is captured with the regex group [-.\dE]{1,14}. This assumes that there are no missing values and that every field uses only the characters seen in your example (digits, '-', '.' and 'E'); for instance, a '+' in an exponent would not match:

>>> import io
>>> import numpy as np
>>> dat = io.StringIO("-0.1240503E-03-0.6231297E-04  0.000000      0.000000    -0.1126164E-02\n")  # or a filename / open file
>>> regex = r"([-.\dE]{1,14})\s*([-.\dE]{1,14})\s*([-.\dE]{1,14})\s*([-.\dE]{1,14})\s*([-.\dE]{1,14})"
>>> np.fromregex(dat, regex, [('A', np.float32), ('B', np.float32), ('C', np.float32), ('D', np.float32), ('E', np.float32)])
array([ (-0.0001240503042936325, -6.231296720216051e-05, 0.0, 0.0, -0.0011261639883741736)
],
      dtype=[('A', '<f4'), ('B', '<f4'), ('C', '<f4'), ('D', '<f4'), ('E', '<f4')])
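To see why the {1,14} cap is what separates the two numbers that run together, the sketch below builds the same five-group pattern with str.join and applies Python's standard re.findall to the example row, showing the raw strings that fromregex then converts. Building the pattern programmatically is just a convenience for this illustration.

```python
import re

FIELD = r"([-.\dE]{1,14})"          # one number: at most 14 of these characters
pattern = r"\s*".join([FIELD] * 5)  # same five-group pattern as in the answer

row = "-0.1240503E-03-0.6231297E-04  0.000000      0.000000    -0.1126164E-02"

# findall returns one tuple of five strings per matching row. The 14-character cap
# stops the first group after "-0.1240503E-03", even though no blank follows it.
groups = re.findall(pattern, row)
print(groups)
```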

1 Comment

Thank you for your answer; it taught me a more general approach to importing data :). dwxw's answer is simpler, though, and works as well.