I have a .dat file with about 1000 lines. Each line contains

letter int int int int boolean

and I am trying to read it in so that each line becomes a new row of my array. Currently I have np.genfromtxt('myfile.dat'), which gives me

nan 23. 34. 23. 55. 1.

This is almost right, but that nan should be the letter 't'. Any idea how I can read in the correct letter? Also, how do I get rid of the . after each number? Cheers

1 Answer

One way is defining a new dtype. For example:

import numpy as np

desc = np.dtype([('letter', 'S1'), ('v1', float), ('v2', float),
                 ('v3', float), ('v4', float)])

and use it in genfromtxt (fobj is the file name or an open file object):

data = np.genfromtxt(fobj, dtype=desc)

This file content:

x 23. 34. 23. 55. 1.
y 23. 34. 23. 55. 1.

would give you this data:

array([(b'x', 23.0, 34.0, 23.0, 55.0), (b'y', 23.0, 34.0, 23.0, 55.0)], 
      dtype=[('letter', 'S1'), ('v1', '<f8'), ('v2', '<f8'), ('v3', '<f8'), ('v4', '<f8')])

This is a structured array. You can access one line:

>>> data[0]
(b'x', 23.0, 34.0, 23.0, 55.0)

or one column:

>>> data['letter']
    array([b'x', b'y'], 
          dtype='|S1')

or one entry:

>>> data[0][1]
23.0
>>> data['v1'][1]
23.0
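
The question also asked for integers instead of floats, plus a boolean last column. A minimal sketch of one way to get that, assuming the six columns are a letter, four integers, and a 0/1 flag. The field names and the in-memory sample are illustrative, not taken from the original file. The flag is read as an integer and converted afterwards, since genfromtxt's own boolean parsing may not accept 0/1:

```python
import io
import numpy as np

# Hypothetical sample standing in for 'myfile.dat':
# a letter, four integers, and a 0/1 flag per line.
sample = io.StringIO("t 23 34 23 55 1\nf 1 2 3 4 0\n")

# One field per column; 'U1' keeps the letter as a one-character str.
# The flag is read as an int here and converted to bool below.
desc = np.dtype([('letter', 'U1'), ('v1', int), ('v2', int),
                 ('v3', int), ('v4', int), ('flag', int)])

data = np.genfromtxt(sample, dtype=desc, encoding=None)

# Convert the 0/1 column to a boolean array.
flags = data['flag'].astype(bool)

print(data['letter'])  # the letters, as str rather than bytes
print(data['v1'])      # integers, printed without a trailing '.'
print(flags)
```

With integer fields the values print without the trailing dot, and 'U1' avoids the b'...' prefix that 'S1' produces on Python 3.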
7 Comments

This works, but the file contains 2000 lines, and when I check the shape I get (2000L,). That doesn't seem right; shouldn't there be another number in there? Cheers
Is the shape (2001,) or (20001,)?
Try dtype=None. Either way the result will be a 1d structured array, one record per line.
Mike, it gives (2000L), with the letter L. I'm not sure what that means. dtype=None doesn't work.
2000L is a long integer. Nothing to worry about. It looks like you are on Windows using Python 2.
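
To illustrate the shape discussed in these comments: a structured array holds one record per line, so its shape has a single entry, and the columns appear as fields of the dtype rather than as a second dimension. A small sketch, assuming a two-line in-memory sample and `encoding=None` (both illustrative choices, not from the original thread):

```python
import io
import numpy as np

# Hypothetical two-line sample with six whitespace-separated columns.
sample = io.StringIO("x 23 34 23 55 1\ny 23 34 23 55 1\n")

# dtype=None lets genfromtxt guess a type for each column.
data = np.genfromtxt(sample, dtype=None, encoding=None)

n_rows = data.shape[0]          # number of lines read
n_cols = len(data.dtype.names)  # number of fields (columns) per record
print(data.shape, n_rows, n_cols)
```

The column count comes from `data.dtype.names`, which is why the shape itself shows only the number of lines.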