
I have a .dat file that looks like this.

ID_1,5.0,5.0,5.0,... 
ID_2,5.0,5.0,5.0,...

I'm trying to import the data into Python as an array.

If I do this, it gives me a 1-D structured array whose rows print as tuples, not a 2D array.

data = np.genfromtxt('mydat.dat',
                     dtype=None,
                     delimiter=',')

However, when I do the following it gives an odd result, probably because that first element is not a float.

np.fromfile('mydat.dat', dtype=float)

array([  3.45301146e-086,   3.45300781e-086,   3.25195588e-086, ...,
         8.04331780e-096,   8.04331780e-096,   1.31544776e-259])

Any suggestions on this? These were the two main ways to import .dat files into Python as an array and they don't seem to provide the desired result.

  • Are the lines always in that form: an ID, then some values, with a newline separating the lines of data? Do you want a 2D array, and would lists work instead of an array? Commented Sep 28, 2017 at 15:45
  • There is no blank line between the rows; I fixed that. I need a 2D array. The end goal is to use the data in Keras, so I do need it as an array. Commented Sep 28, 2017 at 15:52

2 Answers

Here is one way: read each line of the 'mydat.dat' file, convert each value to a float where possible (leaving the ID as a string), and then load the result into a NumPy array.

import numpy as np

def is_float(string):
    """Return True if the given string parses as a float, else False."""
    try:
        float(string)
        return True  # return a bool so that "0.0" is not treated as falsy
    except ValueError:
        return False

data = []
with open('mydat.dat', 'r') as f:
    for line in f:
        fields = line.rstrip().split(",")
        # Keep the ID as a string and convert numeric fields to float
        data.append([float(v) if is_float(v) else v for v in fields])

data = np.array(data, dtype='O')

Result

>>> data
array([['ID_1', 5.0, 5.0, 5.0],
       ['ID_2', 5.0, 5.0, 5.0]], dtype=object)
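
Since the stated goal is a 2D numeric array for Keras, an object array may not be the most convenient form. A minimal sketch of one alternative, assuming every line is an ID followed only by numeric fields: keep the IDs in a separate list and build a plain float array from the rest. The in-memory `sample` below is a hypothetical stand-in for 'mydat.dat', and the names `ids` and `values` are only illustrative.

```python
import io
import numpy as np

# Hypothetical stand-in for open('mydat.dat'); the real file may have more columns.
sample = io.StringIO("ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n")

ids = []
rows = []
for line in sample:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    fields = line.split(",")
    ids.append(fields[0])                        # first field is the ID string
    rows.append([float(v) for v in fields[1:]])  # remaining fields are numeric

values = np.array(rows, dtype=float)  # 2D float array, one row per ID
```

With the two sample rows, `values` has shape (2, 3) and `ids` holds the matching labels in the same order.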

Also, if you can use pandas to read and manipulate the data, I would do so. pandas is efficient, especially for larger data sets, and makes the data easy to manipulate.

# Read the data as CSV into a DataFrame
>>> import pandas as pd
>>> df = pd.read_csv('mydat.dat', sep=",", header=None)
>>> df
      0    1    2    3
0  ID_1  5.0  5.0  5.0
1  ID_2  5.0  5.0  5.0

# Transposed data, with one column per ID
>>> df.T
      0     1
0  ID_1  ID_2
1     5     5
2     5     5
3     5     5
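
To get from the DataFrame to the 2D array the question asks for, one option is to slice off the ID column and convert the rest. This is a sketch under the assumption that column 0 holds the IDs and every other column is numeric; `sample` is a hypothetical in-memory stand-in for 'mydat.dat'.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'mydat.dat'
sample = io.StringIO("ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n")

df = pd.read_csv(sample, sep=",", header=None)

ids = df.iloc[:, 0].tolist()                   # ID strings from the first column
values = df.iloc[:, 1:].to_numpy(dtype=float)  # remaining columns as a 2D float array
```

`DataFrame.to_numpy` requires pandas 0.24 or later; on older versions `df.iloc[:, 1:].values` should behave similarly.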

You might want to use numpy.loadtxt. Its dtype argument lets you specify the format of each column, and usecols lets you pick which columns to read.
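
A minimal sketch of that idea, assuming the first column is the ID and all remaining columns are floats: read the numeric columns and the ID column in two separate `np.loadtxt` calls using `usecols`. The `text` string is a hypothetical stand-in for the contents of 'mydat.dat', and the column count is taken from the first line because the real width is not given.

```python
import io
import numpy as np

# Hypothetical stand-in for the contents of 'mydat.dat'
text = "ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n"

# Count the columns from the first line
n_cols = len(text.splitlines()[0].split(","))

# Numeric columns only (everything after the ID) as a 2D float array
values = np.loadtxt(io.StringIO(text), delimiter=",",
                    usecols=tuple(range(1, n_cols)))

# ID column read separately as strings
ids = np.loadtxt(io.StringIO(text), delimiter=",", usecols=(0,), dtype=str)
```

With a real file, pass the filename in place of `io.StringIO(text)`. Note that `np.loadtxt` squeezes its result, so a file with a single row would yield a 1-D `values` array unless `ndmin=2` is passed.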
