Import .dat file as an array

Question

I have a .dat file that looks like this.

ID_1,5.0,5.0,5.0,... 
ID_2,5.0,5.0,5.0,...

I'm trying to import the data into Python as an array.

If I do this, it gives me what looks like a list of tuples (a 1-D structured array) rather than a 2D array.

data = np.genfromtxt('mydat.dat',
                     dtype=None,
                     delimiter=',')

However, when I do the following it gives an odd result, probably because that first element is not a float.

np.fromfile('mydat.dat', dtype=float)

array([  3.45301146e-086,   3.45300781e-086,   3.25195588e-086, ...,
         8.04331780e-096,   8.04331780e-096,   1.31544776e-259])

Any suggestions on this? These were the two main ways to import .dat files into Python as an array and they don't seem to provide the desired result.

Are the lines always in that form: an ID, then some values, with a newline separating the lines of data? Do you want a 2D array, and would lists work instead of an array?
– Jacobr365, Commented Sep 28, 2017 at 15:45
There is not new list. I fixed it. I need a 2D array. The end goal is to use the data in Keras, so I do need it as an array.
– ATMA, Commented Sep 28, 2017 at 15:52

Anil_M · Accepted Answer · 2017-09-28 18:22:07Z

Here is one way: read each line of 'mydat.dat', convert each value to a float where possible (leaving the ID as a str), and load the result into a NumPy array.

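import numpy as np

def is_float(string):
    """Return True if the string can be parsed as a float, else False."""
    try:
        float(string)
        # Return True explicitly: returning float(string) would be falsy for "0.0".
        return True
    except ValueError:
        return False

data = []
with open('mydat.dat', 'r') as f:
    for line in f:
        fields = line.rstrip().split(",")
        # Convert numeric fields to float; keep everything else (the ID) as str.
        data.append([float(x) if is_float(x) else x for x in fields])

# dtype='O' (object) lets strings and floats share one array.
data = np.array(data, dtype='O')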

Result

>>> data
array([['ID_1', 5.0, 5.0, 5.0],
       ['ID_2', 5.0, 5.0, 5.0]], dtype=object)
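
Since the stated goal is a 2D array for Keras, an object array is probably not the final form you want. A minimal sketch of one way to split it, assuming the first column always holds the ID and every other column is numeric (the `data` literal below is only a stand-in for the array built above, and the names `ids` and `values` are illustrative):

```python
import numpy as np

# Stand-in for the object array produced by the loop above (assumed sample data).
data = np.array([['ID_1', 5.0, 5.0, 5.0],
                 ['ID_2', 5.0, 5.0, 5.0]], dtype='O')

# First column: the row labels, kept as strings.
ids = data[:, 0].astype(str)

# Remaining columns: cast from object to a plain 2D float array.
values = data[:, 1:].astype(float)

print(values.dtype, values.shape)
```

Keeping the IDs in a separate array means the numeric block has a single float dtype, which is generally what numeric libraries expect.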

Also, if you can use pandas to read and manipulate the data, I would do so. pandas handles larger datasets efficiently and makes the data easy to manipulate.

# Read the data as CSV into a DataFrame
>>> import pandas as pd
>>> df = pd.read_csv('mydat.dat', sep=",", header=None)
>>> df
      0    1    2    3
0  ID_1  5.0  5.0  5.0
1  ID_2  5.0  5.0  5.0

# Transposed data: IDs in the first row, values below
>>> df.T
      0     1
0  ID_1  ID_2
1     5     5
2     5     5
3     5     5
>>>
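
To get from the DataFrame to the 2D float array the question asks for, one option is to read the ID column as the index and convert the rest. This is a sketch under assumptions: the in-memory `sample` text stands in for 'mydat.dat', and `DataFrame.to_numpy` requires a reasonably recent pandas (0.24 or later).

```python
import io
import pandas as pd

# Assumed sample contents standing in for 'mydat.dat'.
sample = io.StringIO("ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n")

# index_col=0 moves the ID column into the index, leaving only numeric columns.
df = pd.read_csv(sample, header=None, index_col=0)

# Convert the numeric columns to a plain 2D NumPy float array.
values = df.to_numpy(dtype=float)

# The IDs remain available from the index if they are needed later.
ids = df.index.to_numpy()

print(values.shape)
```

With a real file, passing the path 'mydat.dat' in place of `sample` should behave the same way.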

ShreyasG · Accepted Answer · 2017-09-28 15:47:38Z

You might want to use numpy.loadtxt. Its dtype argument lets you specify a different format for each column.
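
As a rough sketch of the loadtxt route: rather than a per-column dtype, the example below uses the `usecols` argument to read the numeric columns and the ID column in two separate passes, which yields a plain 2D float array directly. The `sample` string is an assumed stand-in for the contents of 'mydat.dat', and the variable names are illustrative.

```python
import io
import numpy as np

# Assumed sample contents standing in for 'mydat.dat'.
sample = "ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n"

# Count the columns from the first line so the ID column can be skipped.
n_cols = len(sample.splitlines()[0].split(","))

# Numeric columns only (1 .. n_cols-1); ndmin=2 keeps the result 2D
# even if the file has a single row.
values = np.loadtxt(io.StringIO(sample), delimiter=",",
                    usecols=list(range(1, n_cols)), ndmin=2)

# ID column only, read as strings.
ids = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=0, dtype=str)

print(values.shape, ids)
```

Reading the file twice is a simplification that keeps each array at a single dtype; for a file on disk, pass the filename instead of the `io.StringIO` objects.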
