Extracting Columns from .dat using np.loadtxt Python

Question

I have this text file: www2.geog.ucl.ac.uk/~plewis/geogg122/python/delnorte.dat

I want to extract columns 3 and 4.

I am using np.loadtxt but get this error:

ValueError: invalid literal for float(): 2000-01-01

I am only interested in the year 2005. How can I extract both columns?

numpy.loadtxt('delnorte.dat', usecols=[2, 3], dtype=object)
– mmgp, Commented Dec 31, 2012 at 15:16
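
As a rough sketch of the idea in this comment: reading the two columns as text avoids the float conversion error, and the year can then be matched on the date string. The sample rows below are made up for illustration and are not taken from delnorte.dat; dtype=str is used here in place of dtype=object, and encoding="utf-8" is passed so the values come back as ordinary strings.

```python
import io
import numpy as np

# Made-up sample rows (NOT the real file contents), whitespace separated.
sample = io.StringIO(
    "# illustrative comment line\n"
    "USGS 08220000 2004-12-31 180 A\n"
    "USGS 08220000 2005-01-01 210 A\n"
    "USGS 08220000 2005-01-02 190 A\n"
    "USGS 08220000 2006-01-01 175 A\n"
)

# Read columns 2 and 3 (counting from 0) as strings, so the dates are not
# pushed through float().
rows = np.loadtxt(sample, usecols=[2, 3], dtype=str, encoding="utf-8")

# Keep the rows whose date string starts with "2005".
is_2005 = np.char.startswith(rows[:, 0], "2005")
dates_2005 = rows[is_2005, 0]
values_2005 = rows[is_2005, 1].astype(float)
```

With a real file, the StringIO object would be replaced by the file name, and skiprows may be needed if the header lines do not start with '#'.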

tzelleke · Accepted Answer · 2012-12-31 16:27:25Z

1

You can pass loadtxt a custom conversion function for a specific column through the converters argument.
Since you are only interested in the year, I use a lambda function that splits the date on '-' and converts the first part to an int:

data = np.loadtxt('delnorte.dat',
         usecols=(2,3),
         converters={2: lambda s: int(s.split('-')[0])},
         skiprows=27)

array([[ 2000.,   190.],
       [ 2000.,   170.],
       [ 2000.,   160.],
       ..., 
       [ 2010.,   185.],
       [ 2010.,   175.],
       [ 2010.,   165.]])
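
For anyone who wants to try the converter without downloading the file, here is a small self-contained sketch. The rows are invented for illustration (they are not the contents of delnorte.dat), and year_from_date is just a name picked for this example. Passing encoding="utf-8" is an assumption made so that the converter receives str rather than bytes on older NumPy releases.

```python
import io
import numpy as np

# Invented sample rows (NOT the real file contents), whitespace separated.
sample = io.StringIO(
    "# illustrative comment line, skipped because it starts with '#'\n"
    "USGS 08220000 2004-12-31 180 A\n"
    "USGS 08220000 2005-01-01 210 A\n"
    "USGS 08220000 2005-01-02 190 A\n"
    "USGS 08220000 2006-01-01 175 A\n"
)

def year_from_date(s):
    """Turn 'YYYY-MM-DD' into the integer year (hypothetical helper)."""
    return int(s.split("-")[0])

# The converters key (2) refers to the column index in the file, not to the
# position inside usecols.
data = np.loadtxt(
    sample,
    usecols=(2, 3),
    converters={2: year_from_date},
    encoding="utf-8",
)
```

The result is a float array with one (year, value) pair per row, like the output shown above.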

To then filter for the year 2005, you can use boolean (logical) indexing in NumPy:

data_2005 = data[data[:,0] == 2005]

array([[ 2005.,   210.],
       [ 2005.,   190.],
       [ 2005.,   190.],
       [ 2005.,   200.],
        ....])
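
Spelled out step by step, the filtering looks roughly like this. The array is a made-up stand-in for the one returned by loadtxt, and the variable names are only illustrative.

```python
import numpy as np

# Made-up (year, value) rows standing in for the array returned by loadtxt.
data = np.array([
    [2004.0, 180.0],
    [2005.0, 210.0],
    [2005.0, 190.0],
    [2006.0, 175.0],
])

# Comparing the year column with 2005 gives one boolean per row.
mask = data[:, 0] == 2005

# Indexing with the mask keeps only the rows where it is True.
data_2005 = data[mask]

# The two columns can then be pulled apart if they are needed separately.
years = data_2005[:, 0].astype(int)
values = data_2005[:, 1]
```

Here years and values line up row by row, so values[i] is the value recorded for years[i].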

edited Dec 31, 2012 at 16:27

answered Dec 31, 2012 at 15:28

tzelleke


1 Comment

David Colpey Over a year ago

This is great. How should I extract the year (i.e. 2005) and its corresponding value from the array I just created? Thanks.

Jean Hominal · Accepted Answer · 2012-12-31 13:02:51Z

0

You should not use numpy.loadtxt to read these values; use the csv module to load the file and read its data instead.

answered Dec 31, 2012 at 13:02

Jean Hominal


Community · Accepted Answer · 2017-05-23 12:18:46Z

I agree with using the csv module. I adapted this answer, "reading csv files in scipy/numpy in Python", to apply to your question. I'm not sure whether you want the data in a NumPy array or whether a list is sufficient.

import csv
import numpy as np

fields = 5
records = []
with open("delnorte.dat.txt", "r") as f:
    txtFile = csv.reader(f, delimiter='\t')
    for record in txtFile:
        # skip comment lines and rows without the expected number of fields
        if len(record) != fields or record[0].startswith('#'):
            continue
        # column 2 holds the date; keep only the rows from 2005
        if record[2][0:4] == '2005':
            # assuming you want columns 3 & 4 with the first column indexed as 0
            records.append([int(record[3]), record[4]])

# if desired, slice the list of lists to put a single column into a numpy array
npData = np.asarray([npD[0] for npD in records])
