
I have this text file: www2.geog.ucl.ac.uk/~plewis/geogg122/python/delnorte.dat

I want to extract columns 3 and 4.

I am using np.loadtxt but get this error:

ValueError: invalid literal for float(): 2000-01-01

I am only interested in the year 2005. How can I extract both columns?

  • numpy.loadtxt('delnorte.dat', usecols=[2, 3], dtype=object) (commented Dec 31, 2012 at 15:16)
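A minimal sketch of the comment's idea: read the date and flow columns as text, then filter afterwards. It swaps in dtype=str for dtype=object, and it uses a small in-memory sample instead of the real file; the five tab-separated fields (agency, site number, date, flow, code) are an assumed layout, and the rows and site number are made up for illustration.

```python
import io

import numpy as np

# Made-up rows in an assumed layout: agency, site, date, flow, code
sample = (
    "USGS\t00000000\t2004-12-31\t180\tA\n"
    "USGS\t00000000\t2005-01-01\t210\tA\n"
    "USGS\t00000000\t2005-01-02\t190\tA\n"
)

# Read column 2 (date) and column 3 (flow) as strings, so the dates
# are never passed through float()
raw = np.loadtxt(io.StringIO(sample), usecols=(2, 3), dtype=str)

# Keep the rows whose date starts with "2005", then convert the flows
mask = np.char.startswith(raw[:, 0], "2005")
dates_2005 = raw[mask, 0]
flows_2005 = raw[mask, 1].astype(float)
print(dates_2005, flows_2005)
```

On the real file you may still need skiprows for any header lines that do not start with '#'.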

3 Answers


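You can pass loadtxt a custom conversion function for a specific column.
Since you are only interested in the year, I use a lambda function that keeps the year part of the date (the first four characters of YYYY-MM-DD) and converts it to an int:

import numpy as np

data = np.loadtxt('delnorte.dat',
                  usecols=(2, 3),
                  # int() accepts both str and bytes; depending on the NumPy
                  # version, converters may receive bytes, where s.split('-') fails
                  converters={2: lambda s: int(s[:4])},
                  skiprows=27)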

array([[ 2000.,   190.],
       [ 2000.,   170.],
       [ 2000.,   160.],
       ..., 
       [ 2010.,   185.],
       [ 2010.,   175.],
       [ 2010.,   165.]])
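To see how the converter behaves without downloading the file, here is a minimal sketch on a few made-up rows in an assumed tab-separated layout (agency, site number, date, flow, code). The helper name year_of is hypothetical; it takes the first four characters of the date, which assumes YYYY-MM-DD dates, and it works whether loadtxt hands it str or bytes.

```python
import io

import numpy as np


def year_of(field):
    """Hypothetical converter: return the year of a YYYY-MM-DD field as an int.

    int() accepts both str and bytes, so this works whichever type the
    installed NumPy version passes to converters.
    """
    return int(field[:4])


# Made-up rows; lines starting with '#' are skipped by loadtxt by default
sample = (
    "# header comment\n"
    "USGS\t00000000\t2000-01-01\t190\tA\n"
    "USGS\t00000000\t2000-01-02\t170\tA\n"
    "USGS\t00000000\t2005-01-01\t210\tA\n"
)

# The converters key (2) refers to the date column's index in the file
data = np.loadtxt(io.StringIO(sample), usecols=(2, 3), converters={2: year_of})
print(data)
```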

To then filter for the year 2005, you can use boolean (logical) indexing in NumPy:

data_2005 = data[data[:,0] == 2005]

array([[ 2005.,   210.],
       [ 2005.,   190.],
       [ 2005.,   190.],
       [ 2005.,   200.],
       ...])
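A small self-contained sketch of the boolean mask on a made-up (year, flow) array, showing how to get both columns for 2005 or only the flow values:

```python
import numpy as np

# Made-up (year, flow) rows, same column order as the loadtxt result
data = np.array([
    [2004.0, 180.0],
    [2005.0, 210.0],
    [2005.0, 190.0],
    [2006.0, 200.0],
])

mask = data[:, 0] == 2005   # one True/False per row
rows_2005 = data[mask]      # year and flow for every 2005 row
flows_2005 = data[mask, 1]  # just the flow values for 2005
print(rows_2005)
print(flows_2005)
```

Comparing with == is safe here because the years are whole numbers, which floats represent exactly.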

1 Comment

This is great. How should I extract the year (i.e. 2005) and its corresponding value from the array I just created? Thanks.

You should not use numpy.loadtxt to read these values; use the csv module to load the file and read its data instead.


I agree with using the csv module. I adapted the answer to "reading csv files in scipy/numpy in Python" to apply to your question. I'm not sure whether you want the data in a NumPy array or whether a list is sufficient.

import csv

import numpy as np

fields = 5   # expected number of tab-separated fields per data row
records = []
with open("delnorte.dat.txt", "r", newline="") as f:
    txtFile = csv.reader(f, delimiter='\t')
    for record in txtFile:
        if len(record) != fields or record[0].startswith('#'):
            # skip comment lines and malformed records, e.g.:
            # print("Skipping malformed record or comment: {}, contains {} fields ({} expected)".format(record, len(record), fields))
            continue
        if record[2][0:4] == '2005':
            # assuming you want columns 3 & 4 with the first column indexed as 0
            records.append([int(record[3]), record[4]])

# if desired, slice the list of lists to put a single column into a numpy array
npData = np.asarray([npD[0] for npD in records])
