How to load specific columns with varying location from a text file in python?

Question

I'm trying to read the discharge data of 346 US rivers, which is stored online in text files. The files are more or less in this format:

Measurement_number    Date          Gage_height     Discharge_value     
1                     2017-01-01    10              1000
2                     2017-01-20    15              2000
# etc.

I only want to read the gage height and discharge value columns. The problem is that most files have additional metadata columns in front of the 'Gage_height' column, so I cannot simply read the 3rd and 4th columns: their indexes vary from file to file.

I'm trying to find a way to say "read the columns named 'Gage_height' and 'Discharge_value'", but I haven't succeeded yet.

I hope someone can help. I'm currently loading the text files with numpy.genfromtxt, so a solution using that package would be great, but other solutions are also more than welcome.

This is my code so far:

import urllib2  # Python 2 (urllib.request in Python 3)
import numpy as np
data_url = urllib2.urlopen(url)  # url: the address of this specific site
data = np.genfromtxt(data_url, skip_header=1, comments='#', usecols=[2, 3])

Your header line seems to be easy to split. Just read the first line (fid = open(path, "r"); first = fid.readline().split(); col1 = first.index("Gage_height"); col2 = first.index("Discharge_value")) and get the indexes for the columns you want from there. – armatita, Commented May 24, 2017 at 15:32
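The comment's approach can be sketched as below. This is a minimal sketch, assuming whitespace-separated columns with the header on the first line; the function name load_named_columns is hypothetical, and an in-memory io.StringIO sample stands in for the downloaded file. It looks up the column indexes in the header and then hands them to genfromtxt as usecols.

```python
import io
import numpy as np

def load_named_columns(fh, wanted):
    """Hypothetical helper: read the header line, find the indexes of the
    wanted column names, and load only those columns with np.genfromtxt."""
    header = fh.readline().split()                   # split on any run of whitespace
    cols = [header.index(name) for name in wanted]   # ValueError if a name is missing
    # fh is already positioned after the header, so no skip_header is needed
    return np.genfromtxt(fh, comments='#', usecols=cols)

# In-memory stand-in for one downloaded file (same layout as in the question)
sample = io.StringIO(
    "Measurement_number    Date          Gage_height     Discharge_value\n"
    "1                     2017-01-01    10              1000\n"
    "2                     2017-01-20    15              2000\n"
)
data = load_named_columns(sample, ["Gage_height", "Discharge_value"])
print(data)
```

Splitting with split() rather than split(" ") matters here: with runs of spaces, split(" ") yields empty strings, so the positions found in the header would not match the actual column numbers.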

tmdavison · Accepted Answer · 2017-05-24 15:35:29Z


You can use the names=True option to genfromtxt, and then use the column names to select which columns you want to read with usecols.

For example, to read 'Gage_height' and 'Discharge_value' from your data file:

data = np.genfromtxt(filename, names=True, usecols=['Gage_height', 'Discharge_value'])

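Note that you should not set skip_header=1 when you use names=True: genfromtxt reads the field names from the first line after the skipped ones, so skipping the header would make it take the first data row as the names.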

You can then access the columns using their names:

gage_height = data['Gage_height']               #  == array([ 10.,  15.])
discharge_value = data['Discharge_value']       #  == array([ 1000.,  2000.])
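To check that selecting by name copes with the varying column positions described in the question, here is a small sketch with two hypothetical in-memory files. The second one carries an extra metadata column in front (the name Agency and its values are invented for illustration), yet the same usecols list of names picks out the right columns in both.

```python
import io
import numpy as np

# Hypothetical file with the layout from the question
file_a = io.StringIO(
    "Measurement_number Date Gage_height Discharge_value\n"
    "1 2017-01-01 10 1000\n"
    "2 2017-01-20 15 2000\n"
)
# Hypothetical file with an extra metadata column, so the wanted columns shift right
file_b = io.StringIO(
    "Agency Measurement_number Date Gage_height Discharge_value\n"
    "USGS 1 2017-01-01 10 1000\n"
    "USGS 2 2017-01-20 15 2000\n"
)

wanted = ["Gage_height", "Discharge_value"]
results = []
for fh in (file_a, file_b):
    # names=True takes field names from the header; usecols then accepts those names
    data = np.genfromtxt(fh, names=True, usecols=wanted)
    results.append((data["Gage_height"], data["Discharge_value"]))
    print(data["Gage_height"], data["Discharge_value"])
```

The unused columns (including the non-numeric Date and Agency values) are never converted, so they do not need any special handling.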

See the numpy.genfromtxt documentation for more information.
