I'm trying to read the discharge data of 346 US rivers stored online in textfiles. The files are more or less in this format:
Measurement_number Date Gage_height Discharge_value
1 2017-01-01 10 1000
2 2017-01-20 15 2000
# etc.
I only want to read the gage height and discharge value columns. The problem is that in most files additional columns with metadata are added in front of the 'Gage height' column, so i can not just simply read the 3rd and 4th column because their index varies.
I'm trying to find a way to say 'read the columns with the name 'Gage_height' and 'Discharge_value'', but I haven't succeeded yet.
I hope anyone can help. I'm currently trying to load the text files with numpy.genfromtxt so it would be great to find a solution with that package but other solutions are also more than welcome.
This is my code so far
data_url=urllib2.urlopen(#the url of this specific site)
data=np.genfromtxt(data_url,skip_header=1,comments='#',usecols=2,3])
fid = open(path, "r"); first = fid.readline().split(" "); col1 = first.index("Gage_height"); col2 = first.index("Discharge_value")) and get the indexes for the columns you want from there.