I am trying to read in many files. Each file holds one day of data, with a record every 10 minutes. The data in each file is "chunked up" like this:
2015-11-08 00:10:00 00:10:00
# z speed dir W sigW bck error
30 3.32 111.9 0.15 0.12 1.50E+05 0
40 3.85 108.2 0.07 0.14 7.75E+04 0
50 4.20 107.9 0.06 0.15 4.73E+04 0
60 4.16 108.5 0.03 0.19 2.73E+04 0
70 4.06 93.6 0.03 0.23 9.07E+04 0
80 4.06 93.8 0.07 0.28 1.36E+05 0
2015-11-08 00:20:00 00:10:00
# z speed dir W sigW bck error
30 3.79 120.9 0.15 0.11 7.79E+05 0
40 4.36 115.6 0.04 0.13 2.42E+05 0
50 4.71 113.6 0.07 0.14 6.84E+04 0
60 5.00 113.3 0.13 0.17 1.16E+04 0
70 4.29 94.2 0.22 0.20 1.38E+05 0
80 4.54 94.1 0.11 0.25 1.76E+05 0
2015-11-08 00:30:00 00:10:00
# z speed dir W sigW bck error
30 3.86 113.6 0.13 0.10 2.68E+05 0
40 4.34 116.1 0.09 0.11 1.41E+05 0
50 5.02 112.8 0.04 0.12 7.28E+04 0
60 5.36 110.5 0.01 0.14 5.81E+04 0
70 4.67 95.4 0.14 0.16 7.69E+04 0
80 4.56 95.0 0.15 0.21 9.84E+04 0
...
The file continues like this, every 10 minutes, for the whole day. The name of this file is 151108.mnd. I want my code to read in all of the files for November (1511??.mnd). For each daily file, it should grab all of the datetime lines; for the partial example above that would be 2015-11-08 00:10:00, 2015-11-08 00:20:00, 2015-11-08 00:30:00, etc. It should store those as dates, then move on to the next daily file (151109.mnd), grab all of its datetime lines, and append them to the previously stored dates, and so on for the whole month. Here is the code I have so far:
import pandas as pd
import glob
import datetime

filename = glob.glob('1511??.mnd')
data_nov15_hereford = pd.DataFrame()
frames = []
dates = []
counter = 1
for i in filename:
    f_nov15_hereford = pd.read_csv(i, skiprows = 32)
    for line in f_nov15_hereford:
        if line.startswith("20"):
            print line
            date_object = datetime.datetime.strptime(line[:-6], '%Y-%m-%d %H:%M:%S %f')
            dates.append(date_object)
            counter = 0
        else:
            counter += 1
    frames.append(f_nov15_hereford)
data_nov15_hereford = pd.concat(frames,ignore_index=True)
data_nov15_hereford = data_nov15_hereford.convert_objects(convert_numeric=True)
print dates
This code has two problems: when I print dates, it prints two copies of every date, and it only contains the first date of each file (2015-11-08 00:10:00, 2015-11-09 00:10:00, etc.). It isn't going line by line through each file, storing all of that file's dates, and then moving on to the next file like I want. Instead it just grabs the first date in each file. Any help with this code? Is there an easier way to do what I want? Thanks!
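For reference, iterating over a pandas DataFrame yields its column labels rather than the lines of the file, which would be consistent with only seeing the first date of each file. One possible approach is to skip pandas for this step and read each file line by line. The following is only a minimal sketch, assuming Python 3 and assuming that the block-header lines are the only lines whose first two fields parse as a date and a time. The names `read_timestamps` and `read_month` are hypothetical, and the third field on the header line (00:10:00) is simply ignored.

```python
import glob
from datetime import datetime


def read_timestamps(lines):
    """Collect a datetime for every line whose first two fields form a timestamp."""
    stamps = []
    for line in lines:
        # Join the first two whitespace-separated fields, e.g. "2015-11-08 00:10:00".
        candidate = " ".join(line.split()[:2])
        try:
            stamps.append(datetime.strptime(candidate, "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            # Column-header lines, data rows and blank lines do not parse; skip them.
            continue
    return stamps


def read_month(pattern="1511??.mnd"):
    """Append the timestamps of every matching daily file, in file-name order."""
    dates = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            dates.extend(read_timestamps(handle))
    return dates


# Small demonstration on two blocks copied from the sample above.
sample = """2015-11-08 00:10:00 00:10:00
# z speed dir W sigW bck error
30 3.32 111.9 0.15 0.12 1.50E+05 0
40 3.85 108.2 0.07 0.14 7.75E+04 0
2015-11-08 00:20:00 00:10:00
# z speed dir W sigW bck error
30 3.79 120.9 0.15 0.11 7.79E+05 0
"""
dates = read_timestamps(sample.splitlines())
print(dates)
```

Calling `read_month()` in the directory that holds the .mnd files would then return one list with the dates of every file in turn. Because lines that do not parse are skipped, there should be no need to hard-code the number of header rows, as long as none of those rows begins with a date and a time.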