I have a .dat file with no delimiters that I am trying to read into arrays. Each line represents one person, and the variables in each line are defined by a fixed number of characters: the first variable, "year", is the first four characters, and the second variable, "age", is the next two characters (there are no delimiters within the line), e.g.:
```
201219
201220
201256
```
Here is what I am doing right now:
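```python
data_file = 'filename.dat'
year = []
age = []
with open(data_file, 'r') as file:  # closes the file automatically
    for line in file:
        year.append(line[0:4])  # characters 0-3: year
        age.append(line[4:6])   # characters 4-5: age (excludes the trailing "\n")
```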
This works fine for a small number of lines and variables, but when I try to load the full data file (500 MB, with 10 million lines and 20 variables) I get a MemoryError. Is there a more efficient way to load this type of data into arrays?
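Much of the memory cost comes from storing every field as its own Python string: in 64-bit CPython even a two-character `str` takes roughly 50 bytes, and 10 million lines × 20 variables is 200 million such objects. A minimal standard-library sketch, assuming the two-field layout from the example (`read_fixed_width` is a hypothetical helper name), converts each field to an integer as it is read and stores it in a typed `array.array`, which packs values at 1 or 2 bytes each instead of one object per value:

```python
import os
import tempfile
from array import array


def read_fixed_width(path):
    """Parse 'YYYYAA' lines into compact typed arrays.

    Assumes the example layout: year in characters 0-3, age in 4-5.
    """
    year = array("H")  # unsigned 16-bit ints, 2 bytes per value
    age = array("B")   # unsigned 8-bit ints, 1 byte per value (0-255)
    with open(path, "r") as f:
        for line in f:
            year.append(int(line[0:4]))
            age.append(int(line[4:6]))
    return year, age


# Demo on the three sample lines above.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("201219\n201220\n201256\n")
years, ages = read_fixed_width(tmp.name)
os.remove(tmp.name)
print(list(years), list(ages))  # [2012, 2012, 2012] [19, 20, 56]
```

This still loops in Python, so it is slow for 10 million lines, but its memory use grows by only a few bytes per line.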
NumPy can be an efficient way to do this.
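One possible NumPy approach, sketched under stated assumptions: describe each line as a fixed-size record with a structured dtype, read the whole file with `np.fromfile`, and then cast the byte-string columns to small integer types. This assumes every line has exactly the same byte length, that lines end with a single `"\n"` (not `"\r\n"`), and that the last line also ends with `"\n"`. The `RECORD` layout and `load_fixed_width` name are hypothetical and cover only the two example fields; the real file would need one entry per variable.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout matching the example: 4-byte year, 2-byte age,
# then the 1-byte "\n" line terminator. Add one field per extra variable.
RECORD = np.dtype([("year", "S4"), ("age", "S2"), ("newline", "S1")])


def load_fixed_width(path):
    # Read all records in a single C-level call; no per-line Python objects.
    raw = np.fromfile(path, dtype=RECORD)
    # Cast the byte-string columns to compact integer arrays.
    year = raw["year"].astype(np.int16)
    age = raw["age"].astype(np.int8)  # int8 holds 0-127, enough for ages
    return year, age


# Demo on the three sample lines from the question.
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    tmp.write(b"201219\n201220\n201256\n")
year, age = load_fixed_width(tmp.name)
os.remove(tmp.name)
print(year, age)  # [2012 2012 2012] [19 20 56]
```

If even the raw records do not fit in memory, `np.memmap(path, dtype=RECORD, mode='r')` accepts the same dtype and reads from disk on demand. `np.genfromtxt(path, delimiter=[4, 2], dtype=int)` also accepts a sequence of field widths and is simpler, but it parses in Python and is much slower and more memory-hungry on a file this size.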