I have written two programs and I want to check which one of them is more 'efficient' and uses less memory. The first one creates a NumPy array and modifies it in place. The second one starts from an empty Python list and appends values to it. Which is better? First program:
    import numpy as np

    f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()

    zeros = np.zeros((60343, 4917))
    for i, l in enumerate(lines):          # enumerate avoids the O(n) lines.index(l) lookup
        row = l.split(",")
        for j, element in enumerate(row):  # same for row.index(element), which also breaks on duplicate values
            zeros[i, j] = float(element)

    X = zeros[:, 1:]  # features: every column but the first
    Y = zeros[:, 0]   # labels: the first column
    one_hot = np.ones((counter, 2))  # counter is defined elsewhere
The second one:
    import numpy as np

    f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()

    X = []
    Y = []
    for l in lines:
        row = l.split(",")
        X.append([float(elem) for elem in row[1:]])  # features: everything after the first value
        Y.append(float(row[0]))                      # label: the first value

    X = np.array(X)
    Y = np.array(Y)
    one_hot = np.ones((counter, 2))  # counter as above
My theory is that the first one is slower but uses less memory and is more 'stable' when working with large files, while the second one is faster but uses a lot of memory and is less stable with large files (543MB, 70,000 lines).
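One way to compare the two instead of guessing is to time each version and track its peak memory with the standard library. A minimal sketch, assuming each version is wrapped in a hypothetical load() function (tracemalloc should account for NumPy's allocations on reasonably recent NumPy versions):

    import time
    import tracemalloc

    def measure(load):
        # Run load() once and report (elapsed seconds, peak memory in MB).
        tracemalloc.start()
        t0 = time.perf_counter()
        load()  # load is one of the two versions, wrapped in a function
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return elapsed, peak / 1e6

Running measure() once per version on the same 543MB file would make the speed/memory trade-off concrete.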
Thanks!
Both programs call file.readlines(), which loads all the lines of the file into memory at once. You should iterate over the file object directly instead.
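A minimal sketch of that idea, reusing the array dimensions from the question:

    import numpy as np

    data = np.zeros((60343, 4917))  # dimensions taken from the first program

    with open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt') as f:
        for i, line in enumerate(f):  # the file object yields one line at a time
            # assumes every row has exactly 4917 comma-separated values
            data[i] = [float(v) for v in line.split(",")]

    Y = data[:, 0]   # labels: first column
    X = data[:, 1:]  # features: the rest

This keeps peak memory close to the size of the final array plus a single line of text, instead of holding the whole 543MB file in memory on top of it.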