I have a data file with multiple rows, and 8 columns - I want to average column 8 of rows that have the same data on columns 1, 2, 5 - for example my file can look like this:
564645 7371810 0 21642 1530 1 2 30.8007
564645 7371810 0 21642 8250 1 2 0.0103
564645 7371810 0 21643 1530 1 2 19.3619
I want to average the last column of the first and third row since columns 1-2-5 are identical;
I want the output to look like this:
564645 7371810 0 21642 1530 1 2 25.0813
564645 7371810 0 21642 8250 1 2 0.0103
my files (text files) are pretty big (~10000 lines) and redundant data (based on the above rule) are not in regular intervals - so I want the code to find the redundant data, and average them...
in response to larsks comment - here are my 4 lines of code...
import os
import numpy as np
datadirectory = input('path to the data directory, ')
os.chdir( datadirectory)
##READ DATA FILE AND CREATE AN ARRAY
dataset = open(input('dataset_to_be_used, ')).readlines()
data = np.loadtxt(dataset)
##Sort the data based on common X, Y and frequency
datasort = np.lexsort((data[:,0],data[:,1],data[:,4]))
datasorted = data[datasort]
np? I'm guessingnumpy, but your code doesn't show an import so it's hard to be sure.