I am new to programming and have taken up learning python in an attempt to make some tasks I run in my research more efficient. I am running a PCA in the pandas module (I found a tutorial online) and have the script for this, but need to subselect part of a dataframe prior to the pca.
so far I have (just for example in reality I am reading a .csv file with a larger matrix)
x = np.random.randint(30, size=(8,8))
df = pd.DataFrame(x)
0 1 2 3 4 5 6 7
0 9 0 23 13 2 5 14 6
1 20 17 11 10 25 23 20 23
2 15 14 22 25 11 15 5 15
3 9 27 15 27 7 15 17 23
4 12 6 11 13 27 11 26 20
5 27 13 5 16 5 5 2 18
6 3 18 22 0 7 10 11 11
7 25 18 10 11 29 29 1 25
What I want to do is sub-select columns that satisfy a certain criteria in any of the rows, specifically I want every column that has at least one number =>27 (just for example) to produce a new dataframe
0 1 3 4 5
0 9 0 13 2 5
1 20 17 10 25 23
2 15 14 25 11 15
3 9 27 27 7 15
4 12 6 13 27 11
5 27 13 16 5 5
6 3 18 0 7 10
7 25 18 11 29 29
I have looked into the various slicing methods in pandas but none seem to do what I want (.loc and .iloc etc.).
The actual script I am using to read in thus far is
filename = 'Data.csv'
data = pd.read_csv(filename,sep = ',')
x = data.ix[:,1:] # variables - species
y = data.ix[:,0] # cases - age
so a sub dataframme of x is what I am after (as above).
Any advice is greatly appreciated.