
I am very new to Python. Below is my code (classification with feature selection). I have not included the data because it is rather high-dimensional, but I believe the problem is largely independent of the data. My question is two-fold: I want axis labels on every subplot, and I would like to know how to lay out subplots when the number of subplots differs per row (I have 14 subplots, currently arranged in three rows):

import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_classification
from sklearn import preprocessing
import scipy.io as sio
import numpy as np
import os

allData = sio.loadmat('Alldatav2.mat')
allFeatures = allData['featuresAll2']

# loop over subjects
n_subject = [0,1,2,3,4,5,6,7,8,9,10,11,12,13]

fig, axs = plt.subplots(3,5,figsize=(15, 6))
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
fig.subplots_adjust()
axs = axs.ravel()

for i, j in zip(n_subject, range(15)):
    #print("For Subject : ", i+1)
    y = allData['labels']
    X = allFeatures[i*120:(i+1)*120,:]

    svc = SVC(kernel="linear",C=1)
    rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
    rfecv.fit(X, y.ravel())


    axs[j].plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()


# helper: arithmetic mean that returns 0.0 for an empty list
def mean(numbers):
    return float(sum(numbers)) / max(len(numbers), 1)

# loop over subjects
n_subject = [0,1,2,3,4,5,6,7,8,9,10,11,12,13]
avg_scores = []

for i in n_subject:
    print("For Subject : ", i+1)
    y = allData['labels']
    X = allFeatures[i*120:(i+1)*120,:]

    svc = SVC(kernel="linear",C=1)
    rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(10),
              scoring='accuracy')
    rfecv.fit(X, y.ravel())
    print("Optimal number of features : %d" % rfecv.n_features_)
    print("Ranking of Features : ", rfecv.ranking_)
    avg_score = rfecv.grid_scores_.max()
    print("Best CV Score : ", avg_score)
    avg_scores.append(avg_score)
    print("------------------------------------------")
print("Average Accuracy over all Subjects : ", mean(avg_scores))

1 Answer

To label each subplot, first create two lists that hold the labels as strings, one entry per subplot:

xlabelList = [xlabel0, xlabel1 ....,xlabel13]
ylabelList = [ylabel0, ylabel1,....,ylabel13]

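You also don't need the extra variable n_subject for looping. For plotting I would make the following changes (this assumes axs is still the 2-D array returned by plt.subplots(3,5), i.e. the axs = axs.ravel() line is removed):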

for j in range(14):

    #print("For Subject : ", j+1)
    y = allData['labels']
    X = allFeatures[j*120:(j+1)*120,:]

    svc = SVC(kernel="linear",C=1)
    rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
          scoring='accuracy')
    rfecv.fit(X, y.ravel())

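    # convert the flat index j into a (row, column) position in the 3x5 grid
    locInd = np.unravel_index(j, (3,5))
    # grid_scores_ was removed in scikit-learn 1.2; on newer versions
    # use rfecv.cv_results_['mean_test_score'] instead
    axs[locInd].plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)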
    axs[locInd].set_xlabel(xlabelList[j])
    axs[locInd].set_ylabel(ylabelList[j])
plt.show()
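
On the second part of the question (14 plots in a 3x5 grid), here is a minimal sketch of the index arithmetic in plain Python. It reproduces the row-major mapping that np.unravel_index(j, (3,5)) performs and lists the grid cells that no plot occupies. The names n_rows, n_cols, n_plots, positions and unused are made up for this illustration and are not part of the code above.

```python
n_rows, n_cols = 3, 5   # grid shape from plt.subplots(3, 5)
n_plots = 14            # one plot per subject

# Row-major (row, column) position of each plot; this is the same mapping
# that np.unravel_index(j, (3, 5)) produces.
positions = [divmod(j, n_cols) for j in range(n_plots)]

# Grid cells that no plot occupies.
all_cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
unused = [cell for cell in all_cells if cell not in positions]

print(positions[5])   # first plot of the second row -> (1, 0)
print(unused)         # cells left empty -> [(2, 4)]
```

With a 2-D axs array, each unused cell could then be hidden before plt.show(), for example with axs[2, 4].axis('off'), so that the last row shows four plots instead of five.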

2 Comments

Many thanks! The x and y labels are all the same, so how can I repeat this text 14 times without typing it in manually? Also, I tried (just for testing) the axis labels as: [a,a,a,a,a,a,a,a,a,a,a,a,a,a] but this seems to be invalid syntax. Why?
Your labels have to be strings. If all the labels are the same, you can just do: axs[locInd].set_xlabel(xlabel), no need for a list.
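
If a list is still wanted, a small sketch of repeating one string 14 times follows. The label text is taken from the question's plt.xlabel call; the variable names are only illustrative. Text without quotes is parsed as Python code rather than as a label, which is the likely reason the unquoted list was rejected.

```python
# Label text from the question; it must be quoted to be a string.
xlabel = "Number of features selected"

# Repeat the same string 14 times instead of typing it out.
xlabelList = [xlabel] * 14

print(len(xlabelList))   # -> 14
print(xlabelList[0])     # -> Number of features selected
```

Because strings are immutable, sharing the same string object across all 14 entries is harmless here.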
