
I am trying to run a principal component analysis on data from a CSV file, but when I run the code I get this error:

C:\Users\Lenovo\Desktop>python pca.py

ValueError: could not convert string to float: Annee;NET;INT;SUB;LMT;DCT;IMM;EXP;VRD

This is my CSV file:

[image of the CSV file, not reproduced here]

I tried removing all the spaces and anything else that looked suspicious. This is my Python script; I don't know what I am missing.

Note: I am running this code under Python 2.7.

from sklearn.externals import joblib  
import numpy as np  
import glob  
import os  
import time  
import numpy

my_matrix = numpy.loadtxt(open("pca.csv","rb"),delimiter= ",",skiprows=0)  
def pca(dataMat, r, autoset_r=False, autoset_rate=0.9): 
    """
    purpose: principal components analysis
    """  
    print("Start to do PCA...") 
    t1 = time.time() 
    meanVal = np.mean(dataMat, axis=0)  
    meanRemoved = dataMat - meanVal  
    # normData = meanRemoved / np.std(dataMat)  
    covMat = np.cov(meanRemoved, rowvar=0)    
    eigVals, eigVects = np.linalg.eig(np.mat(covMat)) 
    eigValIndex = np.argsort(-eigVals)  


    if autoset_r:
        r = autoset_eigNum(eigVals, autoset_rate)
        print("autoset: take top {} of {} features".format(r, meanRemoved.shape[1]))

    r_eigValIndex = eigValIndex[:r]  
    r_eigVect = eigVects[:, r_eigValIndex]  
    lowDDataMat = meanRemoved * r_eigVect  
    reconMat = (lowDDataMat * r_eigVect.T) + meanVal    
    t2 = time.time()   
    print("PCA takes %f seconds" %(t2-t1))
    joblib.dump(r_eigVect, './pca_args_save/r_eigVect.eig')    
    joblib.dump(meanVal, './pca_args_save/meanVal.mean')   
    return lowDDataMat, reconMat


def autoset_eigNum(eigValues, rate=0.99):

    eigValues_sorted = sorted(eigValues, reverse=True)
    eigVals_total = eigValues.sum()
    for i in range(1, len(eigValues_sorted)+1):
        eigVals_sum = sum(eigValues_sorted[:i])     
        if eigVals_sum / eigVals_total >= rate:
            break
    return i
  • If your df is fewer than 20 rows long, can you just go through the entire thing and check each entry with isdigit? From there you can find the problem entries and troubleshoot further. Quick reference for isdigit. Commented Dec 28, 2019 at 5:41
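A rough sketch of the check suggested in this comment. It uses float() instead of str.isdigit, because isdigit returns False for decimal values such as 17.93. The helper name find_non_numeric is hypothetical, and the two sample lines are the header and data row quoted in the error messages in this thread.

```python
def find_non_numeric(lines, delimiter):
    """Return (line_number, field) pairs for every field that float() rejects."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        for field in line.strip().split(delimiter):
            try:
                float(field)
            except ValueError:
                problems.append((line_number, field))
    return problems


# Header and data row as quoted in the error messages in this thread.
sample = [
    "Annee;NET;INT;SUB;LMT;DCT;IMM;EXP;VRD",
    "1969;17.93;3.96;0.88;7.38;19.86;25.45;5.34;19.21",
]

# Splitting on "," leaves each line as one big field, so every line is flagged.
bad_with_comma = find_non_numeric(sample, ",")

# Splitting on ";" flags only the column names in the header line.
bad_with_semicolon = find_non_numeric(sample, ";")

print(bad_with_comma)
print(bad_with_semicolon)
```

If only header fields are reported once the delimiter matches the file, the data rows themselves are probably fine and the header just needs to be skipped.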

1 Answer


It seems that NumPy is failing to parse your header row as floats.

Try setting skiprows=1 in your numpy.loadtxt call in order to skip the table header.


2 Comments

I tried it, but now I get this error: ValueError: invalid literal for float(): 1969;17.93;3.96;0.88;7.38;19.86;25.45;5.34;19.21
Please also set your delimiter to match your file. In your case, delimiter=";" should do the job.
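Putting the answer and the follow-up comment together, here is a minimal sketch of the corrected loadtxt call. It reads from an in-memory string so that it runs on its own under Python 3; the header and the first data row come from the error messages above, while the second data row is made up purely for illustration. With the real file you would pass "pca.csv" in place of the StringIO object.

```python
import io

import numpy as np

# Header and first data row are quoted from the error messages in this thread;
# the second data row is invented for illustration only.
csv_text = (
    "Annee;NET;INT;SUB;LMT;DCT;IMM;EXP;VRD\n"
    "1969;17.93;3.96;0.88;7.38;19.86;25.45;5.34;19.21\n"
    "1970;18.00;4.00;0.90;7.40;19.90;25.50;5.40;19.30\n"
)

# skiprows=1 skips the header line; delimiter=";" matches the file's separator.
my_matrix = np.loadtxt(io.StringIO(csv_text), delimiter=";", skiprows=1)

print(my_matrix.shape)
```

Note that the first column (Annee) is loaded as data too. If the year should not take part in the PCA, one option is to drop it with my_matrix[:, 1:] before calling pca.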
