I have a simple data set with a class label, stored as "mydata.csv":

GA_ID   PN_ID   PC_ID   MBP_ID  GR_ID   AP_ID   class
0.033   6.652   6.681   0.194   0.874   3.177     0
0.034   9.039   6.224   0.194   1.137   3.177     0
0.035   10.936  10.304  1.015   0.911   4.9       1
0.022   10.11   9.603   1.374   0.848   4.566     1

I use the code below to convert this data into a NumPy array so that I can use it for prediction and machine-learning modeling, but because of the header line it raises "ValueError: could not convert string to float: ". When I remove the header from the file, this method works well for me:

import numpy as np
#from sklearn import metrics
#from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

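# Parse every line of the CSV as floats (this is what fails on the header line)
raw_data = open("/home/me/Desktop/scklearn/data.csv")
dataset = np.loadtxt(raw_data, delimiter=",")
X = dataset[:, 0:6]  # six feature columns (0:5 would drop AP_ID)
y = dataset[:, 6]    # class label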

I also tried to skip the header like this, but the same error occurs:

dataset = np.loadtxt(raw_data, delimiter=",")[1:]
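
The slice in the line above is applied only after np.loadtxt has already tried to parse the header, which is why the error persists. A minimal sketch of skipping the header inside loadtxt itself, using its skiprows parameter; the inline csv_text is a stand-in for the real file and assumes it is comma-delimited with the header on the first line:

```python
import io
import numpy as np

# Stand-in for the CSV file (assumption: comma-delimited, header on line 1).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# skiprows=1 drops the header line before any float conversion happens.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = dataset[:, :-1]  # every column except the last: the six features
y = dataset[:, -1]   # last column: the class label (stored as floats)

print(X.shape)
print(y)
```

With a real file, the path can be passed to np.loadtxt in place of the StringIO object. The header names are discarded this way, so they have to be read separately if they are needed later.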

Then I moved to pandas and was able to import the data with this method:

import pandas
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")

But here I got stuck again: when I tried to convert this into a NumPy array, it showed the same error as before.

Is there any method in pandas that can save the headers as a list:

header_list = ('GA_ID','PN_ID','PC_ID' ,'MBP_ID' ,'GR_ID' , 'AP_ID','class')

and return the last column as the class label and the remaining part (rows 1:4, columns 0:5) as a NumPy array for model building?

I have written some code to get the column list:

import pandas

clm_list = []
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
clms = raw_data.columns  # columns is an attribute, not a method
for clm in clms:
    clm_list.append(clm)
print(clm_list)  # produces the column list
  • It is unclear what your real problem is here. pandas DataFrames are compatible with the sklearn interfaces; also, if you don't want to write the header to a CSV from pandas, then you can pass the param header=None in to_csv. Commented Apr 7, 2015 at 10:51
  • @EdChum Yes, that is true. My actual problem is: 1) if I pass header=None, how would I know the headers later, after modeling or during feature selection, given that I skipped the header when opening the file? and 2) how can I take the example data directly from pandas to scikit-learn in the form X = (data without header and class label) and y = (class label for predictions)? Commented Apr 7, 2015 at 10:55
  • Well, you can do all of this in pandas just fine; like I said, the sklearn interfaces are compatible with pandas DataFrames. Commented Apr 7, 2015 at 11:00
  • @EdChum Hi, thanks for the reply. I have solved my problem and written some code, which I have posted as an answer. This code is working well for me. Thanks. Commented Apr 7, 2015 at 11:47
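
To illustrate the point made in the comments, that scikit-learn estimators accept pandas objects directly, here is a minimal sketch. The inline csv_text stands in for the real file (assumed comma-delimited), and the linear-kernel SVC is only an illustrative choice, not something the comments prescribe:

```python
import io
import pandas as pd
from sklearn.svm import SVC

# Stand-in for the CSV file (assumption: comma-delimited, header on line 1).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# The header survives as column labels, so it is still available after modeling.
feature_names = list(df.columns[:-1])

X = df[feature_names]  # DataFrame holding the six feature columns
y = df["class"]        # Series holding the class label

# fit() and predict() take the DataFrame/Series without manual conversion.
clf = SVC(kernel="linear")
clf.fit(X, y)
predictions = clf.predict(X)

print(feature_names)
print(predictions)
```

Four rows are far too few for a meaningful model; the sketch only shows that no conversion to a NumPy array is needed and that the header names stay accessible.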

1 Answer

After a lot of reading I finally achieved what I wanted and successfully used my data with scikit-learn. The code to convert the CSV data into a scikit-learn-compatible form is given below. Thanks.

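import pandas as pd

# read_csv uses the first line as the header, so only numeric rows remain as data
r = pd.read_csv("/home/zebrafish/Desktop/ex.csv")
print(r.values)

# Collect the column names (the header) in a list
clm_list = []
for column in r.columns:
    clm_list.append(column)

# Every column except the last holds features; the last column is the class label
X = r[clm_list[0:len(clm_list)-1]].values
y = r[clm_list[len(clm_list)-1]].values

print(clm_list)
print(X)
print(y)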

The outcome of this code is exactly what I want:

['GA_ID', 'PN_ID', 'PC_ID', 'MBP_ID', 'GR_ID', 'AP_ID', 'class']

[[  0.033   6.652   6.681   0.194   0.874   3.177]
 [  0.034   9.039   6.224   0.194   1.137   3.177]
 [  0.035  10.936  10.304   1.015   0.911   4.9  ]
 [  0.022  10.11    9.603   1.374   0.848   4.566]]

[0 0 1 1]
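
The same split can also be written with positional indexing, which avoids building the intermediate name list for slicing. This is a sketch of an alternative, not the code used above; the inline csv_text stands in for the real file (assumed comma-delimited), and to_numpy() assumes pandas 0.24 or newer (.values works on older versions):

```python
import io
import pandas as pd

# Stand-in for the CSV file (assumption: comma-delimited, header on line 1).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

r = pd.read_csv(io.StringIO(csv_text))

header_list = r.columns.tolist()  # header names as a plain Python list
X = r.iloc[:, :-1].to_numpy()     # all rows, every column except the last
y = r.iloc[:, -1].to_numpy()      # all rows, last column only

print(header_list)
print(X.shape)
print(y)
```

Because iloc selects by position, this keeps working if the columns are renamed, as long as the class label stays in the last column.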
2 Comments

You can simplify your column list creation to just this: clm_list = list(r)
I just copied your code and it ran with my scikit-learn program. Thanks.
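
A quick check of the simplification suggested in the first comment: iterating over a DataFrame yields its column labels, so list(r) should give the same result as the explicit loop. The three-column csv_text is a shortened, made-up stand-in for the real file:

```python
import io
import pandas as pd

# Shortened stand-in for the CSV file (hypothetical subset of the columns).
csv_text = "GA_ID,PN_ID,class\n0.033,6.652,0\n0.035,10.936,1\n"
r = pd.read_csv(io.StringIO(csv_text))

# Explicit loop, as in the answer
clm_list_loop = []
for column in r.columns:
    clm_list_loop.append(column)

# One-line form from the comment
clm_list = list(r)

print(clm_list == clm_list_loop)
```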
