How to convert a pandas DataFrame into a NumPy array

Question

I have a simple data set with a class label, stored as "mydata.csv":

GA_ID   PN_ID   PC_ID   MBP_ID  GR_ID   AP_ID   class
0.033   6.652   6.681   0.194   0.874   3.177     0
0.034   9.039   6.224   0.194   1.137   3.177     0
0.035   10.936  10.304  1.015   0.911   4.9       1
0.022   10.11   9.603   1.374   0.848   4.566     1

I use the code below to convert this data into a NumPy array so that I can use it for predictions and machine learning modeling, but because of the header row it raises "ValueError: could not convert string to float: ". When I remove the header from the file, this method works well for me:

import numpy as np
#from sklearn import metrics
#from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

raw_data = open("/home/me/Desktop/scklearn/data.csv")
dataset = np.loadtxt(raw_data, delimiter=",")
X = dataset[:, 0:6]  # six feature columns (indices 0-5)
y = dataset[:, 6]    # class label (last column)

I also tried to skip the header, but the error still occurs:

dataset = np.loadtxt(raw_data, delimiter=",")[1:]
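For reference, slicing with [1:] cannot help here, because np.loadtxt fails while parsing the header row, before any slicing happens. A minimal sketch of skipping the header at read time with the skiprows parameter follows; the inline string is a stand-in for the real file and assumes it is comma-separated with exactly one header row.

```python
import io
import numpy as np

# Inline stand-in for the CSV file (assumption: comma-separated, one header row).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# skiprows=1 tells loadtxt to ignore the first line before parsing floats.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = dataset[:, :-1]  # every column except the last (the features)
y = dataset[:, -1]   # last column (the class label)

print(X.shape)
print(y)
```

Note that the header names are discarded with this approach, so they would have to be read separately if they are needed later.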

Then I moved to pandas and was able to import the data this way:

raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")

But here I got stuck again: when I tried to convert this into a NumPy array, it showed the same error as before.

Is there any method available in pandas that can save the headers as a list:

header_list = ('GA_ID','PN_ID','PC_ID' ,'MBP_ID' ,'GR_ID' , 'AP_ID','class')

and take the last column as the class label and the remaining part (1:4,0:5) as a NumPy array for model building?

I have written some code to get the column list:

import pandas

clm_list = []
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
clms = raw_data.columns  # columns is an attribute, not a method
for clm in clms:
    clm_list.append(clm)
print(clm_list)  # prints the column list

It is unclear what your real problem is here. pandas DataFrames are compatible with the sklearn interfaces. Also, if you don't want to write the header to a CSV from pandas, you can pass the param header=None to to_csv.
– EdChum, Commented Apr 7, 2015 at 10:51
@EdChum Yes, this is true. Actually my problem is: 1) if I pass header=None, then after modeling or at feature-selection time I want to know the headers; how would I know them if I skipped the header when opening the file? and 2) how can I use the given example data directly from pandas with scikit-learn, in the form X = (data without header and class label) and y = (class label for predictions)?
– jax, Commented Apr 7, 2015 at 10:55
Well, you can do all of this in pandas just fine. Like I said, the sklearn interfaces are compatible with pandas DataFrames.
– EdChum, Commented Apr 7, 2015 at 11:00
@EdChum Hi, thanks for the reply. I have solved my problem and written some code, which I have posted as an answer. It works well for me. Thanks.
– jax, Commented Apr 7, 2015 at 11:47
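
As an illustration of the point made in the comments, here is a minimal sketch of splitting the features from the label while keeping the header names available. It assumes the label column is named class, as in the sample data, and uses an inline string in place of the real file; the names df and feature_names are only illustrative.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (assumption: comma-separated, one header row).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

df = pd.read_csv(io.StringIO(csv_text))

X = df.drop(columns="class")  # DataFrame holding only the feature columns
y = df["class"]               # Series holding the class label

# The header names stay attached to the DataFrame, so they can be
# looked up at any point after loading.
feature_names = list(X.columns)
print(feature_names)
```

Because X is still a DataFrame here, the column names remain available after loading, for example when inspecting selected features.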

jax · Accepted Answer · 2015-04-07 11:44:28Z

3

After a lot of reading, I finally achieved what I wanted and successfully used the data with scikit-learn. The code to convert CSV data into a scikit-learn-compatible form is given below. Thanks.

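import pandas as pd

r = pd.read_csv("/home/zebrafish/Desktop/ex.csv")
print(r.values)  # whole table, including the class column, as a NumPy array

# collect the header names
clm_list = []
for column in r.columns:
    clm_list.append(column)

# all columns except the last are features; the last column is the class label
X = r[clm_list[0:len(clm_list)-1]].values
y = r[clm_list[len(clm_list)-1]].values

print(clm_list)
print(X)
print(y)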

The output of this code is exactly what I want:

['GA_ID', 'PN_ID', 'PC_ID', 'MBP_ID', 'GR_ID', 'AP_ID', 'class']

[[  0.033   6.652   6.681   0.194   0.874   3.177]
 [  0.034   9.039   6.224   0.194   1.137   3.177]
 [  0.035  10.936  10.304   1.015   0.911   4.9  ]
 [  0.022  10.11    9.603   1.374   0.848   4.566]]

[0 0 1 1]
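
The same split can also be written positionally, without indexing through the column list. This is only a sketch: it assumes pandas 0.24 or later (for to_numpy), that the class label is always the last column, and it uses an inline string standing in for the CSV file.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (assumption: comma-separated, one header row).
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

r = pd.read_csv(io.StringIO(csv_text))

clm_list = list(r.columns)     # header names as a plain Python list
X = r.iloc[:, :-1].to_numpy()  # all rows, every column except the last
y = r.iloc[:, -1].to_numpy()   # all rows, last column only

print(clm_list)
print(X.shape)
print(y)
```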



2 Comments

EdChum Over a year ago

You can simplify your column list creation to just this: clm_list = list(r)

Chakra Over a year ago

I just copied your code. It ran my scikit-learn program. Thanks.
