I have a small problem. I have a .csv matrix that I want to convert into a NumPy array, and I found this: np.genfromtxt('/Users/username/Documents/fichieretudebis.csv', delimiter=';')

The problem is that my .csv matrix contains both numbers and strings, and I need both to appear in my array while keeping their types. I tried reading the matrix as a string matrix (with dtype=str), but then I can't convert the numbers back to float. Does anyone know how to do this? Thanks.

More explanation:

My .csv file looks like this: (image)

I need to use this file to build a tree (using sklearn and random forest algorithms).

This is what I have written so far: (image)

(The files ResultatBis and Previsionbis have the same problem.)

I don't know how to create an array that sklearn will recognize without using the NumPy library, but I need my matrix to stay exactly the same.

Tell me if that's enough explanation, and thanks in advance for your help!

  • NumPy is for homogeneous, aligned data. For more exotic schemes, have a look at pandas. Commented Mar 2, 2016 at 19:06

2 Answers

Pass dtype=None so that genfromtxt infers a type for each column individually:

np.genfromtxt('/Users/username/Documents/fichieretudebis.csv', delimiter=';', dtype=None)

(based on https://stackoverflow.com/a/15481761/1461850)
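To illustrate what dtype=None gives you, here is a minimal sketch. It reads a small in-memory sample instead of the real file; the sample rows and the explicit encoding='utf-8' argument are my assumptions, not something taken from the question. The result is a one-dimensional structured array in which each column keeps its own type.

```python
import io
import numpy as np

# Made-up sample standing in for the real .csv file (assumption).
csv_text = (
    "44;75007;0;0;0;gmail\n"
    "31;75018;13;1;0;gmail\n"
    "25;75001;11;1;1;yahoo\n"
)

# dtype=None lets genfromtxt pick a type per column.
# encoding='utf-8' asks for text columns as str rather than bytes.
data = np.genfromtxt(
    io.StringIO(csv_text), delimiter=';', dtype=None, encoding='utf-8'
)

# Columns are exposed as named fields: f0, f1, ..., f5.
first_column = data['f0']   # integer column
last_column = data['f5']    # string column
```

Note that this is an array of records rather than a 2-D matrix, so you select a column by field name (data['f0']) instead of by a column index.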

2 Comments

Thanks! This helps, but how do you get rid of the b in front of all the string elements? [(44, 75007, 0, 0, 0, b'gmail') (31, 75018, 13, 1, 0, b'gmail') (25, 75001, 11, 1, 1, b'gmail') (11, 75019, 4, 1, 0, b'gmail')] This is the kind of output I get.
The b is just Python3's way of indicating that it read byte (ASCII) strings from your file. Py3's default string type is unicode. Look at the dtype. For this field it probably is <S5 or the like. Under Python2 you wouldn't see the b, but the dtype would be the same.
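If the byte strings get in the way, one possible approach is to decode that field after loading. The sketch below builds a small byte-string array by hand to stand in for the field (the values are made up) and converts it with np.char.decode.

```python
import numpy as np

# Stand-in for a byte-string field as read from the file (assumed values).
raw = np.array([b'gmail', b'gmail', b'yahoo'])

# Decode every element from bytes to a regular Python 3 (unicode) string.
text = np.char.decode(raw, 'ascii')
```

The byte array has an S5 dtype, while the decoded one is a unicode array whose elements print without the b prefix.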

You can also try using Pandas:

import pandas as pd
prediction = pd.read_csv('/Users/username/Documents/fichieretudebis.csv', delimiter= ';')

Pandas is very popular for reading and manipulating data from .csv datasets. In my machine learning assignments I've always used it.
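Since the goal in the question is a random forest in sklearn, here is a rough sketch of how the pandas route could continue. It assumes the file has no header row; the column names and the sample rows are hypothetical. sklearn estimators expect numeric input, so the string column is one-hot encoded with pd.get_dummies before converting to a NumPy array.

```python
import io
import pandas as pd

# Made-up sample standing in for the real .csv file (assumption).
csv_text = (
    "44;75007;0;0;0;gmail\n"
    "31;75018;13;1;0;gmail\n"
    "25;75001;11;1;1;yahoo\n"
)

# Hypothetical column names; the real file's meaning is not known here.
names = ["c0", "c1", "c2", "c3", "c4", "mail"]
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=None, names=names)

# Replace the string column with one indicator column per distinct value.
encoded = pd.get_dummies(df, columns=["mail"])

# Plain 2-D float array that can be passed to an sklearn estimator.
X = encoded.to_numpy(dtype=float)
```

The numeric columns stay numeric in df, and only the encoded copy is flattened to floats, so the original mixed-type table is still available for inspection.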
