How can I insert data from a CSV file into a dataframe using pandas.read_csv?

Question

I have a csv file like:

"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,

and I want to read it into something like an array or dataframe, so that I can compare elements from one column to selected elements from other columns. At first, I read it straight into an array using numpy.genfromtxt, but I got strings like '"B/A/203"' with extra quotes " everywhere. I read somewhere that pandas can strip the extra " from strings, so I tried:

class StructureReader(object):
    def __init__(self, filename):
        self.filename=filename
    def read(self):
        self.data=pd.read_csv(StringIO(str("RNA/"+self.filename)), header=None, sep = ",")
        self.data

but I get something like so:

<class 'pandas.core.frame.DataFrame'>
              0
0  RNA/4v6p.csv

How can I get my CSV file into some kind of a data type that would allow me to search through columns and rows?

My comment now seems mean... the original question had that in the narrative. I meant it to encourage the OP's quest for knowledge.
– tmthydvnprt, Commented Mar 20, 2016 at 16:50

tmthydvnprt · Accepted Answer · 2016-03-20 16:58:29Z

Data Insert

You are putting the filename string itself into your DataFrame, i.e. RNA/4v6p.csv is your data at row 0, col 0. You need to read the file and store its contents. This can be done by removing StringIO(str(...)) in your class and passing the path straight to pd.read_csv:

class StructureReader(object):
    def __init__(self, filename):
        self.filename = filename
    def read(self):
        self.data = pd.read_csv("RNA/" + self.filename, header=None, sep=",")
        return self.data
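
To see the difference concretely, here is a minimal sketch. The temporary file and the two sample rows are assumptions made for the demo, not the OP's actual file. It parses the literal string "RNA/4v6p.csv" through StringIO, then reads a real file by path:

```python
import io
import os
import tempfile

import pandas as pd

# Two rows copied from the sample data in the question.
csv_text = '"B/G/213","B/C/208","WW_cis",,\n"B/U/215","B/A/206","WW_cis",,\n'

# StringIO turns the *text of the filename* into a file-like object,
# so pandas parses that text as if it were the CSV content.
wrong = pd.read_csv(io.StringIO("RNA/4v6p.csv"), header=None)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical location, used only for this demonstration.
    path = os.path.join(tmpdir, "4v6p.csv")
    with open(path, "w") as handle:
        handle.write(csv_text)

    # Passing the path itself makes pandas open and parse the file.
    right = pd.read_csv(path, header=None)

print(wrong.shape)  # a single cell holding the filename
print(right.shape)  # the real rows and columns
```

Note that the quotes around each field are stripped by read_csv by default, which is what the OP was after.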

Code structure critique

I would also recommend not hardcoding the parent directory inside read(). A few ways to do that:

Always passing in a full file path

class StructureReader(object):
    def __init__(self, filepath):
        self.filepath = filepath
    def read(self):
        self.data = pd.read_csv(self.filepath, header=None, sep=",")
        return self.data

Making the directory an __init__() argument

class StructureReader(object):
    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or import os and self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data

Making the directory a constant attribute

class StructureReader(object):
    def __init__(self, filename):
        self.directory = "RNA"
        self.filename = filename
    def read(self):
        self.data = pd.read_csv(self.directory + "/" + self.filename, header=None, sep=",")
        # or import os and self.data = pd.read_csv(os.path.join(self.directory, self.filename), header=None, sep=",")
        return self.data

This has nothing to do with reading your data; it is just best-practice commentary on structuring your code (just my $0.02).
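
For what it's worth, here is a runnable sketch of the "directory as an __init__() argument" variant using os.path.join. The temporary directory and single sample row are assumptions standing in for the OP's RNA folder and file:

```python
import os
import tempfile

import pandas as pd


class StructureReader(object):
    def __init__(self, directory, filename):
        self.directory = directory
        self.filename = filename
        self.data = None

    def read(self):
        # os.path.join picks the right path separator for the platform.
        path = os.path.join(self.directory, self.filename)
        self.data = pd.read_csv(path, header=None, sep=",")
        return self.data


# Throwaway directory standing in for "RNA" (demo assumption).
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "4v6p.csv"), "w") as handle:
        handle.write('"B/G/213","B/C/208","WW_cis",,\n')

    reader = StructureReader(tmpdir, "4v6p.csv")
    frame = reader.read()

print(frame)
```

Returning the DataFrame from read() as well as storing it on the instance lets the caller use either style.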

Fabio Lamanna · Accepted Answer · 2016-03-20 15:31:20Z

IIUC, you can just read it with:

df = pd.read_csv('yourfile.csv', header=None)

which for me returns:

         0        1       2   3   4
0  B/G/213  B/C/208  WW_cis NaN NaN
1  B/U/215  B/A/206  WW_cis NaN NaN
2  B/C/214  B/G/207  WW_cis NaN NaN
3  B/G/217  B/C/204  WW_cis NaN NaN
4  B/A/216  B/U/205  WW_cis NaN NaN
5  B/C/219  B/G/202  WW_cis NaN NaN
6  B/U/218  B/A/203  WW_cis NaN NaN
7  B/G/201  B/C/220  WW_cis NaN NaN
8  B/A/203  B/U/218  WW_cis NaN NaN

you can then select only the columns you want with:

df = df[[0,1,2]]

and operate as usual with dataframes.
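
As an illustration of "operating as usual", here is a small sketch of the kind of column comparison the question mentions. It assumes the sample data from the question, held in a string for convenience; the variable names are made up for the example:

```python
import io

import pandas as pd

# Sample data from the question, held in a string for the demo.
data = """\
"B/G/213","B/C/208","WW_cis",,
"B/U/215","B/A/206","WW_cis",,
"B/C/214","B/G/207","WW_cis",,
"B/G/217","B/C/204","WW_cis",,
"B/A/216","B/U/205","WW_cis",,
"B/C/219","B/G/202","WW_cis",,
"B/U/218","B/A/203","WW_cis",,
"B/G/201","B/C/220","WW_cis",,
"B/A/203","B/U/218","WW_cis",,
"""

df = pd.read_csv(io.StringIO(data), header=None, usecols=[0, 1, 2])

# Rows whose column-0 value also occurs somewhere in column 1.
in_both = df[df[0].isin(df[1])]

# Rows where column 0 equals one specific value.
selected = df[df[0] == "B/A/203"]

print(in_both)
print(selected)
```

With this sample, the rows flagged by isin are the two where B/U/218 and B/A/203 appear in both columns.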

MaxU - stand with Ukraine · Accepted Answer · 2016-03-20 16:36:12Z

I think you've mixed up StringIO with the file name. You either have your data as a string and then you use StringIO or you simply specify a file name (not using StringIO):

In [189]: data="""\
   .....: "B/G/213","B/C/208","WW_cis",,
   .....: "B/U/215","B/A/206","WW_cis",,
   .....: "B/C/214","B/G/207","WW_cis",,
   .....: "B/G/217","B/C/204","WW_cis",,
   .....: "B/A/216","B/U/205","WW_cis",,
   .....: "B/C/219","B/G/202","WW_cis",,
   .....: "B/U/218","B/A/203","WW_cis",,
   .....: "B/G/201","B/C/220","WW_cis",,
   .....: "B/A/203","B/U/218","WW_cis",,
   .....: """

In [190]: df = pd.read_csv(io.StringIO(data), sep=',', header=None, usecols=[0,1,2])

In [191]: df
Out[191]:
         0        1       2
0  B/G/213  B/C/208  WW_cis
1  B/U/215  B/A/206  WW_cis
2  B/C/214  B/G/207  WW_cis
3  B/G/217  B/C/204  WW_cis
4  B/A/216  B/U/205  WW_cis
5  B/C/219  B/G/202  WW_cis
6  B/U/218  B/A/203  WW_cis
7  B/G/201  B/C/220  WW_cis
8  B/A/203  B/U/218  WW_cis

PS: you can choose which columns to parse (i.e. which ones end up in your data frame) with the usecols parameter.

Or, using a file name:

import os

df = pd.read_csv(os.path.join('RNA', self.filename), sep=',', header=None, usecols=[0,1,2])
