2

I have a CSV file that I am trying to read into Python, and I want to store the first two columns in variables called name and gender. The code I am currently using is:

import csv
infile = open('blue.csv', 'r')
csvfile = csv.reader(infile)

name = []
gender = []

for row in csvfile:
    name.append(row[0])
    gender.append(row[1])

There are two problems I am encountering:

1) The csv file has headers so I don't want those included inside the variables when I store the columns

2) I am missing the gender for the last row of the csv file and so I don't want to include the last line of the csv file when I store it in a variable.

I am an R programmer and so to me, the way I would get around this is to read in the file excluding the first row and last row but I am unsure of how to do this in python, or better yet, if there is a better/smarter alternative.

If it helps, here is what a mock dataset would look like:

Name, Gender
Bob, Male
Susan, Female
Doug,
  • Could you add a sample of your CSV? Commented Oct 20, 2015 at 18:14
  • I did, it's above in the mock dataset. Commented Oct 20, 2015 at 18:15
  • The mock data set is not my python code. Commented Oct 20, 2015 at 18:19
  • If you are able to use pandas, please have a look: pandas.pydata.org/pandas-docs/stable/generated/… Commented Oct 20, 2015 at 18:38

4 Answers

4

You wrote

I am an R programmer and so to me, the way I would get around this is to read in the file excluding the first row and last row but I am unsure of how to do this in python

This can be done with readlines and list slicing like so:

open('foo.csv').readlines()[1: -1]

Furthermore, note that csv.reader accepts either a file object or a list:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.

So you can just use:

for row in csv.reader(open('foo.csv').readlines()[1:-1]):
    ...
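To tie this back to the question, here is a minimal sketch that fills the name and gender lists. The function name read_name_gender is made up for illustration, and skipinitialspace=True is my assumption, based on the mock data having a space after each comma. It also assumes the file does not end in an extra blank line, since the slice would then drop that blank line instead of the incomplete row.

```python
import csv

def read_name_gender(path):
    """Return (name, gender) lists, ignoring the first and last lines of the file."""
    with open(path, newline='') as infile:
        # Drop the header line and the final (incomplete) line.
        lines = infile.readlines()[1:-1]

    name, gender = [], []
    # csv.reader accepts any iterable of strings, so a list of lines works.
    for row in csv.reader(lines, skipinitialspace=True):
        name.append(row[0])
        gender.append(row[1])
    return name, gender
```

Calling read_name_gender('blue.csv') would then give the two lists. The with block also closes the file, which the one-liners above leave to the garbage collector.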

1

As you are an R programmer, I would recommend trying pandas.

1) The csv file has headers so I don't want those included inside the variables when I store the columns

You can read the CSV with read_csv(), which treats the first line as the header by default.

2) I am missing the gender for the last row of the csv file and so I don't want to include the last line of the csv file when I store it in a variable.

I think your requirement is to skip lines with missing data; for that you can use dropna().

So, the code:

In [1]: import pandas as pd

In [2]: !cat sample_data.csv
Name, Gender
Bob, Male
Susan, Female
Doug,

In [3]: pd.read_csv("./sample_data.csv").dropna()
Out[3]: 
    Name   Gender
0    Bob     Male
1  Susan   Female


0

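You can combine slicing with a try/except block, like so:

for row in list(csvfile)[1:]:  # csv.reader is not subscriptable, so build a list first
    try:
        if not row[1].strip():  # gender field is present but empty, as in "Doug,"
            continue
        gender.append(row[1])
        name.append(row[0])
    except IndexError:  # row has fewer than two fields
        continue

This skips the header and every row that has no gender (a missing or empty field), not only the last line.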
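A variation on the same idea, sketched with csv.DictReader so the header is consumed automatically and the columns are addressed by name. The function name read_complete_rows is hypothetical, and the keys 'Name' and 'Gender' are taken from the mock dataset in the question; skipinitialspace=True is assumed because that data has a space after each comma.

```python
import csv

def read_complete_rows(lines):
    """Collect name and gender from rows where both fields are non-empty.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    name, gender = [], []
    # DictReader uses the first row for the keys, so no header skipping is needed.
    for row in csv.DictReader(lines, skipinitialspace=True):
        # An empty field is '' and an absent field is None; both are falsy.
        if row['Name'] and row['Gender']:
            name.append(row['Name'])
            gender.append(row['Gender'])
    return name, gender
```

With an open file this would be called as read_complete_rows(infile). It needs no exception handling, because a short row yields None for the missing column rather than raising.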


0
import csv

with open('data.csv', 'r') as f1:
    # rows to keep: total lines minus the header and the last row
    numberOflines = sum(1 for line in f1) - 2
    f1.seek(0)  # rewind after counting
    r = csv.reader(f1)
    next(r, None)  # skip the header line
    for row in r:
        if numberOflines > 0:
            print(row[0])
            numberOflines -= 1
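If reading the file twice is a concern, the same result can probably be had in one pass by holding each row back until the next one arrives, so the final row is never emitted. This is an alternative technique, not the code above rewritten; the generator name rows_without_first_and_last is made up for illustration, and skipinitialspace=True is assumed from the mock data in the question.

```python
import csv

def rows_without_first_and_last(lines):
    """Yield every CSV row except the header and the final row, in one pass.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)             # skip the header row
    previous = next(reader, None)  # hold back the first data row
    for row in reader:
        # A newer row exists, so `previous` cannot be the last one.
        yield previous
        previous = row
    # The row still held in `previous` is the last row and is dropped.
```

Each yielded row is a list, so row[0] is the name and row[1] the gender.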

