Reading a specific number of lines of a .csv in python

Question

I have a CSV file that I am trying to read into Python, and I want to store the first two columns in two variables called name and gender. The code I am currently using is:

import csv
infile = open('blue.csv', 'r')
csvfile = csv.reader(infile)

name = []
gender = []

for row in csvfile:
    name.append(row[0])
    gender.append(row[1])

There are two problems I am encountering:

1) The csv file has headers so I don't want those included inside the variables when I store the columns

2) I am missing the gender for the last row of the csv file and so I don't want to include the last line of the csv file when I store it in a variable.

I am an R programmer and so to me, the way I would get around this is to read in the file excluding the first row and last row but I am unsure of how to do this in python, or better yet, if there is a better/smarter alternative.

If it helps, here is what a mock dataset would look like:

Name, Gender
Bob, Male
Susan, Female
Doug,

If you have the option to use pandas, please have a look: pandas.pydata.org/pandas-docs/stable/generated/… – Gurupad Hegde, commented Oct 20, 2015 at 18:38

Ami Tavory · Accepted Answer · 2015-10-20 18:28:54Z

4

You wrote

I am an R programmer and so to me, the way I would get around this is to read in the file excluding the first row and last row but I am unsure of how to do this in python

This can be done with readlines and list slicing like so:

open('foo.csv').readlines()[1: -1]

Furthermore, note that csv.reader accepts either a file object or a list:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.

So you can just use:

with open('foo.csv') as f:
    for row in csv.reader(f.readlines()[1:-1]):
        name.append(row[0])
        gender.append(row[1])
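A self-contained sketch of the same approach, wrapped in a hypothetical helper read_middle_rows that takes the list returned by readlines(). It assumes the file has no trailing blank line (otherwise [1:-1] would drop that blank line instead of the incomplete "Doug," row), and it adds skipinitialspace=True, a standard csv.reader option, to strip the space after each comma in the mock data.

```python
import csv


def read_middle_rows(lines):
    """Hypothetical helper: return (names, genders) from a list of CSV lines,
    skipping the first line (the header) and the last line."""
    names, genders = [], []
    # csv.reader accepts any iterable of strings, so a sliced list works;
    # skipinitialspace=True drops the space that follows each comma
    for row in csv.reader(lines[1:-1], skipinitialspace=True):
        names.append(row[0])
        genders.append(row[1])
    return names, genders
```

To use it on a real file, open it with newline='' (as the csv module documentation recommends) and pass f.readlines() to the helper. Reading every line into memory is fine for small files; for very large ones, a streaming approach avoids holding the whole file.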

answered Oct 20, 2015 at 18:28

Ami Tavory


Gurupad Hegde · Accepted Answer · 2015-10-20 18:54:16Z

As you are an R programmer, I would recommend trying pandas.

1) The csv file has headers so I don't want those included inside the variables when I store the columns

You can read the CSV with read_csv(), which treats the first row as the header by default, so no extra setting is needed.

2) I am missing the gender for the last row of the csv file and so I don't want to include the last line of the csv file when I store it in a variable.

I think your actual requirement is to skip lines with missing data; for that you can use dropna().

In code:

In [1]: import pandas as pd

In [2]: !cat sample_data.csv
Name, Gender
Bob, Male
Susan, Female
Doug,

In [3]: pd.read_csv("./sample_data.csv").dropna()
Out[3]: 
    Name   Gender
0    Bob     Male
1  Susan   Female
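A sketch that goes one step further and produces the two lists the question asks for, assuming pandas is installed. It reads the mock data from an inline string via io.StringIO instead of a file, and passes skipinitialspace=True so the second column is named "Gender" rather than " Gender" (the sample has a space after each comma).

```python
import io

import pandas as pd

# Inline copy of the mock data so the sketch runs without a file on disk
SAMPLE = "Name, Gender\nBob, Male\nSusan, Female\nDoug,\n"

# The header row is used for column names by default; skipinitialspace=True
# strips the space after each comma, and dropna() removes the row whose
# Gender field is empty (it is parsed as NaN)
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True).dropna()

# Convert the columns to plain Python lists, like the name/gender lists in the question
name = df["Name"].tolist()
gender = df["Gender"].tolist()
```

If you really want to drop the last line regardless of its content, read_csv also has a skipfooter parameter, which requires engine='python'.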

Maikflow · Accepted Answer · 2015-10-20 18:18:50Z

0

You can use slicing in combination with a try/except block inside the loop, like so:

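# csv.reader objects cannot be sliced directly, so materialize a list first
for row in list(csvfile)[1:]:
    try:
        gender.append(row[1])  # raises IndexError if the row has no second field
        name.append(row[0])
    except IndexError:
        continue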

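This code skips any row that has no second field at all, not only the last line. Note that a line such as "Doug," still yields an empty second field rather than a missing one, so with the mock data above it would not be skipped.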
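A sketch of a variant that also skips rows whose second field is present but blank, such as "Doug,". The helper name read_complete_rows is hypothetical; it uses an explicit length check instead of try/except and skips the header with next().

```python
import csv


def read_complete_rows(lines):
    """Hypothetical helper: return (names, genders), skipping the header and
    any row that lacks a non-blank second field."""
    names, genders = [], []
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        # A blank line gives [], "Ann" gives ["Ann"], "Doug," gives ["Doug", ""]
        if len(row) < 2 or not row[1].strip():
            continue
        names.append(row[0])
        genders.append(row[1])
    return names, genders
```

It accepts an open file object as well as a list of lines, since csv.reader only needs an iterable of strings.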

answered Oct 20, 2015 at 18:18

Maikflow


LetzerWille · Accepted Answer · 2015-10-20 18:40:07Z

0

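import csv

with open('data.csv', 'r', newline='') as f1:
    # Number of data rows to keep: all lines minus the header and the last line
    numberOflines = len([line for line in f1]) - 2
    f1.seek(0)  # rewind so the csv reader starts from the top again
    r = csv.reader(f1)
    next(r, None)  # skip the header row
    for row in r:
        # Handle only the first numberOflines data rows, leaving out the last one
        if numberOflines > 0:
            print(row[0])
            numberOflines -= 1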
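A sketch of a way to drop the last row without counting lines first: hold each row back by one step (a one-row lookahead) so the final row is never emitted. The generator name rows_between_header_and_last is hypothetical; it accepts an open file or any iterable of strings, and, like the counting approach, it treats a trailing blank line as the last row.

```python
import csv


def rows_between_header_and_last(lines):
    """Hypothetical helper: yield every CSV row except the header and the last row."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    previous = next(reader, None)  # first data row, held back
    for row in reader:
        # A newer row exists, so the held-back row is not the last one
        yield previous
        previous = row
    # The row still held in previous is the last one and is never yielded
```

This reads the file in a single pass, so there is no need for seek(0) or an extra loop just to count lines.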

edited Oct 20, 2015 at 18:40

answered Oct 20, 2015 at 18:19

LetzerWille
