How to drop a specific column of csv file while reading it using pandas?

Question

I need to drop the column labeled 'name' while loading a CSV with pandas. I currently read the file as shown below and would like to add a parameter to that call to do it. Thanks.

pd.read_csv("sample.csv")

I know how to do this after reading the CSV:

df.drop('name', axis=1)

@cᴏʟᴅsᴘᴇᴇᴅ: I don't know the total number of columns but it will be more than 100. I need the code to work for any number of columns. Thanks.
– Anon George, Commented Feb 21, 2018 at 6:17

Jaroslav Bezděk · Accepted Answer · 2023-04-14 09:23:50Z

115

If you know the column names in advance, you can do this by setting the usecols parameter.

When you know which columns to use

Suppose you have a CSV file with columns ["id", "name", "last_name"] and you want just ["name", "last_name"]. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols=["name", "last_name"])
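A minimal, self-contained sketch of the same call, using an in-memory CSV (the sample data is made up for illustration) in place of sample.csv. It also shows one thing worth knowing about list-based usecols: a name that is not in the header makes read_csv raise a ValueError.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sample.csv
csv_text = "id,name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n"

# Keep only the listed columns; "id" is left out of the resulting frame
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "last_name"])
print(df.columns.tolist())

# Asking for a column that is not in the header raises ValueError
try:
    pd.read_csv(io.StringIO(csv_text), usecols=["name", "surname"])
    missing_raised = False
except ValueError:
    missing_raised = True
print(missing_raised)
```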

When you want the first N columns

If you don't know the column names but want the first N columns of the file, you can select them by position:

import pandas as pd
n = 3  # number of leading columns to keep (example value)
df = pd.read_csv("sample.csv", usecols=list(range(n)))

Edit

When you know the name of the column to be dropped

# Read column names from the file (only one data row is parsed)
cols = list(pd.read_csv("sample_data.csv", nrows=1))
print(cols)

# Use a list comprehension to leave the unwanted column out of usecols
df = pd.read_csv("sample_data.csv", usecols=[c for c in cols if c != "name"])
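The two-pass idea can be wrapped in a small helper. This is only a sketch: the function name read_csv_without and the temporary sample file are hypothetical, and it uses nrows=0 so that the first pass parses the header and no data rows.

```python
import os
import tempfile
import pandas as pd

def read_csv_without(path, unwanted):
    """Two-pass read: header first, then every column not in `unwanted`."""
    unwanted = set(unwanted)
    header = pd.read_csv(path, nrows=0).columns  # header only, no data rows
    keep = [c for c in header if c not in unwanted]
    return pd.read_csv(path, usecols=keep)

# Hypothetical sample file written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_data.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n")
    df = read_csv_without(path, ["name"])

print(df.columns.tolist())
```

Because the list of columns to keep is built from the real header, a name in `unwanted` that does not occur in the file is simply ignored.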

edited Apr 14, 2023 at 9:23

Jaroslav Bezděk


answered Feb 21, 2018 at 6:06

Sociopath


3 Comments

Anon George Over a year ago

I need every column except the one labeled 'name', and I don't know the other labels, the number of columns, or the position of 'name'. So I can't use this answer, but thanks for the reply.

João Bravo Over a year ago

Reading 0 rows also works and is faster, although very marginally.

bsplosion Over a year ago

For anyone trying to optimize, just see the answer from jalopezp - no unnecessary reads of the file or awkward parsing, just using the built-in parameters of the read itself.

Jaroslav Bezděk · Accepted Answer · 2023-04-14 09:24:24Z

48

The only parameter to read_csv() that you can use to select the columns you use is usecols. According to the documentation, usecols accepts list-like or callable. Because you only know the columns you want to drop, you can't use a list of the columns you want to keep. So use a callable:

pd.read_csv("sample.csv", usecols=lambda x: x != "name")

And you could of course say x not in ["unwanted", "column", "names"] if you had a list of column names you didn't want to use.
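As a rough sketch of the multi-column variant, assuming an in-memory CSV and a made-up set of unwanted names:

```python
import io
import pandas as pd

# Hypothetical sample data and column names, for illustration only
csv_text = "id,name,last_name,notes\n1,Ada,Lovelace,x\n2,Alan,Turing,y\n"
unwanted = {"name", "notes"}

# The callable receives each column name; a column is kept when it returns True
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col not in unwanted)
print(df.columns.tolist())
```

A name in `unwanted` that is absent from the file causes no error here, since the callable is only ever asked about columns that exist.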

edited Apr 14, 2023 at 9:24

Jaroslav Bezděk


answered Jan 28, 2022 at 10:20

jalopezp


3 Comments

Thecave3 Over a year ago

This is much cleaner than the accepted answer.

igorkf Over a year ago

Pretty cool solution!

crustification Over a year ago

This method is not supported with the "pyarrow" engine option selected, which is experimental but extremely fast. If you're looking for speed, use the answer from @Sociopath as well as the pyarrow engine

cs95 · Accepted Answer · 2019-06-20 13:24:43Z

13

Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name'])

Alternatively, you can do the same thing (read only the headers) very efficiently using the csv module:

import csv

with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
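A self-contained sketch of the csv-module variant, using a temporary file with made-up contents. It also checks a detail that may look worrying at first: the set difference does not preserve the order of the names, but read_csv ignores the order of usecols, so the columns still come back in file order.

```python
import csv
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n")

    # Read just the first line with the csv module
    with open(path, newline="") as f:
        header = next(csv.reader(f))

    # A set does not guarantee any particular order of the names...
    keep = list(set(header) - {"name"})
    df = pd.read_csv(path, usecols=keep)

# ...but the resulting columns follow the order in the file
print(df.columns.tolist())
```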

edited Jun 20, 2019 at 13:24

answered Feb 21, 2018 at 6:20

cs95


3 Comments

Anon George Over a year ago

This works fine. Is there a way to do it without importing 'csv' package? I mean, only using pandas.

cs95 Over a year ago

@AnonGeorge You could use pd.read_csv(..., nrows=1) and then examine the headers. Leaving that as an exercise to you :)

Anon George Over a year ago

header = csv.reader(f).next() will not work in Python 3. I edited your answer to correct it, but the edit got rejected. :(

Ege · Accepted Answer · 2021-12-02 00:11:46Z

8

Using df = df.drop(['ID','prediction'], axis=1) did the job for me. I dropped the 'ID' and 'prediction' columns. Make sure you put them in square brackets like ['column1','column2']. There is no need for other complicated solutions.

edited Dec 2, 2021 at 0:11

answered Apr 24, 2020 at 13:16

Ege


2 Comments

spotnag Over a year ago

This works, though instead of assigning back to df you could pass inplace=True, e.g. df.drop(['ID','prediction'], axis=1, inplace=True), which applies the change to df directly.

Niels Bom Over a year ago

OP already stated they know how to use drop, so not an answer to the question.

Arcane · Accepted Answer · 2021-01-03 17:12:10Z

4

Columns can be dropped in the same statement that reads the file:

columns_to_be_removed = ['a', 'b']

data = pd.read_csv(sourceFileName).drop(columns_to_be_removed, axis = 'columns')
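For comparison, here is a small sketch (with made-up in-memory data) that runs the chained form next to a usecols callable. Both produce the same frame; the difference is that the chained form parses every column before dropping, while usecols filters during the read.

```python
import io
import pandas as pd

# Hypothetical sample data, for illustration only
csv_text = "id,name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n"
columns_to_be_removed = ["name"]

# Chained: all columns are parsed, then "name" is dropped from the result
chained = pd.read_csv(io.StringIO(csv_text)).drop(columns_to_be_removed, axis="columns")

# Filtered: "name" is excluded by read_csv itself
filtered = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda c: c not in columns_to_be_removed,
)

print(chained.equals(filtered))
```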

answered Jan 3, 2021 at 17:12

Arcane


1 Comment

Niels Bom Over a year ago

Your solution is just chaining operations. The column is still read in the read_csv call.

Ganesh Ghuge · Accepted Answer · 2022-03-11 15:42:11Z

1

This answer with two lines of code will really help you. You can even dynamically remove column names while creating CSV.

https://stackoverflow.com/a/71440977/12819393

answered Mar 11, 2022 at 15:42

Ganesh Ghuge

