77

I need to drop the column labeled name while loading a CSV with pandas. I am currently reading the CSV as shown below and would like to add a parameter to the call to do this. Thanks.

pd.read_csv("sample.csv")

I know how to do this after reading the CSV:

df.drop('name', axis=1)
2
  • Do you know in advance what columns your CSV has? Commented Feb 21, 2018 at 6:00
  • @cᴏʟᴅsᴘᴇᴇᴅ: I don't know the total number of columns but it will be more than 100. I need the code to work for any number of columns. Thanks. Commented Feb 21, 2018 at 6:17

6 Answers

115

If you know the column names in advance, you can do this by setting the usecols parameter.

When you know which columns to use

Suppose you have a CSV file with columns ["id", "name", "last_name"] and you want only ["name", "last_name"]. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols=["name", "last_name"])

When you want the first N columns

If you don't know the column names but want the first N columns of the file, you can do it with:

import pandas as pd
n = 3  # example value: number of leading columns to keep
df = pd.read_csv("sample.csv", usecols=list(range(n)))

Edit

When you know the name of the column to be dropped

# Read only the column names from the file
cols = list(pd.read_csv("sample_data.csv", nrows=1))
print(cols)

# Use a list comprehension to leave the unwanted column out of usecols
df = pd.read_csv("sample_data.csv", usecols=[c for c in cols if c != "name"])

3 Comments

I need every column except the one labeled 'name', and I don't know the other labels, the number of columns, or the position of 'name'. So I can't use this answer, but thanks for the reply.
Reading 0 rows also works and is faster, although only marginally.
For anyone trying to optimize, just see the answer from jalopezp - no unnecessary reads of the file or awkward parsing, just using the built-in parameters of the read itself.
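
To illustrate the nrows=0 suggestion from the comments, here is a minimal sketch. The in-memory CSV text and its column names are made up for the example and stand in for sample.csv; only the header line is parsed in the first pass.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for sample.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n"

# nrows=0 parses only the header line, so no data rows are loaded
cols = list(pd.read_csv(io.StringIO(csv_text), nrows=0))

# Second pass: read every column except "name"
df = pd.read_csv(io.StringIO(csv_text), usecols=[c for c in cols if c != "name"])
print(list(df.columns))
```

This still opens the input twice, which is the cost the last comment above refers to.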
48

The only read_csv() parameter that lets you select which columns to read is usecols. According to the documentation, usecols accepts a list-like or a callable. Because you only know the columns you want to drop, you can't pass a list of the columns to keep. So use a callable:

pd.read_csv("sample.csv", usecols=lambda x: x != "name")

And you could of course say x not in ["unwanted", "column", "names"] if you had a list of column names you didn't want to use.
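
A minimal sketch of the multi-column variant, assuming an in-memory CSV with made-up column names in place of sample.csv. The callable is evaluated against each column name, and only the names for which it returns True are kept.

```python
import io
import pandas as pd

# Hypothetical CSV content; the column names are invented for the example
csv_text = "id,name,unwanted,score\n1,Ann,x,3.5\n2,Bob,y,4.0\n"

# Columns to leave out; a set gives fast membership checks
unwanted = {"name", "unwanted"}

# usecols receives each column name and keeps those where the callable is True
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col not in unwanted)
print(list(df.columns))
```

A name in the exclusion set that does not appear in the file is simply never matched, so the same set can be reused across files with different columns.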

3 Comments

This is much cleaner than the accepted answer.
Pretty cool solution!
This method is not supported with the "pyarrow" engine option, which is experimental but extremely fast. If you're looking for speed, use the answer from @Sociopath together with the pyarrow engine.
13

Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name'])

Alternatively, you can do the same thing (read only the headers) very efficiently using the csv module:

import csv

with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
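
The csv-module approach can be wrapped in a small function. This is only a sketch: read_csv_without is a hypothetical helper name, and the temporary file is created just so the example is self-contained.

```python
import csv
import os
import tempfile
import pandas as pd

def read_csv_without(path, drop):
    """Hypothetical helper: load a CSV, skipping the columns named in `drop`."""
    drop = set(drop)
    # Read just the header line with the csv module
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    keep = [c for c in header if c not in drop]
    return pd.read_csv(path, usecols=keep)

# Demo on a throwaway file standing in for sample.csv
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n")
    df = read_csv_without(path, ["name"])

print(list(df.columns))
```

The order of the names passed to usecols does not matter here, since pandas returns the selected columns in file order.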

3 Comments

This works fine. Is there a way to do it without importing 'csv' package? I mean, only using pandas.
@AnonGeorge You could use pd.read_csv(..., nrows=1) and then examine the headers. Leaving that as an exercise to you :)
header = csv.reader(f).next() will not work in Python 3. I edited your answer to correct it, but the edit was rejected. :(
8

Using df = df.drop(['ID','prediction'], axis=1) did the job for me. I dropped the 'ID' and 'prediction' columns. Make sure you put them in square brackets, like ['column1','column2']. There is no need for other complicated solutions.

2 Comments

This works. Instead of assigning the result back to df, though, you could add the argument inplace=True, e.g. df.drop(['ID','prediction'], axis=1, inplace=True), which applies the change to df directly.
OP already stated they know how to use drop, so not an answer to the question.
4

Columns can be dropped at the time of reading itself.

columns_to_be_removed = ['a', 'b']

data = pd.read_csv(sourceFileName).drop(columns_to_be_removed, axis = 'columns')

1 Comment

Your solution is just chaining operations. The column is still read in the read_csv call.
1

This answer with two lines of code will really help you. You can even dynamically remove column names while creating CSV.

https://stackoverflow.com/a/71440977/12819393
