How to delete columns in a CSV file?

Question

I have a .csv file which looks like this:

day,month,year,lat,long
01,04,2001,45.00,120.00
02,04,2003,44.00,118.00

I am trying to delete the "year" column and all of its entries. In total, there are 40+ entries, with years ranging from 1960 to 2010.

This is the type of problem where awk shines: $ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' ex.csv
– Eric Wilson, Commented Sep 28, 2011 at 20:20
@Eric Wilson: Luckily, this CSV file has no quotes, allowing AWK to work.
– S.Lott, Commented Sep 29, 2011 at 9:55
@S.Lott I agree, when the CSV format gets more complicated, Python's csv is the way to go. I only use awk when it clearly works, and is only one line.
– Eric Wilson, Commented Sep 29, 2011 at 12:44

Ryan R · Accepted Answer · 2019-01-02 02:30:38Z

65

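import csv
# Python 3: open CSV files in text mode with newline="" (Python 2 used "rb"/"wb")
with open("source", newline="") as source:
    rdr = csv.reader(source)
    with open("result", "w", newline="") as result:
        wtr = csv.writer(result)
        for r in rdr:
            # keep every column except index 2 (year)
            wtr.writerow((r[0], r[1], r[3], r[4]))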

BTW, the for loop can be removed, but not really simplified.

        in_iter= ( (r[0], r[1], r[3], r[4]) for r in rdr )
        wtr.writerows( in_iter )

You can also take the requirement literally and delete the column in place. I consider this a poor general policy because it doesn't extend cleanly to removing more than one column: once you delete the first, the positions of the remaining columns shift, and the index of the second is no longer obvious. For a single column, though, it works.

            del r[2]
            wtr.writerow( r )

edited Jan 2, 2019 at 2:30

Ryan R


answered Sep 28, 2011 at 21:08

S.Lott


4 Comments

Jeff Over a year ago

This one worked nearly flawlessly; the only error was a syntax one. The colon should be deleted from wtr=csv.writer(result). Thanks for your input, it has helped. It is also handy because it works for any number of columns I may need to delete.

Satvik Beri Over a year ago

You can easily use your second method for multiple columns by deleting the highest column first, e.g. del r[8]; del r[6]; del r[2]; wtr.writerow(r)

srcerer Over a year ago

You can save some writing for bigger CSV's by replacing (r[0], r[1], r[3], r[4]) with something like tuple(r[ii] for ii in range(len(r)) if ii != 2)

bobobobo Over a year ago

To delete more than 1 column in your last point, can't you just use the classic delete 'em backwards workaround?
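The "delete the highest column first" idea from the comments above can be sketched as follows. This is only an illustration under stated assumptions: the helper name drop_columns is hypothetical, and the sample data is read from an in-memory string rather than a real file.

```python
import csv
import io

def drop_columns(rows, indices):
    """Yield each row with the given column indices removed (hypothetical helper)."""
    # Delete from the highest index down so that earlier deletions
    # do not shift the positions of columns still to be removed.
    ordered = sorted(set(indices), reverse=True)
    for row in rows:
        row = list(row)  # work on a copy so the caller's row is untouched
        for i in ordered:
            del row[i]
        yield row

# Sample data taken from the question; a real script would open files instead.
sample = "day,month,year,lat,long\n01,04,2001,45.00,120.00\n02,04,2003,44.00,118.00\n"
reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(drop_columns(reader, [2, 4]))  # remove "year" and "long"
result = out.getvalue()
```

Sorting the indices in descending order inside the helper means the caller can pass them in any order.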

SunilThorat · Accepted Answer · 2015-12-24 16:49:20Z

54

Using the pandas module is much easier.

import pandas as pd
f=pd.read_csv("test.csv")
keep_col = ['day','month','lat','long']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)

And here is a short explanation:

>>>f=pd.read_csv("test.csv")
>>> f
   day  month  year  lat  long
0    1      4  2001   45   120
1    2      4  2003   44   118
>>> keep_col = ['day','month','lat','long'] 
>>> f[keep_col]
    day  month  lat  long
0    1      4   45   120
1    2      4   44   118
>>>

answered Dec 24, 2015 at 16:49

SunilThorat


4 Comments

technogeek1995 Over a year ago

This works even if your CSV has line breaks inside a quoted field. Many Linux commands, such as cut, fail to remove columns while preserving data integrity when a field contains a line break.

Gunarathinam Over a year ago

In my case, the integers get converted to floats.

ntjess Over a year ago

@Gunarathinam you can prevent this in newer pandas versions by passing dtype=str to read_csv
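A small sketch of the dtype=str suggestion, assuming the sample data from the question and an in-memory string in place of test.csv. With dtype=str, pandas keeps each field as the text it read, so values such as "01" and "45.00" are written back unchanged.

```python
import io
import pandas as pd

# Sample data from the question; io.StringIO stands in for a real file.
sample = "day,month,year,lat,long\n01,04,2001,45.00,120.00\n02,04,2003,44.00,118.00\n"

# dtype=str disables numeric type inference, so "01" is not turned into 1
# and "45.00" is not turned into 45.0.
df = pd.read_csv(io.StringIO(sample), dtype=str)
kept = df[["day", "month", "lat", "long"]]

# to_csv returns the CSV text when no path is given.
result = kept.to_csv(index=False)
```

Note that every column is then a string column, so any arithmetic on the values needs an explicit conversion first.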

CobaltBlue Over a year ago

The best part of this solution is that the columns are named and therefore position independent. The columns to keep (or to be removed) can be passed in or read from another file. Nicely done.

Aimon Bustardo · Accepted Answer · 2012-11-16 05:50:43Z

6

Using a dict to grab headings then looping through gets you what you need cleanly.

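import csv
ct = 0
# maps each wanted heading to its column index (-1 until found in the header row)
cols_i_want = {'cost' : -1, 'date' : -1}
# Python 3: text mode with newline="" (Python 2 used "rb"/"wb")
with open("file1.csv", newline="") as source:
    rdr = csv.reader( source )
    with open("result", "w", newline="") as result:
        wtr = csv.writer( result )
        for row in rdr:
            if ct == 0:
                # first row is the header: record where each wanted column lives
                for cc, col in enumerate(row):
                    if col in cols_i_want:
                        cols_i_want[col] = cc
            wtr.writerow( (row[cols_i_want['cost']], row[cols_i_want['date']]) )
            ct += 1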
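For comparison, the same select-by-heading result can be sketched with csv.DictReader and csv.DictWriter from the standard library. This is a different technique from the manual index lookup above, not the answer's own method, and the sample data here is made up for illustration.

```python
import csv
import io

# Hypothetical sample data with the 'cost' and 'date' headings used above.
sample = "date,item,cost\n2012-11-01,widget,9.50\n2012-11-02,gadget,12.00\n"
wanted = ["cost", "date"]

reader = csv.DictReader(io.StringIO(sample))
out = io.StringIO()
# extrasaction="ignore" tells DictWriter to skip keys not listed in fieldnames.
writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
for row in reader:
    writer.writerow(row)
result = out.getvalue()
```

The output columns follow the order of the wanted list, not the order in the input file.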

answered Nov 16, 2012 at 5:50

Aimon Bustardo


Tom Solid · Accepted Answer · 2023-08-03 07:20:28Z

6

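I would use pandas with column numbers:

import pandas as pd

# usecols keeps only the listed column positions (index 2, year, is skipped)
f = pd.read_csv("test.csv", usecols=[0,1,3,4])
f.to_csv("test.csv", index=False)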

edited Aug 3, 2023 at 7:20

Tom Solid


answered Apr 21, 2020 at 16:03

dario


Tunaki · Accepted Answer · 2016-03-28 13:17:13Z

3

You can directly delete the column with just

del variable_name['year']

edited Mar 28, 2016 at 13:17

Tunaki


answered Mar 28, 2016 at 13:16

ankur


1 Comment

ZekeC Over a year ago

Doesn't work for me. It says it requires an integer since it expects an index.
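The error in the comment above is what Python raises when del is applied with a string key to a plain list. The answer does not say what variable_name is; the sketch below assumes it is a pandas DataFrame, where deleting by column name works, and contrasts that with a list row such as csv.reader produces.

```python
import io
import pandas as pd

# Assumption: the answer's variable_name is a pandas DataFrame.
sample = "day,month,year,lat,long\n1,4,2001,45.0,120.0\n2,4,2003,44.0,118.0\n"
df = pd.read_csv(io.StringIO(sample))
del df["year"]  # deletes the column in place
columns_after = list(df.columns)

# A row from csv.reader is a plain list, which only accepts integer indices.
row = ["1", "4", "2001", "45.0", "120.0"]
try:
    del row["year"]
except TypeError as exc:
    error_message = str(exc)
del row[2]  # position 2 holds the year
```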

aweis · Accepted Answer · 2011-09-28 20:11:51Z

2

You can use the csv package to iterate over your CSV file and output the columns that you want to another CSV file.

The example below is not tested and should illustrate a solution:

import csv

# raw strings keep the backslashes literal ('\n' in a normal string is a newline)
file_name = r'C:\Temp\my_file.csv'
output_file = r'C:\Temp\new_file.csv'
## note that the index of the year column is excluded
column_indices = [0,1,3,4]
with open(file_name, 'r', newline='') as csv_file, open(output_file, 'w') as fh:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
       tmp_row = []
       for col_inx in column_indices:
           tmp_row.append(row[col_inx])
       # end each output row with a newline
       fh.write(','.join(tmp_row) + '\n')

edited Sep 28, 2011 at 20:11

answered Sep 28, 2011 at 20:06

aweis


2 Comments

Steven Rumbalski Over a year ago

Dispense with the tmp_row and the join and use csv.writer and a generator expression: for row in reader: wtr.writerow(row[i] for i in column_indices). It's safer (handles quoting automatically), more concise, and faster.

S.Lott Over a year ago

Why not use csv for writing, also?
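The suggestion in these comments can be sketched as below. The extra "place" column and its value are made up to show the quoting behaviour, and in-memory strings stand in for the real input and output files.

```python
import csv
import io

# Hypothetical sample: the last field contains a comma, so it must be quoted.
sample = ('day,month,year,lat,long,place\n'
          '01,04,2001,45.00,120.00,"Portland, OR"\n')
column_indices = [0, 1, 3, 4, 5]  # index 2 (year) is left out

reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
wtr = csv.writer(out, lineterminator="\n")
for row in reader:
    # writerow accepts any iterable, including a generator expression,
    # and re-quotes fields that contain the delimiter.
    wtr.writerow(row[i] for i in column_indices)
result = out.getvalue()
```

A plain ','.join would have written the place field without quotes, producing one column too many on that row.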

broc · Accepted Answer · 2011-09-28 20:13:24Z

2

Off the top of my head, this will do it without any sort of error checking nor ability to configure anything. That is "left to the reader".

outFile = open( 'newFile', 'w' )
for line in open( 'oldFile' ):
   items = line.split( ',' )
   outFile.write( ','.join( items[:2] + items[ 3: ] ) )
outFile.close()
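One piece of the error checking left to the reader is quoted fields. The sketch below, using a made-up line, shows how a plain split on commas miscounts the fields when a quoted value contains a comma, while csv.reader keeps it intact.

```python
import csv
import io

# Hypothetical line: the fourth field is quoted and contains a comma.
line = '01,04,2001,"Portland, OR",45.00\n'

# Naive approach: the comma inside the quotes is treated as a separator.
naive_fields = line.rstrip("\n").split(",")

# csv.reader understands the quotes and returns the field as one value.
parsed_fields = next(csv.reader(io.StringIO(line)))
```

For files like the one in the question, which contain no quotes, both approaches give the same fields.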

answered Sep 28, 2011 at 20:13

broc


Clint Eastwood · Accepted Answer · 2021-10-04 20:16:47Z

2

I will add yet another answer to this question. Since the OP did not say they needed to do it with Python, the fastest way to delete the column (especially when the input file has hundreds of thousands of lines) is to use awk.

This is the type of problem where awk shines:

$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' input.csv

(feel free to append > output.csv to the command above if you need the output to be saved to a file)

Credit goes 100% to @eric-wilson who provided this awesome answer, as a comment on the original question, 10 years ago, almost without any credit.

answered Oct 4, 2021 at 20:16

Clint Eastwood


wbrycki · Accepted Answer · 2022-03-08 20:29:37Z

1

Try Python with pandas and exclude the column you don't want:

import pandas as pd

# the ',' is the default separator, but if your file has another one, you have to define it with sep= parameter
df = pd.read_csv("input.csv", sep=',')
exclude_column = "year"
new_df = df.loc[:, df.columns != exclude_column]
# you can even save the result to the same file
new_df.to_csv("input.csv", index=False, sep=',')

answered Mar 8, 2022 at 20:29

wbrycki


mhd · Accepted Answer · 2023-02-10 03:46:27Z

1

My take, using pandas's drop in Python:

import pandas as pd

df = pd.read_csv("old.csv")
new_df = df.drop("year", axis=1)
new_df.to_csv("new.csv", index=False)
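The same drop call also accepts a list of names through its columns keyword, which is a small extension of the answer above rather than part of it. A sketch, assuming the sample data from the question and an in-memory string in place of old.csv:

```python
import io
import pandas as pd

# Sample data from the question; io.StringIO stands in for a real file.
sample = "day,month,year,lat,long\n1,4,2001,45.0,120.0\n2,4,2003,44.0,118.0\n"
df = pd.read_csv(io.StringIO(sample))

# columns= takes one name or a list of names; drop returns a new DataFrame
# and leaves df itself unchanged.
new_df = df.drop(columns=["year", "month"])
remaining = list(new_df.columns)
```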

answered Feb 10, 2023 at 3:46

mhd


KQ. · Accepted Answer · 2011-09-28 20:10:37Z

0

It depends on how you store the parsed CSV, but generally you want the del operator.

If you have an array of dicts:

# integer literals with leading zeros (01, 04) are a syntax error in Python 3
rows = [ {'day': 1, 'month': 4, 'year': 2001, ...}, ... ]
for E in rows: del E['year']

If you have an array of arrays:

rows = [ [1, 4, 2001, ...],
         [...],
         ...
       ]
for E in rows: del E[2]

answered Sep 28, 2011 at 20:10

KQ.


Achraf Almouloudi · Accepted Answer · 2019-04-30 03:17:19Z

0

Try:

# axis must be passed by keyword in pandas 2.0 and later
result = data.drop('year', axis=1)
result.head(5)

edited Apr 30, 2019 at 3:17

Achraf Almouloudi


answered Apr 30, 2019 at 1:02

omega_mi
