5

I know it's usually not feasible to modify a CSV file while you are reading from it, so you need to create a new CSV file and write to that instead. The problem I'm having is preserving the original order of the data.

The input CSV file looks like this:

C1       C2         C3
apple    BANANA     Mango
pear     PineApple  StRaWbeRRy

I want to turn all the data into lower case and output a new csv file that looks like:

C1       C2         C3
apple    banana     mango
pear     pineapple  strawberry

So far I can iterate through the input csv file and turn all the values into lower case but I don't know how to rewrite it back into a csv file in that format. The code I have is:

import csv

def clean(input):
    aList = []
    with open(input, "r", newline="") as file:
        reader = csv.reader(file, delimiter=',')
        next(reader, None)  # Skip the header, but I want to preserve it in the output csv file
        for row in reader:
            for col in row:
                aList.append(col.lower())

So now I have a list with all the lowercase data. How do I write it back to a CSV file in the same format (same number of rows and columns) as the input, including the header row that I skipped in the code?

7
  • Don't bother saving the lines to a list. Just open both your input & output files at the same time, so you can write each modified line as you create it. In fact, I wouldn't even bother using the csv module for this. It's a pity you need to preserve the case of the header line, otherwise you could just process the whole file with the tr program (if you're using a Unix-like OS). Commented Nov 7, 2017 at 6:41
  • With pandas: pd.read_csv(input).apply(str.lower).to_csv(input) Commented Nov 7, 2017 at 6:43
  • I just noticed that your code specifies , as the delimiter, but your sample data uses whitespace. Please explain! Commented Nov 7, 2017 at 6:43
  • @PM2Ring You could still use command line tools if you use the head command to grab the header. Commented Nov 7, 2017 at 6:45
  • @PM2Ring I was just representing the data that way here. The input is in a csv file with those rows and columns. Having said that, I too don't know why the delimiter , works but it does! It was a mistake initially but it works just fine Commented Nov 7, 2017 at 6:45

4 Answers

12

Pandas way:

Read the file using pandas and get the dataframe. Then you can simply use lower()

import pandas as pd

def conversion(text):
    return text.lower()
    

df = pd.read_csv(file_path)
df[column_name] = df[column_name].map(conversion)

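Or as a one-liner:

df[column_name] = df[column_name].apply(lambda x: x.lower())  # If you have NaN or other non-string values, you may need to convert x to a string first, e.g. str(x).lower()

Then you can save it with the to_csv function, e.g. df.to_csv(output_path, index=False), where output_path is a placeholder for your output file and index=False stops pandas from writing its row index as an extra column.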


6

If all you want to do is change the case of the data and preserve everything else, it may be simplest to skip the csv module and process the file as plain text, e.g.:

# Open both files
with open("infile.csv") as f_in, open("outfile.csv", 'w') as f_out:
    # Write header unchanged
    header = f_in.readline()
    f_out.write(header)

    # Transform the rest of the lines
    for line in f_in:
        f_out.write(line.lower())
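
To try this out end to end, here is a minimal sketch that wraps the same idea in a function and runs it on the sample data from the question. The function name lowercase_body, the encoding default, and the temporary file names are assumptions made for illustration; they are not part of the answer above. Opening both files with newline="" is a choice made here so that the original line endings are copied through untouched.

```python
import os
import tempfile


def lowercase_body(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, lowercasing every line except the first."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src_path, encoding=encoding, newline="") as f_in, \
         open(dst_path, "w", encoding=encoding, newline="") as f_out:
        f_out.write(f_in.readline())  # header line, copied unchanged
        for line in f_in:
            f_out.write(line.lower())


# Small demonstration using the sample data from the question
# (written here as comma-separated values, which is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "infile.csv")
    dst = os.path.join(tmp, "outfile.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("C1,C2,C3\napple,BANANA,Mango\npear,PineApple,StRaWbeRRy\n")
    lowercase_body(src, dst)
    with open(dst, encoding="utf-8", newline="") as f:
        result = f.read()
    print(result)
```

Because each line is written as soon as it is read, memory use stays roughly constant regardless of file size, and the row order is preserved automatically.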


3

If you want to use the csv module throughout, use the following code snippet.

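import os
import csv


def clean(input_path):
    tmp_path = "tmp.csv"
    # newline="" lets the csv module handle line endings itself
    with open(input_path, "r", newline="") as in_file, open(tmp_path, "w", newline="") as out_file:
        reader = csv.reader(in_file, delimiter=',')
        writer = csv.writer(out_file, delimiter=',')
        # Copy the header row unchanged
        header = next(reader)
        writer.writerow(header)
        # Lowercase every cell in the remaining rows
        for row in reader:
            writer.writerow([col.lower() for col in row])
    # Replace the original file with the cleaned copy (overwrites it on all platforms)
    os.replace(tmp_path, input_path)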
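
The snippet above writes to a fixed tmp.csv in the working directory. A possible variation, sketched here under the assumption that the temporary file should sit next to the input so the final swap stays on one filesystem, uses tempfile.NamedTemporaryFile and os.replace. The function name lowercase_csv_in_place and the sample file name are hypothetical.

```python
import csv
import os
import tempfile


def lowercase_csv_in_place(path):
    """Lowercase every data cell of the CSV at `path`, keeping the header row."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file is created beside the target so os.replace stays on one filesystem.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, suffix=".tmp", delete=False
    ) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # None if the file is empty
        if header is not None:
            writer.writerow(header)
        for row in reader:
            writer.writerow([cell.lower() for cell in row])
    # Both files are closed here; swap the cleaned copy into place.
    os.replace(dst.name, path)


# Demonstration on the sample data from the question.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "fruit.csv")
    with open(sample, "w", newline="") as f:
        f.write("C1,C2,C3\napple,BANANA,Mango\npear,PineApple,StRaWbeRRy\n")
    lowercase_csv_in_place(sample)
    with open(sample, newline="") as f:
        rows = list(csv.reader(f))
    print(rows)
```

Note that csv.writer terminates rows with "\r\n" by default, so the rewritten file may use different line endings from the original even though the rows and columns are the same.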

3 Comments

That's correct; in that case we need to write to another file and copy it over the original at the end.
You should fix the whitespace indent. You are using 1, 2, 3 and 4 spaces at different points. Python will not like this!
@Tim I'm using ideone which has a problem. I've fixed this using PyCharm.
0

The easiest way I found is as follows. Let the initial CSV file be named test.csv:

with open('test.csv','r') as f:
    with open('cleaned.csv','w') as ff:
        ff.write(f.readline())
        ff.write(f.read().lower())

The code above creates a new CSV file, cleaned.csv, with the header line unchanged and everything after it in lower case.

4 Comments

Ok, that works properly now. But like your earlier version, it unnecessarily reads the whole file into a string. Plus it uses more RAM to do the string concatenation, as Tim mentions. But I guess that's probably ok unless the file is huge, and changing the case of the whole file at once is more efficient than doing it line by line.
You would want to avoid the string concatenation. If this is a large file, you are going to have to allocate enough memory for the whole file, and then a second time to concatenate the header.
So, instead of concatenating, I should write it directly to the file? @Tim
@user8898218 Yes. Strings are immutable in python so concatenation causes a new str to be instantiated and the contents of two strings being concatenated to be copied in.
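
One possible middle ground between lowercasing the whole file at once and going line by line is to read fixed-size chunks, which keeps memory bounded without any string concatenation. This is only a sketch: the function name lowercase_after_header and the chunk_size default are assumptions, and it presumes that lowercasing each chunk separately gives the same result as lowercasing the whole text, which holds for plain ASCII data like the sample in the question.

```python
import io


def lowercase_after_header(f_in, f_out, chunk_size=64 * 1024):
    """Copy the first line unchanged, then lowercase the rest chunk by chunk."""
    f_out.write(f_in.readline())  # header line, written directly
    while True:
        chunk = f_in.read(chunk_size)
        if not chunk:  # empty string means end of file
            break
        f_out.write(chunk.lower())


# Demonstration with in-memory files and a deliberately tiny chunk size.
source = io.StringIO("C1,C2,C3\napple,BANANA,Mango\npear,PineApple,StRaWbeRRy\n")
target = io.StringIO()
lowercase_after_header(source, target, chunk_size=8)
output = target.getvalue()
print(output)
```

The same function works with real files opened in text mode, for example the two handles from the with block in the answer above.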
