3

I have a CSV file that I need to rearrange and re-encode. I'd like to run

line = line.decode('windows-1250').encode('utf-8')

on each line before it's parsed and split by the CSV reader. Alternatively, I'd like to iterate over the lines myself, run the re-encoding, and use single-line parsing from the CSV library, but with the same reader instance.

Is there a way to do that nicely?

1
  • No, but is there any difference? Commented Feb 27, 2010 at 10:55

3 Answers

2

Looping over the lines of a file can be done this way:

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        print(line)  # here you can convert the encoding and save the line

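But if you want to convert the encoding of a whole file, you can also call iconv. Write to a different output file, because redirecting onto the input file would truncate it before it is read:

$ iconv -f Windows-1250 -t UTF-8 < file.csv > file.utf8.csv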

Edit: So where is the problem?

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        line = line.decode('windows-1250').encode('utf-8')
        elements = line.split(",")

2 Comments

I do not want to read/write the file twice. The iconv solution is lame; I want it done in code, not by some tool. I need to create a tool that prepares files in one process, not instructions for doing that.
Again, no support for CSV parsing at the same time, line splitting just won't cut it.
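To illustrate the objection above, here is a small sketch of how a plain split differs from the csv module on a quoted field that contains the delimiter. The sample line is made up for illustration. It also shows that csv.reader accepts any iterable of lines, so a single line can be parsed by wrapping it in a list.

```python
import csv

# Hypothetical sample line: the second field is quoted and contains the delimiter.
line = 'Kowalski;"Łódź; Poland";42'

# Naive splitting cuts the quoted field in two and keeps the quote characters.
naive = line.split(';')

# csv.reader takes any iterable of strings, so one line can be parsed on its own.
parsed = next(csv.reader([line], delimiter=';', quotechar='"'))

print(naive)   # four pieces, quoted field broken
print(parsed)  # three fields, quoted field intact
```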
2

Thanks for the answers. The wrapping one gave me an idea:

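import csv

def reencode(file):
    # Re-encode each line just before the csv reader consumes it (Python 2 byte strings)
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

# filepath and outfilepath are assumed to be defined elsewhere
csv_writer = csv.writer(open(outfilepath, 'wb'), delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_reader = csv.reader(reencode(open(filepath, 'rb')), delimiter=';', quotechar='"')
for c in csv_reader:
    l = c  # rearrange columns here
    csv_writer.writerow(l)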

That's exactly what I was going for: re-encoding each line just before it gets parsed by the csv_reader.
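The snippet above relies on Python 2 byte strings. As a rough sketch of the same wrapping idea on Python 3, where csv expects text rather than bytes, the generator can decode each line of a binary stream and leave the output encoding to the destination file. The names decode_lines, convert, and reorder are hypothetical helpers introduced for illustration, not part of the csv module.

```python
import csv
import io


def decode_lines(binary_file, encoding="windows-1250"):
    """Yield each line of a binary stream decoded to text, one line at a time."""
    for raw_line in binary_file:
        yield raw_line.decode(encoding)


def convert(binary_in, text_out, reorder=lambda row: row):
    """Parse ';'-separated input lazily and write ','-separated rows to text_out."""
    reader = csv.reader(decode_lines(binary_in), delimiter=";", quotechar='"')
    writer = csv.writer(text_out, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        writer.writerow(reorder(row))  # rearrange columns here


if __name__ == "__main__":
    # In-memory demo; real code would pass open(path, "rb") and
    # open(out_path, "w", encoding="utf-8", newline="").
    source = io.BytesIO('ž;"a;b";č\r\n'.encode("windows-1250"))
    target = io.StringIO(newline="")
    convert(source, target, reorder=lambda row: row[::-1])
    print(repr(target.getvalue()))
```

The file is still read only once, and each line is decoded only when the reader asks for it.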


2

At the very bottom of the Python 2 csv documentation is a pair of example classes (UnicodeReader and UnicodeWriter) that implement Unicode support for csv:

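# UnicodeReader and UnicodeWriter are not importable from csv;
# copy them from the examples in the Python 2 csv documentation.
rfile = open('input.csv', 'rb')
wfile = open('output.csv', 'wb')
csv_reader = UnicodeReader(rfile, encoding='windows-1250')
csv_writer = UnicodeWriter(wfile, encoding='utf-8')
for c in csv_reader:
    # process Unicode rows
    csv_writer.writerow(c)
rfile.close()
wfile.close()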
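On Python 3 those helper classes are not needed, since open() takes an encoding argument and csv works on text directly. A minimal sketch under that assumption follows; the function name reencode_csv and its parameters are made up for illustration.

```python
import csv


def reencode_csv(src_path, dst_path, src_encoding="windows-1250"):
    """Read a CSV in src_encoding and write it back out as UTF-8 in one pass."""
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, encoding=src_encoding, newline="") as rfile, \
         open(dst_path, "w", encoding="utf-8", newline="") as wfile:
        csv_reader = csv.reader(rfile)
        csv_writer = csv.writer(wfile)
        for row in csv_reader:
            # process the row (a list of str) here
            csv_writer.writerow(row)
```

A call such as reencode_csv('input.csv', 'output.csv') would mirror the file names used above.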
