export data from csv file containing unicode characters

Question

I would like to export data from a CSV file that contains Unicode strings.

I tried the following Python script, which works fine for ASCII data but does not support Unicode:

#! /usr/bin/env python
import csv

csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

with open('input.csv') as ifile:
    data = csv.reader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

Traceback (most recent call last):
    for record in data:
_csv.Error: line contains NULL byte
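One possible cause of the NULL byte error, offered as an assumption rather than a diagnosis: the file may be in a 16-bit encoding such as UTF-16, where every ASCII character is stored next to a zero byte. The sketch below uses made-up sample data to show those zero bytes appearing, and shows that decoding with the matching codec first gives the csv module ordinary text.

```python
import csv
import io

# Made-up sample data containing non-ASCII characters
text = "name,city\r\nJosé,München\r\n"

# Encoded as UTF-16, each ASCII character is paired with a zero (NUL) byte
raw = text.encode("utf-16")
print(b"\x00" in raw)  # True

# Decoding with the matching codec first hands csv plain str objects
rows = list(csv.reader(io.StringIO(raw.decode("utf-16"), newline="")))
print(rows)  # [['name', 'city'], ['José', 'München']]
```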

There's no such thing as a Unicode file. Files are always in some encoding: this one is probably UTF-8. – Daniel Roseman, commented May 15, 2013 at 7:22
If the question is about Python 3, add the python-3.x tag to the question. – jfs, commented May 15, 2013 at 8:14
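Since every file is in some encoding, one cheap first check is whether it starts with a byte-order mark (BOM). The sketch below is a heuristic under that assumption, not a general encoding detector: the function names are hypothetical, and a missing BOM proves nothing, because plain UTF-8 files usually have none.

```python
import codecs


def guess_encoding_from_bom(head):
    """Return a codec name if the bytes start with a known BOM, else None."""
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


def sniff_file(path):
    """Read the first four bytes of a file and look for a BOM."""
    with open(path, "rb") as f:
        return guess_encoding_from_bom(f.read(4))
```

For example, sniff_file('input.csv') returning "utf-16" would point at passing encoding='utf-16' when opening the file.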

Joran Beasley · Accepted Answer · 2013-05-15 15:00:05Z

2

Use the unicodecsv library instead:

https://github.com/jdunck/python-unicodecsv

import unicodecsv as csv

# Python 2: open the file in binary mode and let unicodecsv decode each field
with open('input.csv', 'rb') as ifile:
    rows = [row for row in csv.reader(ifile, encoding='utf-8')]

print(rows)

edited May 15, 2013 at 15:00

answered May 15, 2013 at 6:09

Joran Beasley



5 Comments

Vicky Over a year ago

I have never used Python before. I downloaded the package you specified. Can you please tell me how to use it in my Python script, or what changes I need to make for it to work with a Unicode CSV file?

Vicky Over a year ago

There is a problem in the file __init__.py, line 49: except TypeError, e:. Should I replace it with except e:? Am I correct or not? After that change, I get this error for import unicodecsv as csv:

  File "C:\Program Files (x86)\Python30\unicodecsv\__init__.py", line 3, in <module>
    from itertools import izip
ImportError: cannot import name izip
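On the two changes asked about in this comment, a small illustration (parse_int is a hypothetical helper, not code from unicodecsv). The Python 3 spelling is except TypeError as e:, which also works on Python 2.6+; except e: would instead look up a variable named e and treat it as the exception class to catch. itertools.izip no longer exists in Python 3, where the built-in zip is already lazy, so a common compatibility pattern falls back to zip rather than deleting the import, which would leave any later izip calls undefined.

```python
# Python 2 only:   except TypeError, e:
# Python 2.6+ / 3: except TypeError as e:   ("except e:" is NOT equivalent)
def parse_int(value):
    """Hypothetical helper: convert to int, or return None on failure."""
    try:
        return int(value)
    except (TypeError, ValueError) as e:  # binds the exception object to e
        print("could not convert %r: %s" % (value, e))
        return None


# itertools.izip was removed in Python 3; fall back to the lazy built-in zip
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3

pairs = list(izip(["a", "b"], [1, 2]))
print(pairs)  # [('a', 1), ('b', 2)]
```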

Vicky Over a year ago

I removed the import statement (from itertools import izip) from __init__.py, since Python 3 no longer needs it. Now I get the following error:

    record = list(csv.reader(ifile, encoding='utf-8'))
TypeError: iter() returned non-iterator of type 'UnicodeReader'

Vicky Over a year ago

Can anybody help me use the unicodecsv library? I am getting the following error:

Traceback (most recent call last):
  File line 6, in <module>
    rows = list(csv.reader(ifile, encoding='utf-8'))
TypeError: iter() returned non-iterator of type 'UnicodeReader'

Joran Beasley Over a year ago

Modified. I thought you could just call list() on it, but it looks like you need to iterate over it manually.
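The "iter() returned non-iterator" error is consistent with running a Python 2 style iterator class under Python 3: Python 3 calls __next__(), so a class that defines only next() is not an iterator there, whether you call list() on it or loop over it. The sketch below uses hypothetical classes (OldStyle, CountDown) to reproduce the error and to show the usual fix of defining __next__ and aliasing next to it.

```python
class OldStyle:
    """Hypothetical Python 2 style iterator: defines next() but not __next__()."""

    def __iter__(self):
        return self

    def next(self):
        raise StopIteration


error = None
try:
    iter(OldStyle())
except TypeError as exc:
    error = str(exc)
print(error)  # iter() returned non-iterator of type 'OldStyle'


class CountDown:
    """Hypothetical iterator that works on both Python 2 and Python 3."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 iterator protocol
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 iterator protocol calls next()


print(list(CountDown(3)))  # [3, 2, 1]
```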

mfitzp · Accepted Answer · 2013-05-15 14:24:30Z

1

You can wrap the csv.reader in a class to handle it for you. The following is taken from the csv documentation examples and works for me:

#! /usr/bin/env python
import csv, codecs

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self


csv.register_dialect('custom', delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

# Python 2's csv module expects the file to be opened in binary mode
with open('input.csv', 'rb') as ifile:
    data = UnicodeReader(ifile, dialect='custom')
    for record in data:
        for i, field in enumerate(record):
            print(" <field%s>" % i + field + "</field%s>" % i)

The same documentation examples also include a UnicodeWriter class if you need to write Unicode CSV files.
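For comparison, on Python 3 neither recipe class is needed, because the csv module works on str and the encoding is given to open(). A minimal sketch, assuming UTF-8 output and hypothetical sample rows, written to a temporary directory so it does not touch input.csv:

```python
import csv
import os
import tempfile

# Hypothetical rows containing non-ASCII text
rows = [["name", "city"], ["José", "München"], ["李", "北京"]]
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Python 3: encoding goes on open(); newline="" lets csv control line endings
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back with the same encoding to check the round trip
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)  # True
```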

edited May 15, 2013 at 14:24

answered May 15, 2013 at 7:22

mfitzp


3 Comments

Vicky Over a year ago

Can you tell me how to use it in my code, shown above? Or please post complete sample code if possible, as I am a beginner in Python.

Vicky Over a year ago

Can anybody help me solve my problem of exporting data from a CSV file that contains strings in different languages?

mfitzp Over a year ago

I've added a full code example above. It works fine for me in Python 2.7.4 for reading files encoded in UTF-8.

jfs · Accepted Answer · 2013-05-15 08:36:27Z

0

It seems you are using Python 3. Follow the very first code example in the csv module documentation:

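#!/usr/bin/env python3
import csv

# The 'custom' dialect from the question
csv.register_dialect('custom', delimiter=',', doublequote=True,
                     escapechar=None, quotechar='"',
                     quoting=csv.QUOTE_MINIMAL, skipinitialspace=False)

encoding = 'utf-16'  # replace with the actual encoding of input.csv

with open('input.csv', newline='', encoding=encoding) as csvfile:
    reader = csv.reader(csvfile, dialect="custom")
    for row in reader:
        print(", ".join(row))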

where the "custom" dialect is the one defined in the code in your question, and encoding is the character encoding of your file, such as "utf-16". If you omit the encoding argument, the encoding returned by locale.getpreferredencoding(False) is used.
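To see both points in practice, the sketch below prints the platform default that open() falls back to, then writes a hypothetical UTF-16 CSV file to a temporary directory and reads it back with the matching encoding. The file contents are made up; the real input.csv may use a different encoding.

```python
import csv
import locale
import os
import tempfile

# The default open() uses when encoding= is omitted (platform-dependent)
print(locale.getpreferredencoding(False))

# Hypothetical UTF-16 file standing in for input.csv
path = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(path, "w", newline="", encoding="utf-16") as f:
    csv.writer(f).writerows([["id", "word"], ["1", "café"], ["2", "Ελληνικά"]])

# Reading it back works when the encoding matches the one used to write it
with open(path, newline="", encoding="utf-16") as f:
    rows = list(csv.reader(f))
print(rows)  # [['id', 'word'], ['1', 'café'], ['2', 'Ελληνικά']]
```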

edited May 15, 2013 at 8:36

answered May 15, 2013 at 8:07

jfs


4 Comments

Vicky Over a year ago

ImportError: cannot import name escape

jfs Over a year ago

html.escape() is available since Python 3.2. You could use cgi.escape(field, quote=True) on earlier versions. Note: it is unrelated to your Unicode issue.

Vicky Over a year ago

I am getting an error because of the line from html import escape. How can I resolve it?

jfs Over a year ago

I've removed html.escape (to deal with one issue at a time)
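For reference once the Unicode issue is solved, a sketch of escaping each field before wrapping it in the question's <fieldN> tags. It follows the comments above: html.escape on Python 3.2+, cgi.escape(field, quote=True) as a fallback on earlier versions. The helper name fields_to_markup and the sample data are hypothetical.

```python
import csv
import io

try:
    from html import escape  # Python 3.2+
except ImportError:
    from cgi import escape  # earlier versions, as suggested above


def fields_to_markup(row):
    """Hypothetical helper: wrap each escaped field in <fieldN> tags."""
    return [" <field%d>%s</field%d>" % (i, escape(field, quote=True), i)
            for i, field in enumerate(row)]


# Hypothetical sample standing in for input.csv
sample = io.StringIO('name,note\r\nJosé,"a < b & c"\r\n', newline="")
for record in csv.reader(sample):
    for line in fields_to_markup(record):
        print(line)
```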
