encoding error when reading excel file [duplicate]

Question

I want to go through the files in my folder, identify them, and rename them according to a list of rules I have in an Excel spreadsheet. I load the needed libraries and make my directory the working directory. I then read in the Excel file (using xlrd), and when I try to read the data by columns, e.g.:

fname = metadata.col_values(0, start_rowx=1, end_rowx=None)

the list of values comes back with a u in front of each one (I guess it stands for unicode), such as fname = [u'file1', u'file2'] and so on.

How can I convert fname to a list of ascii strings?

Thanks for the comments and suggestions. I am not sure whether unicode is the problem, but I suspect it is, since the code cannot identify file1, file2, etc. in my folder. I believe the error was caused by the presence of the u.
– Dimitris, Commented Jul 23, 2013 at 14:44

Slater Victoroff · Accepted Answer · 2013-07-22 14:20:54Z

0

I'm not sure what the big issue behind having unicode filenames is, but assuming that all of your characters are ascii-valid characters the following should do it. This solution will just ignore anything that's non-ascii, but it's worth thinking about why you're doing this in the first place:

ascii_string = unicode_string.encode("ascii", "ignore")

Specifically, for converting a whole list I would use a list comprehension:

ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
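To see what "ignore" does in practice, here is a minimal sketch. The sample names are made up for illustration (the second one is hypothetical and contains a non-ASCII character). Non-ASCII characters are silently dropped, whereas the default strict mode raises an error instead. Note that on Python 3, encode() returns a bytes object rather than a str.

```python
# Hypothetical sample data standing in for the values read from the spreadsheet.
fname = [u"file1", u"caf\xe9"]

# "ignore" silently drops any character that cannot be represented in ASCII.
ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]

# The default ("strict") error handler raises instead of dropping characters.
try:
    fname[1].encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

If losing characters silently would produce wrong filenames, the strict form is the safer choice, because it fails loudly.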


1 Comment

Dimitris Over a year ago

Thanks. You are probably right, and unicode is not the problem with the code. I will test and update the code, then post the results.

Henry Keiter · Accepted Answer · 2013-07-22 14:28:26Z

0

The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.

In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.

with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
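The same point can be checked without writing a file. The following is a small sketch using the sample values from the question; the variable names other than fname are only illustrative.

```python
fname = [u"file1", u"file2"]  # sample values as shown in the question

first = fname[0]

# The u prefix and the quotes appear only in the printed representation;
# they are not characters in the string itself.
same_text = (first == "file1")          # compares equal to a plain string literal
length = len(first)                     # counts only f, i, l, e, 1
starts_with_u = first.startswith("u")   # the string does not begin with "u"
```

So a comparison against a filename such as "file1" should not fail just because the value came back as a unicode object.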

If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.

You could also just use Python 3, where Unicode is the default and the u isn't normally printed.

edited Jul 22, 2013 at 14:28

answered Jul 22, 2013 at 14:22


2 Comments

Dimitris Over a year ago

thanks - I now believe unicode might not be my problem; I will update the post as soon as I know better

jfs Over a year ago

f.write(item) fails if item is a Unicode string with characters outside ascii (sys.getdefaultencoding()). Use codecs.open() with explicit character encoding instead.
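A minimal sketch of writing with an explicit encoding follows. It uses io.open rather than the codecs.open() named in the comment; both accept an encoding argument, and io.open is the same function as the built-in open on Python 3. The sample names, the temporary file path, and the choice of UTF-8 are assumptions made for illustration.

```python
import io
import os
import tempfile

# Hypothetical names, one containing a non-ASCII character.
fname = [u"file1", u"caf\xe9"]

# Write to a temporary file so the sketch does not depend on a particular folder.
path = os.path.join(tempfile.mkdtemp(), "names.txt")

# An explicit encoding tells Python how to turn the text into bytes on disk.
with io.open(path, "w", encoding="utf-8") as f:
    for item in fname:
        f.write(item)
        f.write(u"\n")

# Reading back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read().splitlines()
```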
