
I want to go through the files in my folder, identify them, and rename them according to a list of rules I keep in an Excel spreadsheet. I load the needed libraries, make my directory the working directory, and read in the Excel file (using xlrd). When I try to read the data by columns, e.g.:

fname = metadata.col_values(0, start_rowx=1, end_rowx=None)

the list of values comes back with a u in front of each one (I guess that means unicode), such as fname = [u'file1', u'file2'] and so on.

How can I convert fname to a list of ascii strings?

  • What's the big deal if the strings are in unicode? Commented Jul 22, 2013 at 13:59
  • Thanks for the comments and suggestions. I am not sure unicode is the problem, but I suspected it because the code cannot find file1, file2, etc. in my folder. I believed the error was caused by the u. Commented Jul 23, 2013 at 14:44

2 Answers


I'm not sure what the problem with unicode filenames is, but assuming all of your characters are valid ASCII, the following should do it. Note that the "ignore" argument silently drops any non-ASCII character, so it's worth thinking about why you're doing this in the first place:

ascii_string = unicode_string.encode("ascii", "ignore")

Specifically, for converting a whole list I would use a list comprehension:

ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
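As a rough sketch of the difference between "ignore" and the default strict mode, the snippet below encodes a made-up list both ways. The helper name to_ascii and the sample values are assumptions for illustration, not part of the original code. On Python 2 the results are plain str objects; on Python 3 they are bytes.

```python
def to_ascii(names, errors="strict"):
    """Encode each text string in names to ASCII (hypothetical helper)."""
    return [name.encode("ascii", errors) for name in names]


# Hypothetical sample data: the second name contains a non-ASCII character.
fname = [u"file1", u"caf\xe9"]

# "ignore" silently drops the character that cannot be encoded.
lossy = to_ascii(fname, errors="ignore")

# Strict mode raises instead, which makes unexpected characters visible.
try:
    to_ascii(fname)
    all_ascii = True
except UnicodeEncodeError:
    all_ascii = False

print(lossy)
print(all_ascii)
```

If the strict call raises, the names are not pure ASCII and the lossy version would no longer match the real file names on disk.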

1 Comment

Thanks. You are probably right, and unicode is not the problem with the code. I will test and update the code, then post the results.

The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.

In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.

with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
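Another quick way to check this, without touching the file system, is to compare the string's printed representation with the string itself. This is only a small sketch using made-up sample values in place of the real spreadsheet column.

```python
# Hypothetical sample standing in for the values read from the spreadsheet.
fname = [u"file1", u"file2"]

actual = fname[0]      # the string object itself
shown = repr(actual)   # what appears when the whole list is printed

print(shown)   # u'file1' on Python 2, 'file1' on Python 3
print(actual)  # file1

# The u and the quotes are not part of the string's content.
is_same = (actual == "file1")
length = len(actual)
print(is_same, length)
```

Because the comparison is true and the length is 5, a lookup that fails on these names is likely failing for some other reason, such as a missing extension or stray whitespace.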

If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.

You could also just use Python 3, where Unicode is the default and the u isn't normally printed.

2 Comments

thanks - I now believe unicode might not be my problem; I will update the post as soon as I know better
f.write(item) fails on Python 2 if item is a Unicode string containing characters outside the default encoding (sys.getdefaultencoding(), which is ascii). Use codecs.open() with an explicit character encoding instead.
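A minimal sketch of writing the names with an explicit encoding follows. It uses io.open rather than codecs.open, since both accept an encoding argument and io.open behaves the same on Python 2 and 3. The temporary output path, the sample names, and the choice of UTF-8 are assumptions for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample data, including one non-ASCII name.
fname = [u"file1", u"caf\xe9"]

# Write to a throwaway location rather than a real project folder.
out_path = os.path.join(tempfile.mkdtemp(), "names.txt")

# With an explicit encoding, non-ASCII text is encoded on write
# instead of going through the interpreter's default encoding.
with io.open(out_path, "w", encoding="utf-8") as f:
    for item in fname:
        f.write(item)
        f.write(u"\n")  # io.open in text mode expects unicode on Python 2

# Read the file back to confirm the names survived unchanged.
with io.open(out_path, encoding="utf-8") as f:
    round_trip = f.read().splitlines()

print(round_trip == fname)
```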
