I am working with unicode strings, which I encode as UTF-8 whenever I need to display them. This ensures that the proper encoding is used even when the output of my script is redirected to a file (I know there are other ways to do this, but that is not the point).

Now, sometimes I need to tabulate some data, and for that I use format specifiers, as shown below:

def tabulate(uni1, uni2):
    print "%-15s,%-15s" % (uni1.encode('utf-8'), uni2.encode('utf-8'))

print '01234567890123456789' # ruler
tabulate(u'HELLO', u'BYE')
tabulate(u'ñññññ', u'BYE')

This program produces the following output:

01234567890123456789
HELLO          ,BYE            
ñññññ     ,BYE

As you can see, the second row is not aligned properly. I guess that %s is not aware of the string's encoding and miscalculates its length.
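A quick check is consistent with that guess: in UTF-8 each ñ takes two bytes, so the encoded string is 10 bytes long and %-15s adds only 5 spaces of padding instead of 10. A minimal sketch, with illustrative variable names that are not part of the script above; ñ is written as `u'\xf1'` so the snippet does not depend on the source file encoding:

```python
text = u'\xf1' * 5                  # u'ñññññ', five characters
encoded = text.encode('utf-8')      # byte string, two bytes per 'ñ'

char_count = len(text)              # counts characters
byte_count = len(encoded)           # counts bytes

# Padding needed to fill a 15-wide column, depending on what is counted
padding_on_bytes = 15 - byte_count  # what %-15s sees after encoding
padding_on_chars = 15 - char_count  # what the table actually needs
```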

Is there a solution to this problem?

2 Answers

Here is an implementation of what Ignacio pointed out, which is to do the formatting before the encoding:

def tabulate(uni1, uni2):
    print (u"%-15s,%-15s" % (uni1, uni2)).encode('utf-8')

>>> tabulate(u'HELLO', u'BYE')
HELLO          ,BYE            
>>> tabulate(u'ñññññ', u'BYE')
ñññññ          ,BYE    

Format as unicode, then encode.
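One way this could look in practice is sketched below. The helper name `format_row` and its `width` parameter are hypothetical, chosen only for illustration. The padding is applied while the data is still unicode, so it counts characters, and the encoding happens once at the very end:

```python
def format_row(uni1, uni2, width=15):
    """Pad both unicode strings to `width` characters, then encode to UTF-8."""
    # Build the template, e.g. u"%-15s,%-15s" for width=15
    template = u"%%-%ds,%%-%ds" % (width, width)
    row = template % (uni1, uni2)   # padding is computed on characters
    return row.encode('utf-8')      # encode only after formatting


result = format_row(u'\xf1' * 5, u'BYE')   # u'\xf1' is 'ñ'
```

Note that this counts code points rather than display columns, so text containing combining marks or double-width characters may still need extra handling.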
