Python: unable to convert unicode to string

Question

I've tried everything but the unicode just doesn't go away.

col = "[u'$929.95']"
unicoded_item = to_unicode(col) # [u'test']

print type(unicoded_item) # <type 'unicode'>
if isinstance(unicoded_item, unicode):
    unicoded_item = unicoded_item.encode('utf8')
    print str(unicoded_item) # [u'test']

I expected the whole [u' and '] to disappear, but it just doesn't seem to convert. When I save this string to a text file, the file literally contains [u'test'] instead of test.

That's because you have what looks like the string representation of a list, not a string. What do you get when you type print(col[0])? A t or test?
– Burhan Khalid, Commented Oct 13, 2015 at 20:58
@BurhanKhalid omg...I see, wondering why the type doesn't say 'list' however if it was a list
– user299709, Commented Oct 13, 2015 at 20:58
That's because you are doing type(unicoded_item); when you did unicoded_item = to_unicode(col), it took the str representation of a list and then converted that to unicode. If you do type(col) you'll get the correct type.
– Burhan Khalid, Commented Oct 13, 2015 at 21:03

Burhan Khalid · Accepted Answer · 2015-10-13 21:27:07Z

3

You have a string that is the representation of a list object. The easiest way to sort this out is to evaluate the string to get an object out:

>>> import ast
>>> col = "[u'$929.95']"
>>> col2 = ast.literal_eval(col)
>>> type(col)
<type 'str'>
>>> type(col2)
<type 'list'>
>>> col2[0]
u'$929.95'
>>> str(col2[0])
'$929.95'

answered Oct 13, 2015 at 21:27

1 Comment

user299709 Over a year ago

The thing is, sometimes the value of col is clean, meaning it's just $929.95. I can't control what type of data ultimately makes it through, so it looks like I need to implement this.
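
One way to handle both shapes of input mentioned in the comment above is to try ast.literal_eval first and fall back to the raw value when parsing fails. This is only a minimal sketch, not part of the original answer: the helper name clean_value is hypothetical, and it assumes that when the string parses to a list, only the first element is wanted. It is written so that it also runs on Python 3, where the u prefix is still accepted in literals.

```python
import ast


def clean_value(col):
    """Return the bare value whether col is clean or a list repr (hypothetical helper)."""
    try:
        parsed = ast.literal_eval(col)
    except (ValueError, SyntaxError):
        # Not a Python literal (for example "$929.95"), so assume it is already clean.
        return col
    if isinstance(parsed, (list, tuple)) and parsed:
        # Assumption: only the first element of the parsed list is wanted.
        return parsed[0]
    # Parsed to something other than a non-empty list; hand back the input unchanged.
    return col


print(clean_value("[u'$929.95']"))  # list representation
print(clean_value("$929.95"))       # already clean
```

Both calls should print the same bare value. ast.literal_eval only evaluates literals, so it will not execute arbitrary code found in the input.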

cg909 · Accepted Answer · 2015-10-13 21:10:01Z

2

The variable col probably contains a list with one unicode string element.

unicoded_item = to_unicode(col) then creates a unicode string with the representation of that list: u"[u'test']".

You then convert this unicode string to a string using unicoded_item.encode('utf8').

This gives you a (byte) string "[u'test']".

The solution is to access the element(s) in col instead of converting the whole col. If col always contains exactly one element you can simply replace the uses of col with col[0].
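
The difference between converting the whole list and accessing its element can be seen in a short sketch. It assumes, as this answer does, that col is an actual list; the names whole and element are illustrative only. The snippet is written for Python 3, so the representation shows no u prefix, but the stray brackets and quotes appear in the same way.

```python
col = ['$929.95']  # assumption: col is a real list with one string element

whole = str(col)   # string representation of the list, brackets and quotes included
element = col[0]   # the string stored inside the list

print(whole)
print(element)
```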

answered Oct 13, 2015 at 21:10

3 Comments

user299709 Over a year ago

What I found was that col was already a string, not a list. So converting a string like [u'$449.97'] into unicode fails. I need to convert a string representation of a unicode into unicode, and then back to a string.

cg909 Over a year ago

Ok, then please include more code in your next question. As you didn't provide the assignment resulting in col everyone here could only guess the reason for your result.

user299709 Over a year ago

@cg909 I have done that. I think I might just extract the stuff between the texts

Hill · Accepted Answer · 2015-10-13 21:01:03Z

1

It may not deal with the issue directly, but you could use the replace() function to swap the [u' for nothing.

answered Oct 13, 2015 at 21:01

1 Comment

user299709 Over a year ago

actually that's not a bad solution but I think the problem is dealing with the trailing ']
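
The trailing '] can be dealt with by chaining a second replace() call. This is a minimal sketch, assuming the wrapper is always exactly [u' and ']; the helper name strip_wrapper is hypothetical. A value without the wrapper passes through unchanged.

```python
def strip_wrapper(value):
    """Remove every occurrence of [u' and '] from value (hypothetical helper)."""
    return value.replace("[u'", "").replace("']", "")


print(strip_wrapper("[u'$929.95']"))  # wrapped value
print(strip_wrapper("$929.95"))       # already clean value
```

Note that replace() removes these sequences wherever they occur, not only at the ends, so a value that legitimately contains them would be altered too.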

LetzerWille · Accepted Answer · 2015-10-13 22:13:15Z

0

Your string is not unicode. It is a regular string. You can get the dollar amount like this:

res = "[u'$929.95']".split("'")[1]
print(res)

$929.95

But if it were a unicode object such as u'someletters', running str() on it would remove the u' prefix.

edited Oct 13, 2015 at 22:13

answered Oct 13, 2015 at 21:20

2 Comments

LetzerWille Over a year ago

@PadraicCunningham I tried and got ë. As for split, yes, it is a bad habit of mine to always use re.split. But on a single delimiter re.split is redundant of course, so I have changed it to regular split. Thanks for your remark.

LetzerWille Over a year ago

Python 3. The OP's tag states python without a version.
