1

I've tried everything but the unicode just doesn't go away.

col = "[u'$929.95']"
unicoded_item = to_unicode(col) # [u'test']

print type(unicoded_item) # <type 'unicode'>
if isinstance(unicoded_item, unicode):
    unicoded_item = unicoded_item.encode('utf8')
    print str(unicoded_item) # [u'test']

I expected the whole [u' and '] to disappear, but the string doesn't seem to convert. When I save it to a text file, the file literally contains the Python unicode representation [u'test'] instead of test.

8
  • for clarification: do you get [u'test'] or u'test'? Commented Oct 13, 2015 at 20:57
  • That's because you have what looks like the string representation of a list not a string. What do you get when you type print(col[0])? A t or test? Commented Oct 13, 2015 at 20:58
  • @c909 yes getting [u'test'] Commented Oct 13, 2015 at 20:58
  • @BurhanKhalid omg...I see, wondering why the type doesn't say 'list' however if it was a list Commented Oct 13, 2015 at 20:58
  • That's because you are doing type(unicoded_item); when you did unicoded_item = to_unicode(col), it took the str representation of a list and then converted that to unicode. If you do type(col) you'll get the correct type. Commented Oct 13, 2015 at 21:03

4 Answers

3

You have a string that is the representation of a list object. The easiest way to sort this out is to evaluate the string to get the object back:

>>> import ast
>>> col = "[u'$929.95']"
>>> col2 = ast.literal_eval(col)
>>> type(col)
<type 'str'>
>>> type(col2)
<type 'list'>
>>> col2[0]
u'$929.95'
>>> str(col2[0])
'$929.95'

1 Comment

the thing is sometimes the value col is clean meaning it's $929.95, I can't control what type of data ultimately makes it through so looks like I need to implement this
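
For the mixed case described in the comment above, where col is sometimes "[u'$929.95']" and sometimes already a clean "$929.95", one possible approach is to try ast.literal_eval and fall back to the original string when parsing fails. This is a minimal sketch, not the answerer's code: the helper name extract_value is hypothetical, and it assumes that a parsed list holds the wanted value in its first element.

```python
import ast


def extract_value(col):
    """Return the first element if col is the repr of a list, else col unchanged.

    extract_value is a hypothetical helper name used for illustration only.
    """
    try:
        parsed = ast.literal_eval(col)
    except (ValueError, SyntaxError):
        # col is not a Python literal (for example "$929.95"), so keep it as is.
        return col
    if isinstance(parsed, (list, tuple)) and parsed:
        return parsed[0]
    # col parsed to something else (a number, an empty list): keep the original.
    return col


print(extract_value("[u'$929.95']"))
print(extract_value("$929.95"))
```

On Python 2 the first call returns a unicode object, so it may still need an .encode('utf8') before being written to a byte-oriented file.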
2

The variable col probably contains a list with one unicode string element.

unicoded_item = to_unicode(col) then creates a unicode string with the representation of that list: u"[u'test']".

You then convert this unicode string to a string using unicoded_item.encode('utf8').

This gives you a (byte) string "[u'test']".

The solution is to access the element(s) in col instead of converting the whole col. If col always contains exactly one element you can simply replace the uses of col with col[0].
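
As a small sketch of that suggestion, assuming col really is a list holding one unicode string (which, per the comments below, turned out not to be the case for the asker):

```python
# Assumption: col is an actual list, not the string representation of one.
col = [u'$929.95']

# Take the element itself instead of converting the whole list to text.
item = col[0]

# Encode only the element; this yields a byte string without any [u' ... '].
encoded = item.encode('utf8')
print(item)
```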

3 Comments

what I found was that col was already a string, not a list. So converting a string like [u'$449.97'] into unicode doesn't help. I need to convert the string representation of a unicode value into unicode, and then back to a string.
Ok, then please include more code in your next question. As you didn't provide the assignment resulting in col everyone here could only guess the reason for your result.
@c909 I have done that. I think I might just extract the stuff between the texts
1

It may not deal with the issue directly, but you could use the replace() function to swap the [u' for nothing.
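
A rough sketch of this idea, which also handles the trailing '] by checking both ends of the string. The helper name strip_list_repr is hypothetical, and the sketch assumes the value is a single-element list representation with no quotes inside the value:

```python
def strip_list_repr(text):
    """Remove a leading [u' and a trailing '] when both are present.

    strip_list_repr is a hypothetical helper name used for illustration only.
    """
    prefix, suffix = "[u'", "']"
    if text.startswith(prefix) and text.endswith(suffix):
        return text[len(prefix):-len(suffix)]
    # Already clean (or in some other shape): return it unchanged.
    return text


print(strip_list_repr("[u'$929.95']"))
print(strip_list_repr("$929.95"))
```

Chaining two calls, as in col.replace("[u'", "").replace("']", ""), gives the same result for this input, but it would also remove those character sequences if they appeared in the middle of the value.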

1 Comment

actually that's not a bad solution but I think the problem is dealing with the trailing ']
0

Your string is not unicode. It is a regular string. You can get the dollar amount like this:

res = "[u'$929.95']".split("'")[1]
print(res)

$929.95

If it were a real unicode object such as u'someletters', you could call str() on it to get rid of the u' prefix.

2 Comments

@PadraicCunningham I tried and got ë. As for split, yes, it is a bad habit of mine to always use re.split. But on a single delimiter re.split is redundant of course, so I have changed it to a regular split. Thanks for your remark.
Python 3. The OP's tag says python without a version.
