Edit after feedback in comments.
Another solution would be to check the numeric value of each character and keep only those below 128, since ASCII code points range from 0 to 127. Like so:
# coding=utf-8
def removeUnicode():
    text = "hejsanäöåbadasd wodqpwdk"
    asciiText = ""
    for char in text:
        # Keep only characters in the ASCII range (0-127).
        if ord(char) < 128:
            asciiText += char
    return asciiText

import timeit
# timeit() runs the statement 1,000,000 times by default.
start = timeit.Timer("removeUnicode()", "from __main__ import removeUnicode")
print "Time taken: " + str(start.timeit())
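For readers on Python 3, where every str is already Unicode, the same character check could be sketched roughly as follows. The function name remove_non_ascii and the text parameter are illustrative and not part of the original answer, and this variant was not part of the benchmarks in this answer.

```python
def remove_non_ascii(text):
    """Keep only characters whose code point is below 128 (the ASCII range)."""
    # str.join over a generator avoids building a new string on every iteration.
    return "".join(ch for ch in text if ord(ch) < 128)


print(remove_non_ascii("hejsanäöåbadasd wodqpwdk"))  # hejsanbadasd wodqpwdk
```

Passing the text in as an argument, rather than hard-coding it inside the function, also makes it easier to time the function against different inputs.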
Here's an altered version of jd's answer with benchmarks:
# coding=utf-8
def removeUnicode():
    text = u"hejsanäöåbadasd wodqpwdk"
    if isinstance(text, str):
        # A byte string must be decoded before it can be re-encoded as ASCII.
        return text.decode('utf-8').encode("ascii", "ignore")
    else:
        # A unicode string can be encoded directly; "ignore" drops non-ASCII characters.
        return text.encode("ascii", "ignore")

import timeit
start = timeit.Timer("removeUnicode()", "from __main__ import removeUnicode")
print "Time taken: " + str(start.timeit())
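On Python 3 the str/unicode distinction becomes str/bytes, so the encoding approach might look like the sketch below. The name strip_non_ascii is hypothetical, and treating byte input as UTF-8 is an assumption carried over from the decode('utf-8') call in the code above.

```python
def strip_non_ascii(text):
    """Drop non-ASCII characters by round-tripping through the ASCII codec."""
    if isinstance(text, bytes):
        # Assumption: byte input is UTF-8 encoded.
        text = text.decode("utf-8")
    # errors="ignore" silently discards anything that cannot be encoded as ASCII.
    return text.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii("hejsanäöåbadasd wodqpwdk"))  # hejsanbadasd wodqpwdk
```

The final decode("ascii") turns the result back into a str; leave it out if bytes are what you need.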
Output of the first solution with a str string as input:
computer:~ Ancide$ python test1.py
Time taken: 5.88719677925
Output of the first solution with a unicode string as input:
computer:~ Ancide$ python test1.py
Time taken: 7.21077990532
Output of the second solution with a str string as input:
computer:~ Ancide$ python test1.py
Time taken: 2.67580914497
Output of the second solution with a unicode string as input:
computer:~ Ancide$ python test1.py
Time taken: 1.740680933
Conclusion
Encoding is the faster solution (roughly two to four times faster in the runs above) and it takes less code, so it is the better solution.