import re

b="united thats weak. See ya 👋"
print b.decode('utf-8')  #output: u'united thats weak. See ya \U0001f44b'

print re.findall(r'[\U0001f600-\U0001f650]',b.decode('utf-8'),flags=re.U) # output: [u'S']

How can I get \U0001f44b as the output?

The emojis that I need to handle are:

😀_❤️_😁_😂_😃_😄_😅_😆_😇_😈_😉_😊_😋_😌_😍_😎_😏_😐_😑_😒_😓_😔_😕_😖_😗_😘_😙_😚_😛_😜_😝_😞_😟_😠_😡_😢_😣_😤_😥_😦_😧_😨_😩_😪_😫_😬_😭_😮_😯_😰_😱_😲_😳_😴_😵_😶_😷_😸_😹_😺_😻_😼_😽_😾_😿_🙀_🙁_🙂_🙃_🙄_🙅_🙆_🙇_🙈_🙉_🙊_🙋_🙌_🙍_🙎_🙏_🚀_🚁_🚂_🚃_🚄_🚅_🚆_🚇_🚈_🚉_🚊_🚋_🚌_🚍_🚎_🚏_🚐_🚑_🚒_🚓_🚔_🚕_🚖_🚗_🚘_🚙_🚚_🚛_🚜_🚝_🚞_🚟_🚠_🚡_🚢_🚣_🚤_🚥_🚦_🚧_🚨_🚩_🚪_🚫_🚬_🚭_🚮_🚯_🚰_🚱_🚲_🚳_🚴_🚵_🚶_🚷_🚸_🚹_🚺_🚻_🚼_🚽_🚾_🚿_🛀_🛁_🛂_🛃_🛄_🛅_🛋_🛌_🛍_🛎_🛏_🛐_🛠_🛡_🛢_🛣_🛤_🛥_🛩_🛫_🛬_🛰_🛳_🤐_🤑_🤒_🤓_🤔_🤕_🤖_🤗_🤘_🦀_🦁_🦂_🦃_🦄_🧀
  • Does it mean you need to match just some emojis? Commented Dec 1, 2016 at 9:59
  • Yes, I was trying to do that, but somehow I am not able to write an accurate pattern. Commented Dec 1, 2016 at 14:33
  • can you update your question with all those emojis that you want to match? Thanks. Commented Dec 12, 2016 at 9:47
  • Hi Qubad....i just need a way to use regex in handling the emojis...its not more about a particular emoji. Thanks for the reply :) Commented Dec 12, 2016 at 21:57

1 Answer


Searching for a Unicode range works exactly the same as searching for any other character range. However, you need to represent the strings correctly. Here is a working example:

#coding: utf-8
import re

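# Both the input and the patterns are unicode strings (Python 2 syntax).
b=u"united thats weak. See ya 😇 "
# \U escapes inside a non-raw unicode literal
assert re.findall(u'[\U0001f600-\U0001f650]',b) == [u'😇']
# Literal emoji characters inside a raw unicode literal
assert re.findall(ur'[😀-🙏]',b) == [u'😇']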

Notes:

  • You need #coding: utf-8 or similar on the first or second line of your program.
  • In your example, the emoji that you used, U+1F44B, is not in the range U+1F600 to U+1F650. In my example, I used one that is.
  • If you want to use \U escapes to include a Unicode character, don't use the plain raw string prefix (r''): in a Python 2 byte string, \U is not interpreted as a Unicode escape, so the pattern ends up containing the literal characters instead.
  • But if you use the characters themselves (instead of \U escapes), then you can use the raw string prefix.
  • You need to ensure that both the pattern and the input string are unicode strings. Neither of them may be a UTF-8-encoded byte string.
  • But you don't need the re.U flag unless your pattern includes \s, \w, or similar.
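
To tie the notes above together, here is a minimal sketch of how the ranges in the question's emoji list might be matched. It is written for Python 3, where every str is already Unicode; it has not been tested on Python 2, where narrow builds may handle these characters differently. The names EMOJI_PATTERN and find_emojis are made up for this illustration, and the ranges only approximate the list in the question (the transport range, for instance, covers a few code points that are not listed). U+1F44B is added explicitly because it falls outside all of those ranges.

```python
import re

# Hypothetical pattern: an approximation of the blocks in the question's list.
EMOJI_PATTERN = re.compile(
    u'['
    u'\U0001F600-\U0001F64F'  # emoticons, 😀 to 🙏
    u'\U0001F680-\U0001F6FF'  # transport and map symbols, 🚀 to 🛳
    u'\U0001F910-\U0001F918'  # 🤐 to 🤘
    u'\U0001F980-\U0001F984'  # 🦀 to 🦄
    u'\U0001F9C0'             # 🧀
    u'\U0001F44B'             # 👋, the emoji in the question's sample text
    u']'
    u'|\u2764\uFE0F?'         # ❤ followed by an optional variation selector
)


def find_emojis(text):
    """Return the emojis found in text, decoding UTF-8 bytes first if needed."""
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    return EMOJI_PATTERN.findall(text)


# Simulate the UTF-8 byte string from the question.
raw = u'united thats weak. See ya \U0001F44B'.encode('utf-8')

print(hex(ord(u'\U0001F44B')))  # 0x1f44b, which is below 0x1f600
print(find_emojis(raw))         # a list containing only the waving-hand emoji
```

The input is decoded before matching so that the pattern and the text are both Unicode, which is the point of the note about UTF-8-encoded strings. No re.U flag is passed because the pattern uses no \s, \w, or similar classes.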